This shows you the differences between two versions of the page.
cluster:208 [2021/10/21 14:03] hmeij07 [Changes] |
cluster:208 [2022/05/27 13:05] hmeij07 [gpu testing] |
||
---|---|---|---|
Line 383: | Line 383: | ||
--- // | --- // | ||
+ | ===== gpu testing ===== | ||
+ | |||
+ | * test slurm v 21.08.1 | ||
+ | * n33-n37 each: 4 gpus, 16 cores, 16 threads, 32 cpus | ||
+ | * submit one at a time, observe | ||
+ | * part=test, n 1, B 1:1:1, cuda_visible=0, | ||
+ | * " | ||
+ | * all on same gpu | ||
+ | * part=test, n 1, B 1:1:1, cuda_visible not set, no node specified, n33 only | ||
+ | * " | ||
+ | * all gpus used? No, all jobs land on the same one, gpu 0 | ||
+ | * redoing above with a '' | ||
+ | * even distribution across all gpus, 17th submit reason too | ||
+ | * part=test, n 1, B 1:1:1, cuda_visible not set, no node specified, n[33-34] avail | ||
+ | * while submitting 34 jobs, one at a time (30s delay), slurm fills up n33 first (all on gpu 0) | ||
+ | * 17th submit goes to n34, gpu 1 (weird), n33 state=alloc, | ||
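The one-at-a-time submissions described in the list above can be sketched as a small loop. This is a rough sketch under assumptions, not the script actually used: it reads "part=test, n 1, B 1:1:1" as the `sbatch` options `--partition=test -n 1 -B 1:1:1`, and `gpu_test.sh`, `DRY_RUN`, `build_submit_cmd` and `submit_series` are hypothetical names. By default it only prints the commands, so it can be tried on a machine without Slurm.

```shell
#!/bin/bash
# Hypothetical names for illustration: gpu_test.sh (job script), DRY_RUN,
# build_submit_cmd, submit_series. The option mapping is an assumption.

# Print the sbatch command line for one test submission.
build_submit_cmd() {
    local partition="$1" script="$2"
    echo "sbatch --partition=${partition} -n 1 -B 1:1:1 ${script}"
}

# Submit (or, with DRY_RUN=1, only print) a series of jobs one at a time,
# pausing between real submissions so each placement can be observed.
submit_series() {
    local count="$1" delay="$2" i cmd
    for i in $(seq 1 "$count"); do
        cmd=$(build_submit_cmd test gpu_test.sh)
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "[$i] $cmd"
        else
            $cmd
            sleep "$delay"
        fi
    done
}

# Dry run of 3 submissions; the notes used 34 jobs with a 30s delay.
submit_series 3 0
```

With `DRY_RUN=0` and a real job script, `submit_series 34 30` would mirror the 34-job test; checking `squeue` between submissions shows which node each job lands on.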
===== Changes ===== | ===== Changes ===== | ||
Line 430: | Line 446: | ||
</ | </ | ||
- | ** Weight | + | ** Partition |
- | Weight nodes by the memory per logical core: jobs will be allocated the nodes with the lowest weight which satisfies their requirements. So CPU jobs will be routed last to gpu queues because they have the highest weight (=lowest priority). | + | If set you can list more than one queue... |
+ | < | ||
+ | srun --partition=exx96, | ||
+ | </ | ||
+ | |||
+ | The above will fill up n79 first, then n78, then n36... | ||
+ | |||
+ | ** Node Weight Priority ** | ||
+ | |||
+ | Nodes are weighted by memory per logical core: jobs are allocated the nodes with the lowest weight that satisfy their requirements. CPU jobs are therefore routed to the gpu queues last, because the gpu nodes have the highest weight (= lowest priority). | ||
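The arithmetic behind these weights can be sketched as follows, assuming the weight is simply memory divided by logical cores, as the hp12 entry in the listing that follows suggests. `node_weight` is a hypothetical helper, not a Slurm command, and how the ratio is mapped onto the value configured in Slurm is not shown here.

```shell
#!/bin/bash
# node_weight is a hypothetical helper for illustration, not a Slurm command.
# It prints memory per logical core, rounded to one decimal place.
node_weight() {
    local mem="$1" cores="$2"
    awk -v m="$mem" -v c="$cores" 'BEGIN { printf "%.1f\n", m / c }'
}

# hp12 from the notes: 12 / 8
node_weight 12 8
```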
< | < | ||
hp12: 12/8 = 1.5 | hp12: 12/8 = 1.5 |