On the cpu resource requests: You may request 1 or more nodes, 1 or more sockets per node, 1 or more cores (physical) per socket, or 1 or more threads (logical + physical) per core. Such a request can be fine-grained or not; just request a node with ''
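A sketch of a coarse versus a fine-grained request (the task counts, memory value, and ''hostname'' payload are illustrative, not taken from this page; the partition name is the one used in the examples below):

<code>
# coarse: 4 tasks anywhere on the partition
$ srun --partition=mwgpu -n 4 --mem=1024 hostname

# fine grained: only nodes with at least 1 socket, 2 cores per socket,
# 1 thread per core (-B sockets:cores:threads)
$ srun --partition=mwgpu -n 2 -B 1:2:1 --mem=1024 hostname
</code>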
//Note: this oversubscribing is not working yet. I can only get 4 simultaneous jobs running. Maybe there is a conflict with Openlava jobs. Should isolate a node and do further testing. After isolation (n37), 4 jobs with ''-n 4'' exhaust the number of physical cores. Is that why the 5th job goes pending?//
===== MPI =====
Slurm has a builtin MPI flavor; I suggest you do not rely on it. The documentation states that on major release upgrades the ''
For now, we'll rely on PATH/
''
<code>
$ srun --partition=mwgpu -n 4 -B 1:4:1 --mem=1024 sleep 60 &
</code>
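A sketch of the PATH/LD_LIBRARY_PATH approach for MPI jobs; the OpenMPI install location and the program name below are assumptions and must be pointed at the flavor your binary was actually built with:

<code>
#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --partition=mwgpu
#SBATCH -n 4
#SBATCH --mem=1024

# hypothetical OpenMPI location, adjust to your install
export PATH=/share/apps/openmpi/4.1.1/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/openmpi/4.1.1/lib:$LD_LIBRARY_PATH

mpirun -np 4 ./my_mpi_program
</code>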
===== Feedback =====
If there are errors on this page, or misstatements,
--- //
===== Changes =====

** OverSubscribe **

A suggestion was made to set ''

''
<code>
#!/bin/bash
#SBATCH --job-name=sleep
#SBATCH --partition=mwgpu
###SBATCH -n 1
#SBATCH -B 1:1:1
#SBATCH --mem=1024
sleep 60
</code>
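A quick way to test the effect, as a sketch (''sleep.sh'' here stands for the script above and the submission count is arbitrary): submit several copies and watch how many run at once versus go pending:

<code>
# submit 8 copies of the sleep script
$ for i in $(seq 1 8); do sbatch sleep.sh; done

# see how many are RUNNING versus PENDING on the partition
$ squeue --partition=mwgpu
</code>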
--- //

** GPU-CPU cores **

Noticed this with debug level on in slurmd.log:
<code>
# n37: old gpu model bound to all physical cpu cores
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:

# n78: somewhat dated gpu model, bound to top/bot of physical cores (16)
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:

# n79: more recent gpu model, same bound pattern of top/bot (24)
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
</code>
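To see what gres a node advertises and to submit against it, something like the sketch below can be used; the node and partition names come from the examples above, the ''nvidia-smi'' payload is only illustrative:

<code>
# show the gres (gpu) configuration a node advertises
$ scontrol show node n78 | grep -i gres

# request one gpu on the mwgpu partition; the core binding logged in
# slurmd.log above influences which physical cores the task lands on
$ srun --partition=mwgpu --gres=gpu:1 --mem=1024 nvidia-smi
</code>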
\\
**[[cluster: