On the cpu resource requests: You may request 1 or more nodes, 1 or more sockets per node, 1 or more cores (physical) per socket, or 1 or more threads (logical + physical) per core. Such a request can be fine grained or not; just request a node with ''
//Note: this oversubscribing is not working yet. I can only get 4 simultaneous jobs running. Maybe there is a conflict with Openlava jobs; should isolate a node and do further testing. After isolation (n37), 4 jobs with -n 4 exhaust the number of physical cores. Is that why the 5th job goes pending?//
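For illustration, here is a hedged sketch of a coarse versus a fine grained cpu request. The flags are standard srun options; the particular values are made up and not taken from this page.

<code>
# coarse request: just ask for 4 tasks, let slurm pick the layout
srun -n 4 --mem=1024 sleep 60

# fine grained request: 1 node, 2 sockets, 4 cores per socket, 1 thread per core
srun -N 1 -B 2:4:1 --mem=1024 sleep 60
</code>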
===== MPI =====
--- //
===== gpu testing =====

  * n33-n37 each: 4 gpus, 16 cores, 16 threads, 32 cpus
  * submit one at a time, observe (a hedged submit sketch follows this list)
  * part=test, n 1, B 1:1:1, cuda_visible=0,
    * "
    * all on same gpu
  * part=test, n 1, B 1:1:1, cuda_visible not set, no node specified, n33 only
    * "
    * all gpus used? nope, all on the same one 0
  * redoing above with a ''
    * even distribution across all gpus, 17th submit reason too
  * part=test, n 1, B 1:1:1, cuda_visible not set, no node specified, n[33-34] avail
    * while submitting 34 jobs, one at a time (30s delay), slurm fills up n33 first (all on gpu 0)
    * 17th submit goes to n34, gpu 1 (weird)
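As referenced in the list above, a hedged sketch of how the one-at-a-time gpu submits might look. The partition name ''test'' and the 1:1:1 layout come from the notes; the script name, the gres flag, and the 30 second delay loop are assumptions.

<code>
# hypothetical submit loop: one gpu job every 30 seconds
for i in $(seq 1 34); do
  sbatch --partition=test -n 1 -B 1:1:1 --gres=gpu:1 gpu_job.sh
  sleep 30
done
# inspect placement: job id and node list, then per-node gpu usage
squeue -p test -o "%i %N"
ssh n33 nvidia-smi
</code>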
===== Changes =====
** OverSubscribe **

Suggestion was made to set ''OverSubscribe=NO'' for all partitions (thanks, Colin). We now observe with a simple sleep script that we can run 16 jobs simultaneously (with either -n or -B). So that's 16 physical cores; each has a logical core, for a total of 32.
<code>
#SBATCH --mem=1024
sleep 60
</code>
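Only a fragment of the sleep script survives above. A minimal complete version might look like this; the script name, job name, and partition are assumptions.

<code>
#!/bin/bash
# sleep.sh -- hypothetical reconstruction of the simple sleep script
#SBATCH --job-name=sleeptest
#SBATCH --partition=test
#SBATCH --mem=1024
sleep 60
</code>

Per the observation above, submitting copies with ''sbatch -n 1 sleep.sh'' (or with ''-B 1:1:1'') should cap at 16 running jobs per node once ''OverSubscribe=NO'' is in effect.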

--- //

** GPU-CPU cores **

Noticed this with debug level on in slurmd.log. No action taken.

<code>
# n37: old gpu model, bound to all physical cpu cores
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:

# n78: somewhat dated gpu model, bound to top/bot of physical cores (16)
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:

# n79: more recent gpu model, same bound pattern of top/bot (24)
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
</code>
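The core bindings reported above normally come from ''gres.conf''. A hedged sketch of what such entries could look like; the gpu type names, device files, and core ranges below are assumptions, not values copied from these nodes.

<code>
# hypothetical gres.conf fragments illustrating gpu-to-core binding
# n37 style: every gpu bound to all physical cores
NodeName=n37 Name=gpu Type=k20 File=/dev/nvidia[0-3] Cores=0-15
# n78/n79 style: gpus split across the two sockets (top/bot of the cores)
NodeName=n78 Name=gpu Type=rtx File=/dev/nvidia[0-1] Cores=0-7
NodeName=n78 Name=gpu Type=rtx File=/dev/nvidia[2-3] Cores=8-15
</code>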

** Partition Priority **

If priorities are set you can list more than one queue (partition) in a single request...

<code>
srun --partition=exx96,
</code>

The above will fill up n79 first, then n78, then n36...
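A hedged sketch of how that ordering could be expressed with the ''PriorityTier'' partition parameter in slurm.conf (a higher tier is preferred). The tier values and the node-to-partition mapping are assumptions.

<code>
# hypothetical slurm.conf fragments: higher PriorityTier is scheduled first
PartitionName=exx96    Nodes=n79      PriorityTier=30
PartitionName=amber128 Nodes=n78      PriorityTier=20
PartitionName=mwgpu    Nodes=n[33-37] PriorityTier=10
</code>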

** Node Weight Priority **

Weight nodes by the memory per logical core: jobs will be allocated the lowest-weight nodes that satisfy their requirements. So cpu jobs will be routed last to the gpu queues because those have the highest weight (= lowest priority).
<code>
hp12:      12/8  = 1.5
tinymem:   32/20 = 1.6
mw128:    128/24 = 5.333333
mw256:    256/16 = 16

exx96:     96/24 = 4
amber128: 128/16 = 8
mwgpu:    256/16 = 16
</code>

Or more arbitrary (based on the desired cpu node consumption by cpu jobs). No action taken.

<code>
tinymem
mw128     20
mw256fd
mwgpu     40 + HasMem256 feature
amber128
exx96     80
</code>
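Node weights themselves are integers set per node in slurm.conf, and lower-weight nodes are allocated first. A hedged sketch using the memory-per-core ratios above, scaled by 10 to integers; node lists other than n[33-37] and n79 are placeholders.

<code>
# hypothetical slurm.conf fragments: lower Weight is allocated first
NodeName=hp12nodes    Weight=15    # hp12:    12/8  = 1.5
NodeName=tinymemnodes Weight=16    # tinymem: 32/20 = 1.6
NodeName=n79          Weight=40    # exx96:   96/24 = 4
NodeName=n[33-37]     Weight=160   # mwgpu:  256/16 = 16
</code>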

** CR_CPU_Memory **

Makes for a better 1-1 relationship of physical core to ''

Deployed. May need to set threads=1 and cpus=(quantity of physical cores)... this went horribly wrong; it resulted in a sockets=1 setting and threads=1 for each node.
--- //

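For context, ''CR_CPU_Memory'' is a ''SelectTypeParameters'' value in slurm.conf that makes cpus and memory the consumable resources. A hedged sketch of the idea described above (not necessarily what was deployed); the node geometry and memory values are assumptions.

<code>
# hypothetical slurm.conf fragments for CR_CPU_Memory
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
# advertise only physical cores as cpus (ThreadsPerCore=1), i.e. the
# "threads=1, cpus = number of physical cores" idea mentioned above
NodeName=n[33-37] Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=256000 Gres=gpu:4
</code>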
We did set the number of cpus per gpu (12 for n79) and minimum memory settings. Now we observe the 5th job pending with 48 cpus consumed. When using sbatch, set -n 8 because sbatch will override the defaults.

<code>
srun --partition=test
</code>
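The per-gpu cpu count and minimum memory defaults mentioned above can be expressed as partition defaults. A hedged sketch; only the 12 cpus per gpu for n79 comes from the note, the rest is assumed.

<code>
# hypothetical slurm.conf partition defaults
PartitionName=exx96 Nodes=n79 DefCpuPerGPU=12 DefMemPerCPU=1024
</code>

With 12 cpus per gpu, four such jobs consume 48 cpus, which matches the 5th-job-pending behavior described above.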