cluster:208 [2021/10/15 12:57] hmeij07 → cluster:208 [2021/10/18 17:41] hmeij07 [Changes]
Same on the cpu only compute nodes. Features could be created for memory footprints (for example "
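If such memory-footprint features were defined on the nodes, a job could select them with Slurm's constraint flag. A sketch (the feature name ''mem256'' is hypothetical, and this needs a running Slurm cluster, so it is not runnable as-is):

```shell
# Request any node in the partition that advertises a
# hypothetical "mem256" feature in its slurm.conf entry.
srun --partition=mwgpu --constraint=mem256 --mem=1024 sleep 60
```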
On the cpu resource requests: you may request 1 or more nodes, 1 or more sockets per node, 1 or more cores (physical) per socket, or 1 or more threads (logical + physical) per core. Such a request can be fine grained or not; just request a node with ''
//Note: this oversubscribing is not working yet. I can only get 4 simultaneous jobs running. Maybe there is a conflict with Openlava jobs. Should isolate a node and do further testing. After isolation (n37), 4 jobs with -n 4 exhaust the number of physical cores. Is that why the 5th job goes pending?//
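As an illustration of the fine-grained form described above, a batch-script sketch (the counts and the ''mwgpu'' partition are illustrative only, and this requires a Slurm cluster to actually submit):

```shell
#!/bin/bash
# Hypothetical job: 1 node, and via -B (sockets:cores:threads)
# 1 socket, 4 physical cores per socket, 1 thread per core.
#SBATCH --job-name=finegrain
#SBATCH --partition=mwgpu
#SBATCH -N 1
#SBATCH -B 1:4:1
#SBATCH --mem=1024
sleep 60
```

Submit with ''sbatch''; the coarse equivalent is simply ''-n 4'' and letting Slurm place the tasks.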
===== MPI =====
Slurm has a builtin MPI flavor; I suggest you do not rely on it. The documentation states that on major release upgrades the ''
For now, we'll rely on PATH/
''
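A minimal sketch of that approach inside a job script, assuming a hypothetical OpenMPI install prefix (the real path on the cluster will differ):

```shell
# Hypothetical install prefix; adjust to the actual MPI build location.
MPI_HOME=/share/apps/openmpi/4.1.1
export PATH=$MPI_HOME/bin:$PATH
export LD_LIBRARY_PATH=$MPI_HOME/lib:$LD_LIBRARY_PATH
# Verify which directory is searched first for mpirun.
echo "$PATH" | cut -d: -f1
```

Because the MPI ''bin'' directory is prepended, its ''mpirun'' shadows any other copy on the PATH.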
<code>
$ srun --partition=mwgpu -n 4 -B 1:4:1 --mem=1024 sleep 60 &
</code>
===== Feedback =====
If there are errors on this page, or misstatements,
--- //
===== Changes =====

** OverSubscribe **

Suggestion was made to set ''

''

<code>
#!/bin/bash
#SBATCH --job-name=sleep
#SBATCH --partition=mwgpu
###SBATCH -n 1
#SBATCH -B 1:1:1
#SBATCH --mem=1024
sleep 60
</code>

--- //
+ | |||
** GPU-CPU cores **

Noticed this with debug level on in slurmd.log:

<code>
# n37: old gpu model bound to all physical cpu cores
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:

# n78: somewhat dated gpu model, bound to top/bot of physical cores (16)
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:

# n79: more recent gpu model, same bound pattern of top/bot (24)
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
GRES[gpu] Type:
</code>
+ | |||
** Weight Priority **

Weight nodes by the memory per logical core: jobs will be allocated the nodes with the lowest weight that satisfies their requirements. So cpu jobs will be routed last to the gpu queues, because those have the highest weight (= lowest priority).

<code>
hp12:      12/8  = 1.5
tinymem:   32/20 = 1.6
mw128:    128/24 = 5.333333
mw256:    256/16 = 16

exx96:     96/24 = 4
amber128: 128/16 = 8
mwgpu:    256/16 = 16
</code>
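The ratios above can be recomputed with a quick shell loop (queue names and memory/core counts are taken from the table above):

```shell
# Recompute weight = memory (GB) / logical cores for each queue.
for q in "hp12 12 8" "tinymem 32 20" "mw128 128 24" "mw256 256 16" \
         "exx96 96 24" "amber128 128 16" "mwgpu 256 16"; do
  set -- $q
  awk -v name="$1" -v mem="$2" -v cores="$3" \
      'BEGIN { printf "%-9s %.2f\n", name, mem/cores }'
done
```

Sorting those values ascending gives the allocation order: hp12 and tinymem first, mw256 and mwgpu last.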
\\
**[[cluster: