  --- //[[hmeij@wesleyan.edu|Henk]] 2021/10/15 09:16//
  
===== gpu testing =====

  * n33 only, free of jobs: 4 gpus, 16 cores, 16 threads, 32 cpus
  * submit one at a time, observe where pmemd.cuda ends up (see the sketch below)
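
One way to run this test, as a sketch (the ''mwgpu'' partition name for n33 and the use of ''sleep'' as a stand-in for the real pmemd.cuda run are assumptions):

<code>
# submit one gpu job at a time, pinned to n33 (partition name assumed)
srun --partition=mwgpu --nodelist=n33 --mem=1024 --gres=gpu:1 sleep 60 &
# then observe which gpu the job's process lands on
ssh n33 nvidia-smi
</code>
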
===== Changes =====
  
</code>
  
** Partition Priority **
  
If set, you can list more than one queue in a single submission:
  
<code>
 srun --partition=exx96,amber128,mwgpu  --mem=1024  --gpus=1  --gres=gpu:any sleep 60 &
</code>

The above will fill up n79 first, then n78, then n36...

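That ordering would follow from the partitions' ''PriorityTier'' values in slurm.conf; a sketch of what that might look like (node lists and tier values are assumptions, not the actual config):

<code>
# slurm.conf sketch: the partition with the highest PriorityTier is tried first
PartitionName=exx96    Nodes=n79      PriorityTier=3
PartitionName=amber128 Nodes=n78      PriorityTier=2
PartitionName=mwgpu    Nodes=n[33-37] PriorityTier=1
</code>
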
** Node Weight Priority **

Weight nodes by the memory per logical core: jobs will be allocated the nodes with the lowest weight which satisfies their requirements. So CPU jobs will be routed last to gpu queues because they have the highest weight (=lowest priority).
<code>
 hp12: 12/8 = 1.5
 ...
</code>
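
In slurm.conf these ratios would end up as integer ''Weight'' values on the node definitions; a sketch (node names, the x10 scaling, and the gpu-node ratio are assumptions):

<code>
# slurm.conf sketch: slurm allocates the lowest-Weight node that fits the job
NodeName=hp12 Weight=15   # 12/8 = 1.5 -> 15
NodeName=n33  Weight=80   # gpu node, highest weight, routed to last
</code>
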
Makes for a better 1-1 relationship of physical core to ''ntask'', yet the "hyperthreads" are still available to user jobs; physical cores are consumed first, if I got all this right.
  
Deployed. I needed to set threads=1 and cpus=(quantity of physical cores)... this went horribly wrong: it resulted in a sockets=1 and threads=1 setting for each node.
  --- //[[hmeij@wesleyan.edu|Henk]] 2021/10/18 14:32//
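
A sketch of the explicit topology declaration that avoids that collapse (counts for n33 assumed from the gpu testing notes above):

<code>
# slurm.conf sketch: spell out sockets/cores/threads instead of only cpus,
# so slurm sees 16 physical cores and 32 logical cpus on the node
NodeName=n33 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
</code>
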
We did set the number of cpus per gpu (12 for n79) and minimum memory settings. Now we see the 5th job pending with all 48 cpus consumed. When using sbatch, set -n 8 because sbatch will override these defaults.

<code>
 srun --partition=test  --mem=1024  --gres=gpu:geforce_rtx_2080_s:1 sleep 60 &
</code>
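
The sbatch equivalent, with -n set explicitly as noted above (a sketch):

<code>
#!/bin/bash
#SBATCH --partition=test
#SBATCH -n 8
#SBATCH --mem=1024
#SBATCH --gres=gpu:geforce_rtx_2080_s:1
sleep 60
</code>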
  
\\
**[[cluster:0|Back]]**
  