===== gpu testing =====
  * test standalone
  * n33-n37 each: 4 gpus, 16 cores, 16 threads, 32 cpus
  * submit one at a time, observe (see the sketch after this list)
  * do all 16 jobs log the same wall time? Yes, between 10.10 and 10.70 hours.
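
A minimal sketch of how this standalone, scheduler-free test could be driven on one of the nodes; ''run_bench.sh'' is a hypothetical stand-in for the actual benchmark, which these notes do not name:

<code bash>
#!/bin/bash
# pin one job to each of the node's 4 gpus, no scheduler involved
# run_bench.sh is a hypothetical stand-in for the real benchmark
for gpu in 0 1 2 3; do
    CUDA_VISIBLE_DEVICES=$gpu nohup ./run_bench.sh > gpu${gpu}.log 2>&1 &
done
wait   # block until all four background jobs finish
</code>

Watching ''nvidia-smi'' on the node confirms each job stays on its assigned gpu; repeat across n33-n37 and compare the logged wall times.
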
  * ohpc v2.4 slurm v 20.11.8
  * part=test, n 1, B 1:1:1, cuda_visible=0
  * hit a bug: you must specify cpus-per-gpu **and** mem-per-gpu (see the sbatch sketch after this list)
  * then slurm detects 4 gpus on the allocated node and allows 4 jobs on a single allocated gpu
  * twisted logic
  * so a recent openhpc version but an old slurm version in the software stack
  * trying a standalone install on the openhpc prod cluster
  * do all 4 jobs have similar wall times? Yes, on n100 they vary from 0.6 to 0.7 hours
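
A sketch of the kind of submit script these options describe, for slurm 20.11.8; the single-gpu gres request, the memory value, and ''run_bench.sh'' are assumptions, not taken from these notes:

<code bash>
#!/bin/bash
#SBATCH --partition=test
#SBATCH -n 1
#SBATCH -B 1:1:1              # sockets:cores:threads
#SBATCH --gres=gpu:1          # assumed single-gpu request
#SBATCH --cpus-per-gpu=1      # per the bug above, both of these
#SBATCH --mem-per-gpu=8G      # must be set (8G is an assumed value)

# slurm exports CUDA_VISIBLE_DEVICES for the allocated gpu;
# in the buggy case all four copies of this job report gpu 0
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
./run_bench.sh                # hypothetical payload
</code>

Submitting this script four times is the scenario described above: slurm sees the node's 4 gpus but stacks all four jobs on the single allocated gpu.
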
+ | |||
===== Changes =====