cluster:208 [DokuWiki]


Differences

This shows you the differences between two versions of the page.

cluster:208 [2022/05/31 08:34]
hmeij07 [gpu testing]
cluster:208 [2022/06/03 08:30]
hmeij07 [gpu testing]
Line 385:
 ===== gpu testing =====
 
-  * test slurm v 21.08.1
 +  * test standalone slurm v 21.08.1
   * n33-n37 each: 4 gpus, 16 cores, 16 threads, 32 cpus
   * submit one at a time, observe
Line 405:
   * do all 16 jobs log the same wall time? Yes, between 10.10 and 10.70 hours.
 
-  * ohpc slurm v 20.11.8
 +  * ohpc v2.4 slurm v 20.11.8
   * part=test, n 1, B 1:1:1, cuda_visible=0, no node specified, n100 only
-  * 
 +  * hit a bug, you must specify cpus-per-gpu **and** mem-per-gpu
 +  * then slurm detects 4 gpus on allocated node and allows 4 jobs on a single allocated gpu 
 +  * twisted logic 
 +  * so recent openhpc version but old slurm version in software stack 
 +  * trying standalone install on openhpc prod cluster - auth/munge error, no go 
 +  * do all 4 jobs have similar wall time? Yes on n100 varies from 0.6 to 0.7 hours 
 + 
 +  * ohpc v2.4 slurm v 20.11.8  
 +  * part=test, n 1, B 1:1:1, cuda_visible=0, no node specified, n78 only 
 +  * same as above but all 16 jobs run on gpu 0 
 +  * so the limit to 4 jobs on rtx5000 gpu is a hardware phenomenon? 
 +  * all 16 jobs finished, wall times of 3.11 to 3.60 hours
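
The workaround noted above (specifying both cpus-per-gpu and mem-per-gpu) might look roughly like the following job script. This is only a sketch and not the script used in these tests: the file name `gpu_test.sh`, the `--gres=gpu:1` request, and the values `1` and `4G` are assumptions chosen for illustration. The partition, `-n 1`, `-B 1:1:1`, and `CUDA_VISIBLE_DEVICES=0` follow the settings listed in the notes.

```shell
#!/bin/bash
# Sketch only: writes a hypothetical job script, then submits it if sbatch exists.
job_script="gpu_test.sh"   # hypothetical file name

cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --partition=test
#SBATCH -n 1
#SBATCH -B 1:1:1
#SBATCH --gres=gpu:1        # assumption: one gpu requested per job
#SBATCH --cpus-per-gpu=1    # assumption: illustrative value
#SBATCH --mem-per-gpu=4G    # assumption: illustrative value
export CUDA_VISIBLE_DEVICES=0
# placeholder for the actual gpu workload
echo "running on $(hostname) with CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
EOF

# Submit only when a slurm client is installed; otherwise just keep the file.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$job_script"
else
    echo "sbatch not found; wrote $job_script only"
fi
```

Submitting the script several times in a row and watching `squeue` would be one way to repeat the "submit one at a time, observe" procedure on n100 or n78.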
 + 
 + 
 + 
 + 
 ===== Changes =====
  
cluster/208.txt · Last modified: 2022/11/02 13:28 by hmeij07