===== gpu testing =====
  
  * test standalone slurm v 21.08.1
  * n33-n37 each: 4 gpus, 16 cores, 16 threads, 32 cpus
  * submit one at a time, observe

  * part=test, n 1, B 1:1:1, cuda_visible not set, no node specified, n[33-34] avail
  * while submitting 34 jobs, one at a time (30s delay), slurm fills up n33 first (all on gpu 0); see the submit sketch below
  * 17th submit goes to n34, gpu 1 (weird), n33 state=alloc, n34 state=mix
  * 33rd job, "Resources" reason, job pending
  * 34th job, "Priority" reason (?), job pending
  * all n33 and n34 jobs on a single gpu without cuda_visible set
  * how that works with gpu util at 100% with one job is beyond me
  * do all 16 jobs log the same wall time? Yes, between 10.10 and 10.70 hours.
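
A minimal sketch of the submit loop described above, assuming a generic ''run.sh'' job script; the script name and the ''--gres=gpu:1'' request are assumptions, not recorded in the notes.

<code>
#!/bin/bash
# submit 34 single-gpu jobs, one every 30 seconds, and let slurm place them
# (partition test, 1 task, -B 1:1:1, CUDA_VISIBLE_DEVICES deliberately not set)
for i in $(seq 1 34); do
    sbatch --partition=test -n 1 -B 1:1:1 --gres=gpu:1 run.sh
    sleep 30
done
</code>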

  * ohpc v2.4 slurm v 20.11.8
  * part=test, n 1, B 1:1:1, cuda_visible=0, no node specified, n100 only
  * hit a bug, you must specify cpus-per-gpu **and** mem-per-gpu (sketch below)
  * then slurm detects 4 gpus on the allocated node and allows 4 jobs on a single allocated gpu
  * twisted logic
  * so recent openhpc version but old slurm version in the software stack
  * trying standalone install on openhpc prod cluster - auth/munge error, no go
  * do all 4 jobs have similar wall time? Yes, on n100 it varies from 0.6 to 0.7 hours
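
A hedged sketch of a submit line that satisfies the bug workaround above; ''run.sh'' is a placeholder, and the 7168 MB per-gpu value is taken from the later tests.

<code>
# slurm 20.11.8 bug noted above: both per-gpu overrides
# (--cpus-per-gpu and --mem-per-gpu) must be specified together
export CUDA_VISIBLE_DEVICES=0
sbatch -p test -N 1 -n 1 -B 1:1:1 --gres=gpu:1 \
       --cpus-per-gpu=1 --mem-per-gpu=7168 run.sh
</code>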

  * ohpc v2.4 slurm v 20.11.8
  * part=test, n 1, B 1:1:1, cuda_visible=0, no node specified, n78 only
  * same as above but all 16 jobs run on gpu 0 (check sketch below)
  * so the limit of 4 jobs on the rtx5000 gpu is a hardware phenomenon?
  * all 16 jobs finished, wall times of 3.11 to 3.60 hours
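
A generic way to verify the gpu sharing described above from the node itself; these are standard nvidia-smi and squeue queries, not commands recorded in the notes.

<code>
# all 16 job processes should appear on gpu 0
nvidia-smi --query-compute-apps=pid,used_memory --format=csv
# slurm's view of the jobs on the node: jobid, name, cpus, memory
squeue -w n78 -o "%i %j %C %m"
</code>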

===== gpu testing 2 =====

The newer 2022 slurm version seems to have reversed the override options for oversubscribe. So here is our testing... back to CR_CPU_Memory and OverSubscribe=No --- //[[hmeij@wesleyan.edu|Henk]] 2022/11/02 13:23//
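
For orientation, a minimal slurm.conf sketch of what is being exercised below; the ''select/cons_tres'' lines and the mwgpu partition entry are assumptions reconstructed from these notes, not copied from the live config.

<code>
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory

PartitionName=test  Nodes=n[100-101] Default=YES MaxTime=INFINITE State=UP OverSubscribe=NO DefCpuPerGPU=12
PartitionName=mwgpu Nodes=n[33-37] OverSubscribe=FORCE:16
</code>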

<code>
CR_Socket_Memory
PartitionName=test Nodes=n[100-101]
Default=YES MaxTime=INFINITE State=UP
OverSubscribe=No DefCpuPerGPU=12

MPI jobs with -N 1, -n 8 and -B 2:4:1
no override options, cpus=48
--mem=2048, cpus=48
and --cpus-per-task=1, cpus=48
and --ntasks-per-node=8, cpus=24

MPI jobs with -N 1, -n 8 and -B 1:8:1
--mem=10240, cpus=48
and --cpus-per-task=1, cpus=48
and --ntasks-per-node=8, cpus=24

GPU jobs with -N 1, -n 1 and -B 1:1:1
no override options, no cuda export, cpus=48
--cpus-per-gpu=1, cpus=24
and --mem-per-gpu=7168, cpus=1 (pending
while another gpu job runs in the queue but gpus are free???)

GPU jobs with -N 1, -n 1 and -B 1:1:1
no override options, yes cuda export, cpus=48
--cpus-per-gpu=1, cpus=24
and --mem-per-gpu=7168, cpus=1 (resources pending
while a gpu job runs, gpus are free, then it executes)

...suddenly the cpus=1 turns into cpus=24
when submitting, slurm confused because of all
the job cancellations?

CR_CPU_Memory test=no, mwgpu=force:16
PartitionName=test Nodes=n[100-101]
Default=YES MaxTime=INFINITE State=UP
OverSubscribe=No DefCpuPerGPU=12

MPI jobs with -N 1, -n 8 and -B 2:4:1
no override options, cpus=8 (queue fills across nodes,
but only one job per node, test & mwgpu)
--mem=1024, cpus=8 (queue fills first node ...,
but only three jobs per node, test 3x8=24 full, 4th job pending &
mwgpu 17th job goes pending on n33, overloaded with -n 8 !!)
(not needed) --cpus-per-task=?, cpus=
(not needed) --ntasks-per-node=?, cpus=

GPU jobs with -N 1, -n 1 and -B 1:1:1 on test
no override options, no cuda export, cpus=12 (one gpu per node)
--cpus-per-gpu=1, cpus=1 (one gpu per node)
and --mem-per-gpu=7168, cpus=1 (both override options
required else all mem allocated!, max 4 jobs per node,
fills first node first... cuda export not needed)
with cuda export, same node, same gpu,
with "no" enabled multiple jobs per gpu not accepted

GPU jobs with -N 1, -n 1 and -B 1:1:1 on mwgpu
--cpus-per-gpu=1,
and --mem-per-gpu=7168, cpus=1
(same node, same gpu, cuda export set,
with "force:16" enabled 4 jobs per gpu accepted,
potential for overloading!)

</code>
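
A hedged sketch of the oversubscribe difference summarized above; ''run.sh'' is a placeholder job script and ''--gres=gpu:1'' is an assumption.

<code>
# pin to gpu 0 and request both per-gpu overrides
export CUDA_VISIBLE_DEVICES=0
for i in 1 2 3 4; do
    sbatch -p mwgpu -N 1 -n 1 -B 1:1:1 --gres=gpu:1 \
           --cpus-per-gpu=1 --mem-per-gpu=7168 run.sh
done
# on mwgpu (OverSubscribe=force:16) all 4 jobs are accepted on the same gpu;
# on the test partition (OverSubscribe=No) multiple jobs per gpu are rejected
squeue -p mwgpu -o "%i %N %C %m"
</code>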

===== Changes =====
  