  
--- //[[hmeij@wesleyan.edu|Henk]] 2021/10/15 09:16//

===== gpu testing =====

  * test standalone slurm v 21.08.1
  * n33-n37 each: 4 gpus, 16 cores, 16 threads, 32 cpus
  * submit one at a time, observe
  * part=test, n 1, B 1:1:1, cuda_visible=0, no node specified, n33 only
  * "resources" reason at the 17th submit, having used up all 16 cores and 16 threads
  * all jobs land on the same gpu
  * part=test, n 1, B 1:1:1, cuda_visible not set, no node specified, n33 only
  * "resources" reason at the 17th submit again, same cause
  * all gpus used? nope, all on the same one (gpu 0)
  * redoing the above with ''export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`'' (see the sketch after this list)
  * even distribution across all gpus, same "resources" reason at the 17th submit
  * part=test, n 1, B 1:1:1, cuda_visible not set, no node specified, n[33-34] available
  * while submitting 34 jobs, one at a time (30s delay), slurm fills up n33 first (all on gpu 0)
  * 17th submit goes to n34, gpu 1 (weird), n33 state=alloc, n34 state=mix
  * 33rd job, "Resources" reason, job pending
  * 34th job, "Priority" reason (?), job pending
  * all n33 and n34 jobs land on a single gpu when cuda_visible is not set
  * how that works when a single job already pushes gpu utilization to 100% is beyond me
  * do all 16 jobs log the same wall time? Yes, between 10.10 and 10.70 hours.
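
A minimal sketch of the submit script used for these probes, assuming a placeholder payload ''run_gpu_test.sh'' (not from the original notes); the #SBATCH flags mirror the bullets above and the export is the shuf trick that spread jobs across the 4 gpus.

<code>
#!/bin/bash
# gpu-probe.sub -- single-task gpu probe, sketch only
#SBATCH --partition=test
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -B 1:1:1
#SBATCH --job-name=gpu-probe

# without this export every job landed on gpu 0;
# shuf picks one of the node's 4 gpus at random
export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`

# placeholder for the actual gpu workload
./run_gpu_test.sh
</code>

Submitted one at a time with something like ''for i in $(seq 1 34); do sbatch gpu-probe.sub; sleep 30; done'' to match the 30s delay noted above.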

  * ohpc v2.4, slurm v 20.11.8
  * part=test, n 1, B 1:1:1, cuda_visible=0, no node specified, n100 only
  * hit a bug: you must specify cpus-per-gpu **and** mem-per-gpu (see the sketch after this list)
  * then slurm detects 4 gpus on the allocated node and allows 4 jobs on a single allocated gpu
  * twisted logic
  * so a recent openhpc version but an old slurm version in the software stack
  * tried a standalone install on the openhpc prod cluster - auth/munge error, no go
  * do all 4 jobs have similar wall times? Yes, on n100 they vary from 0.6 to 0.7 hours
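
A sketch of the workaround under slurm 20.11.8, where both per-gpu overrides have to be present; the ''--gres=gpu:1'' request and the 7168 MB value (taken from the tests further down) are my assumptions, and ''run_gpu_test.sh'' is again a placeholder.

<code>
#!/bin/bash
# slurm 20.11.8 on openhpc 2.4: both per-gpu overrides required (sketch)
#SBATCH --partition=test
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -B 1:1:1
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168

# pin to gpu 0 as in the test above
export CUDA_VISIBLE_DEVICES=0

./run_gpu_test.sh
</code>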

  * ohpc v2.4, slurm v 20.11.8
  * part=test, n 1, B 1:1:1, cuda_visible=0, no node specified, n78 only
  * same as above but all 16 jobs run on gpu 0
  * so is the limit of 4 jobs on an rtx5000 gpu a hardware phenomenon?
  * all 16 jobs finished, wall times of 3.11 to 3.60 hours

===== gpu testing 2 =====

The newer 2022 slurm version seems to have reversed the behavior of the override options for oversubscription. So here is our testing... back to CR_CPU_Memory and OverSubscribe=No.  --- //[[hmeij@wesleyan.edu|Henk]] 2022/11/02 13:23//

<code>

CR_Socket_Memory
PartitionName=test Nodes=n[100-101]
Default=YES MaxTime=INFINITE State=UP
OverSubscribe=No DefCpuPerGPU=12

MPI jobs with -N 1, -n 8 and -B 2:4:1
no override options, cpus=48
--mem=2048, cpus=48
and --cpus-per-task=1, cpus=48
and --ntasks-per-node=8, cpus=24

MPI jobs with -N 1, -n 8 and -B 1:8:1
--mem=10240, cpus=48
and --cpus-per-task=1, cpus=48
and --ntasks-per-node=8, cpus=24

GPU jobs with -N 1, -n 1 and -B 1:1:1
no override options, no cuda export, cpus=48
--cpus-per-gpu=1, cpus=24
and --mem-per-gpu=7168, cpus=1 (pending
while another gpu job runs in the queue but gpus are free???)

GPU jobs with -N 1, -n 1 and -B 1:1:1
no override options, yes cuda export, cpus=48
--cpus-per-gpu=1, cpus=24
and --mem-per-gpu=7168, cpus=1 (resources pending
while a gpu job runs, gpus are free, then it executes)

...suddenly the cpus=1 turns into cpus=24
when submitting; slurm confused because of all
the job cancellations?

CR_CPU_Memory test=no, mwgpu=force:16
PartitionName=test Nodes=n[100-101]
Default=YES MaxTime=INFINITE State=UP
OverSubscribe=No DefCpuPerGPU=12

MPI jobs with -N 1, -n 8 and -B 2:4:1
no override options, cpus=8 (queue fills across nodes,
but only one job per node, test & mwgpu)
--mem=1024, cpus=8 (queue fills first node ...,
but only three jobs per node; test 3x8=24, node full, 4th job pending &
mwgpu 17th job goes pending on n33, overloaded with -n 8 !!)
(not needed) --cpus-per-task=?, cpus=
(not needed) --ntasks-per-node=?, cpus=


GPU jobs with -N 1, -n 1 and -B 1:1:1 on test
no override options, no cuda export, cpus=12 (one gpu per node)
--cpus-per-gpu=1, cpus=1 (one gpu per node)
and --mem-per-gpu=7168, cpus=1 (both override options
required else all mem is allocated!, max 4 jobs per node,
fills first node first... cuda export not needed)
with cuda export, same node, same gpu;
with "No" set, multiple jobs per gpu are not accepted


GPU jobs with -N 1, -n 1 and -B 1:1:1 on mwgpu
--cpus-per-gpu=1,
and --mem-per-gpu=7168, cpus=1
(same node, same gpu, cuda export set;
with "force:16" enabled, 4 jobs per gpu accepted,
potential for overloading!)

</code>
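
For reference, a sketch of the slurm.conf fragments these tests converge on; the ''SelectType'' line, the mwgpu node list, and carrying ''DefCpuPerGPU'' over to mwgpu are my assumptions, not copied from the production file.

<code>
# sketch only -- settings the tests above converge on
SelectType=select/cons_tres          # assumed; needed for the CR_* and per-gpu options
SelectTypeParameters=CR_CPU_Memory

# test: no oversubscription; mwgpu: up to 16 jobs may share a resource
PartitionName=test Nodes=n[100-101] Default=YES MaxTime=INFINITE State=UP OverSubscribe=No DefCpuPerGPU=12
PartitionName=mwgpu Nodes=n[33-37] MaxTime=INFINITE State=UP OverSubscribe=FORCE:16 DefCpuPerGPU=12
</code>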
  
===== Changes =====