cluster:208

Differences

This shows you the differences between two versions of the page.

cluster:208 [2021/10/21 14:08]
hmeij07 [Changes]
cluster:208 [2022/05/27 13:05]
hmeij07 [gpu testing]
Line 383:
  --- //[[hmeij@wesleyan.edu|Henk]] 2021/10/15 09:16//
  
 +===== gpu testing =====
 +
 +  * test slurm v21.08.1
 +  * n33-n37 each: 4 gpus, 16 cores, 16 threads, 32 cpus
 +  * submit jobs one at a time and observe
 +  * part=test, n 1, B 1:1:1, CUDA_VISIBLE_DEVICES=0, no node specified, only n33 available
 +    * 17th submit pends with reason "Resources": the first 16 jobs used up all 16 cores and 16 threads
 +    * all jobs ran on the same gpu
 +  * part=test, n 1, B 1:1:1, CUDA_VISIBLE_DEVICES not set, no node specified, only n33 available
 +    * 17th submit pends with reason "Resources" too, for the same reason
 +    * all gpus used? no, all jobs ran on the same one, gpu 0
 +  * redoing the above with ''export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`''
 +    * jobs spread evenly across all 4 gpus
 +    * 17th submit still pends with reason "Resources"
 +  * part=test, n 1, B 1:1:1, CUDA_VISIBLE_DEVICES not set, no node specified, n[33-34] available
 +    * submitting 34 jobs one at a time (30s delay): slurm fills up n33 first (all jobs on gpu 0)
 +    * 17th submit goes to n34 but lands on gpu 1 (weird); n33 state=alloc, n34 state=mix
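
The random gpu selection used in the rerun above can be sketched as a small job script. This is only an illustration under assumptions: the helper name ''pick_gpu'' and its default of 4 gpus are hypothetical and not part of the original tests, and the ''#SBATCH'' lines simply mirror the "part=test, n 1, B 1:1:1" settings from the notes. A random pick spreads jobs across gpus on average but does not guarantee an even split, and it does not tell slurm which gpu a job is using.

```shell
#!/bin/bash
#SBATCH --partition=test   # "part=test" from the notes
#SBATCH -n 1               # one task per job
#SBATCH -B 1:1:1           # sockets:cores:threads

# Hypothetical helper: print one gpu index chosen at random from 0..(count-1).
# Defaults to 4 gpus, matching n33-n37 in the notes.
pick_gpu() {
    count="${1:-4}"
    shuf -i 0-$((count - 1)) -n 1
}

# Restrict this job to the randomly chosen gpu.
export CUDA_VISIBLE_DEVICES="$(pick_gpu 4)"
echo "using gpu ${CUDA_VISIBLE_DEVICES}"
```

Submitted with ''sbatch'', each job would print the gpu index it picked; run directly in a shell, the ''#SBATCH'' lines are treated as plain comments.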
 ===== Changes =====
  
cluster/208.txt · Last modified: 2022/11/02 17:28 by hmeij07