cluster:208

Differences

This shows you the differences between two versions of the page.

cluster:208 [2021/10/21 14:03]
hmeij07 [Changes]
cluster:208 [2022/05/26 19:57]
hmeij07 [gpu testing]
Line 383: Line 383:
  --- //[[hmeij@wesleyan.edu|Henk]] 2021/10/15 09:16//  --- //[[hmeij@wesleyan.edu|Henk]] 2021/10/15 09:16//
  
 +===== gpu testing =====
 +
 +  * n33 only, 4 gpus, 16 cores, 16 threads, 32 cpus
 +  * submit jobs one at a time and observe
 +  * part=test, n 1, B 1:1:1, cuda_visible=0, no node specified
 +  * 17th submit pends with reason "resources": the first 16 jobs used up 16 cores and 16 threads
 +  * all jobs ran on the same gpu
 +  * part=test, n 1, B 1:1:1, cuda_visible not set, no node specified
 +  * 17th submit pends with reason "resources" too, for the same cause
 +  * all gpus used? nope, all jobs on the same one, gpu 0
 +  * redoing the above with ''export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`''
 +  * even distribution across all gpus; 17th submit pends with reason "resources" too
 +  * 
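The random GPU selection used in the last test can be sketched as a small wrapper. This is a minimal sketch, assuming four GPUs indexed 0-3 as on n33; the function name ''pick_gpu'' is hypothetical and does not appear in the original notes.

```shell
#!/bin/bash
# pick_gpu: hypothetical helper (not from the original notes).
# Prints one random GPU index in 0..(N-1); N defaults to 4,
# matching the 4 gpus on n33.
pick_gpu() {
    local ngpus="${1:-4}"
    shuf -i "0-$((ngpus - 1))" -n 1
}

# Pin this job to one randomly chosen GPU before launching the workload.
CUDA_VISIBLE_DEVICES="$(pick_gpu 4)"
export CUDA_VISIBLE_DEVICES
echo "using gpu ${CUDA_VISIBLE_DEVICES}"
```

A random pick only balances load on average; it does not check whether the chosen GPU is already busy.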
 ===== Changes ===== ===== Changes =====
  
Line 430: Line 443:
 </code> </code>
  
-** Weight Priority **
+** Partition Priority **
  
-Weight nodes by the memory per logical core: jobs will be allocated the nodes with the lowest weight which satisfies their requirements. So CPU jobs will be routed last to gpu queues because they have the highest weight (=lowest priority).
+If set you can list more than one queue...
  
 +<code>
 + srun --partition=exx96,amber128,mwgpu  --mem=1024  --gpus=1  --gres=gpu:any sleep 60 &
 +</code>
 +
 +The above will fill up n79 first, then n78, then n36...
 +
 +** Node Weight Priority **
 +
 +Weight nodes by the memory per logical core: jobs will be allocated the nodes with the lowest weight which satisfies their requirements. So CPU jobs will be routed last to gpu queues because they have the highest weight (=lowest priority).
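As a rough illustration of the weighting rule above, the memory-per-logical-core ratio can be computed with a small shell helper. The name ''mem_per_core'' is hypothetical, and how the ratio is mapped onto an actual node weight setting is not shown in this section.

```shell
#!/bin/bash
# mem_per_core: hypothetical helper (not from the original notes).
# Divides a node's memory figure by its logical core count and
# prints the ratio with one decimal place.
mem_per_core() {
    awk -v mem="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", mem / cores }'
}

# hp12 figures as listed on this page: 12/8
mem_per_core 12 8   # prints 1.5
```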
 <code> <code>
 hp12: 12/8 = 1.5 hp12: 12/8 = 1.5
cluster/208.txt · Last modified: 2022/11/02 17:28 by hmeij07