cluster:207

Differences

This shows you the differences between two versions of the page.

cluster:207 [2021/10/13 17:06]
hmeij07
cluster:207 [2023/10/27 18:47] (current)
hmeij07
Line 2: Line 2:
 **[[cluster:0|Back]]**
  
 +**Make sure munge/unmunge work between 1.3/2.4, that date is in sync (else you get error #16)**
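The check in the note above can be sketched roughly as follows. ''munge -n | unmunge'' and ''munge -n | ssh <node> unmunge'' are the usual MUNGE round-trip tests. ''REMOTE_NODE'', the helper ''clock_skew_ok'' and its 300 second tolerance are assumptions made for this illustration, not values taken from this page.

```shell
# Hypothetical helper: succeeds when two epoch timestamps differ by at most
# $3 seconds (the 300 second default is an assumption; adjust as needed).
clock_skew_ok() {
  skew=$(( $1 - $2 ))
  [ "$skew" -lt 0 ] && skew=$(( 0 - skew ))
  [ "$skew" -le "${3:-300}" ]
}

# REMOTE_NODE is a placeholder; set it to a compute node reachable over ssh.
# The block is skipped when REMOTE_NODE is unset or munge is not installed.
if [ -n "${REMOTE_NODE:-}" ] && command -v munge >/dev/null 2>&1; then
  munge -n | unmunge                      # local encode/decode round trip
  munge -n | ssh "$REMOTE_NODE" unmunge   # encoded here, decoded on the node
  remote_now=$(ssh "$REMOTE_NODE" date +%s)
  if clock_skew_ok "$(date +%s)" "$remote_now"; then
    echo "clocks within tolerance"
  else
    echo "clock skew too large, sync the date before testing munge again"
  fi
fi
```

Run it from the scheduler host once per compute node; a failing remote ''unmunge'' together with a large skew points at the date rather than at the munge key.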
  
 ===== Slurm Test Env =====
Line 7: Line 8:
 Getting a head start on our new login node plus two cpu+gpu compute node project. Hardware has been purchased but there is a long delivery time. Meanwhile it makes sense to set up a standalone Slurm scheduler, do some testing, and keep it as a backup. Slurm will be running on ''greentail52'' with some compute nodes.
  
-This page is just intended to keep documentation sources handy.
+This page is just intended to keep documentation sources handy. Go to the **Users** page [[cluster:208|Slurm Test Env]]
  
-**SLURM documentation**
+==== SLURM documentation ====
  
 <code> <code>
Line 34: Line 35:
 https://slurm.schedmd.com/slurm.conf.html
 section: node configuration
 +
 +The node range expression can contain one pair of square brackets with a sequence of comma-separated numbers and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or "lx[15,18,32-33]")
 +
 Features (hasGPU, hasRTX5000)
 are intended to be used to filter nodes eligible to run jobs via the --constraint argument.
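To illustrate the node range expression quoted above, here is a minimal sketch of how such an expression expands into individual node names. On a host with Slurm installed, ''scontrol show hostnames 'lx[15,18,32-33]''' does this for you. The function ''expand_nodes'' is a hypothetical helper written for this note; it handles only one bracket pair and does not preserve zero padding.

```shell
# Hypothetical helper: expand one Slurm-style node range expression,
# e.g. "lx[15,18,32-33]", into one node name per line.
# Limitations (by assumption): a single bracket pair, no zero-padded numbers.
# The body runs in a subshell so the IFS change does not leak to the caller.
expand_nodes() (
  expr_in=$1
  case $expr_in in
    *\[*\]*)
      prefix=${expr_in%%\[*}     # text before the first "["
      rest=${expr_in#*\[}        # everything after the first "["
      body=${rest%%\]*}          # comma-separated numbers and ranges
      suffix=${rest#*\]}         # text after the closing "]"
      IFS=,
      for part in $body; do
        case $part in
          *-*)
            i=${part%%-*}
            last=${part#*-}
            while [ "$i" -le "$last" ]; do
              printf '%s%s%s\n' "$prefix" "$i" "$suffix"
              i=$((i + 1))
            done
            ;;
          *)
            printf '%s%s%s\n' "$prefix" "$part" "$suffix"
            ;;
        esac
      done
      ;;
    *)
      # No brackets: the expression is already a single node name.
      printf '%s\n' "$expr_in"
      ;;
  esac
)

expand_nodes 'lx[15,18,32-33]'   # prints lx15, lx18, lx32, lx33, one per line
```

The same syntax is what goes into NodeName= and Nodes= entries, so expanding an expression by hand is a cheap way to confirm a range covers the nodes you expect.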
Line 50: Line 54:
 https://slurm.schedmd.com/gres.html#GPU_Management
 setting up gres.conf
 +
 +give GPU jobs priority using the Multifactor Priority plugin:
 +https://slurm.schedmd.com/priority_multifactor.html#tres
 +PriorityWeightTRES=GRES/gpu=1000
 +example here: https://slurm.schedmd.com/SLUG19/Priority_and_Fair_Trees.pdf
 +requires fairshare, thus the database
  
 https://slurm.schedmd.com/mc_support.html
Line 107: Line 117:
  
  
-** MUNGE installation**
+==== MUNGE installation ====
  
 <code> <code>
Line 173: Line 183:
 </code> </code>
  
-** SLURM installation **
+==== SLURM installation Updated ====
 + 
 +<code> 
 + 
 +export PATH=/usr/local/cuda/bin:$PATH 
 +export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH 
 + 
 +[root@cottontail2 slurm-22.05.2]# which gcc mpicc nvcc 
 +/opt/ohpc/pub/compiler/gcc/9.4.0/bin/gcc 
 +/opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1/bin/mpicc 
 +/usr/local/cuda/bin/nvcc 
 + 
 + 
 +./configure \ 
 +--prefix=/usr/local/slurm-22.05.2 \ 
 +--sysconfdir=/usr/local/slurm-22.05.2/etc \ 
 +--with-nvml=/usr/local/cuda 
 +make 
 +make install 
 + 
 +export PATH=/usr/local/slurm/bin:$PATH 
 +export LD_LIBRARY_PATH=/usr/local/slurm/lib:$LD_LIBRARY_PATH 
 + 
 +[root@cottontail2 slurm-22.05.2]# find /usr/local/slurm-22.05.2/ -name auth_munge.so 
 +/usr/local/slurm-22.05.2/lib/slurm/auth_munge.so 
 + 
 +</code> 
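The ''find'' check at the end of the block above can be extended to other plugins. The sketch below assumes the install prefix shown above and that ''--with-nvml'' produced a ''gpu_nvml.so'' plugin next to ''auth_munge.so''. ''SLURM_PREFIX'' and ''has_slurm_plugin'' are hypothetical names used only for this illustration.

```shell
# Hypothetical helper: succeeds if <prefix>/lib/slurm/<plugin>.so exists.
has_slurm_plugin() {
  [ -f "$1/lib/slurm/$2.so" ]
}

# SLURM_PREFIX is an assumed variable; it defaults to the prefix used above.
SLURM_PREFIX=${SLURM_PREFIX:-/usr/local/slurm-22.05.2}

# gpu_nvml is assumed to be the plugin built when NVML is detected.
for plugin in auth_munge gpu_nvml; do
  if has_slurm_plugin "$SLURM_PREFIX" "$plugin"; then
    echo "found:   $plugin"
  else
    echo "missing: $plugin"
  fi
done
```

A missing ''gpu_nvml'' line would suggest re-checking the ''--with-nvml'' path and the configure output before starting the daemons.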
 + 
 + 
 +==== SLURM installation ====
  
 Configured and compiled on ''greentail52'' despite it not having GPUs; only the management library (NVML) is needed
Line 222: Line 261:
  
  
-For **general accounting** we may rely on a simple text file
+==== General Accounting ====
  
 <code> <code>
Line 241: Line 280:
  
  
-**Full Version Slurm Config Tool**
+==== Slurm Config Tool ====
  
   * let's start with this file and build up/out
cluster/207.1634144780.txt.gz · Last modified: 2021/10/13 17:06 by hmeij07