**[[cluster:0|Back]]**
  
**Make sure munge/unmunge work between 1.3/2.4, and that the date/time is in sync (else you get error #16)**
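A quick sanity check, as a sketch (the remote host name below is a placeholder):

<code>
# encode a credential locally, decode it on the remote node
munge -n | ssh n100 unmunge

# compare clocks on both ends; a skewed clock shows up as
# "Rewound credential" when decoding (munge error 16)
date; ssh n100 date
</code>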
  
===== Slurm Test Env =====
Getting a head start on our new login node plus two cpu+gpu compute node project. Hardware has been purchased but there is a long delivery time. Meanwhile it makes sense to set up a standalone Slurm scheduler, do some testing, and have it as a backup. Slurm will be running on ''greentail52'' with some compute nodes.
  
This page is just intended to keep documentation sources handy. Go to the **Users** page: [[cluster:208|Slurm Test Env]]
  
==== SLURM documentation ====
  
<code>
https://slurm.schedmd.com/slurm.conf.html
section: node configuration

The node range expression can contain one pair of square brackets with a sequence of comma-separated numbers and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or "lx[15,18,32-33]")

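# e.g. one NodeName line can cover several hosts (hypothetical node names):
#   NodeName=n[110-111]        covers n110,n111
#   NodeName=lx[15,18,32-33]   covers lx15,lx18,lx32,lx33
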
 Features (hasGPU, hasRTX5000) Features (hasGPU, hasRTX5000)
are intended to be used to filter nodes eligible to run jobs via the --constraint argument.
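
# e.g. ask the scheduler only for nodes carrying a given feature
# (sketch; the job script name is made up)
sbatch --constraint=hasRTX5000 job.sh
srun --constraint=hasGPU --pty bash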
https://slurm.schedmd.com/gres.html#GPU_Management
setting up gres.conf

give GPU jobs priority using the Multifactor Priority plugin:
https://slurm.schedmd.com/priority_multifactor.html#tres
PriorityWeightTRES=GRES/gpu=1000
example here: https://slurm.schedmd.com/SLUG19/Priority_and_Fair_Trees.pdf
requires fairshare, thus the database
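
# sketch of the slurm.conf pieces involved (weights illustrative, needs slurmdbd)
PriorityType=priority/multifactor
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu
PriorityWeightTRES=GRES/gpu=1000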
  
https://slurm.schedmd.com/mc_support.html
  
  
==== MUNGE installation ====
  
<code>
</code>
  
==== SLURM installation Updated ====
  
<code>
  
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

[root@cottontail2 slurm-22.05.2]# which gcc mpicc nvcc
/opt/ohpc/pub/compiler/gcc/9.4.0/bin/gcc
/opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1/bin/mpicc
/usr/local/cuda/bin/nvcc

./configure \
--prefix=/usr/local/slurm-22.05.2 \
--sysconfdir=/usr/local/slurm-22.05.2/etc \
--with-nvml=/usr/local/cuda
make
make install

export PATH=/usr/local/slurm/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/slurm/lib:$LD_LIBRARY_PATH

[root@cottontail2 slurm-22.05.2]# find /usr/local/slurm-22.05.2/ -name auth_munge.so
/usr/local/slurm-22.05.2/lib/slurm/auth_munge.so

</code>
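
Since the build above was configured ''--with-nvml'', gres.conf on the gpu nodes can let slurmd autodetect the devices. A minimal sketch, assuming the same install prefix:

<code>
# confirm the nvml gpu plugin got built
find /usr/local/slurm-22.05.2/ -name gpu_nvml.so

# gres.conf on a gpu node can then be as simple as
AutoDetect=nvml
</code>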


==== SLURM installation ====

Configured and compiled on ''greentail52'' despite not having gpus ... only the library manager (nvml) is needed

<code>

cuda 9.2 ...
# installer found /usr/local/cuda on ''greentail''
  
# just in case
which mpirun
  
# /usr/local/slurm is a symbolic link to slurm-21.08.1
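# e.g. created with: ln -s /usr/local/slurm-21.08.1 /usr/local/slurm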
./configure \
--prefix=/usr/local/slurm-21.08.1 \
--sysconfdir=/usr/local/slurm-21.08.1/etc \
| tee -a install.log
skip # --with-nvml=/usr/local/n37-cuda-9.2 \
skip # -with-hdf5=no \
# known hdf5 library problem when including --with-nvml
  
config.status: creating src/plugins/gpu/nvml/Makefile
  
====
Libraries have been installed in:
   /usr/local/slurm-21.08.1/lib/slurm

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'
====

# for now
export PATH=/usr/local/slurm/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/slurm/lib:$LD_LIBRARY_PATH
  
</code>


==== General Accounting ====

<code>

From job completions file, JOB #3, convert Start and End times to epoch seconds
  
StartTime=2021-10-06T14:32:37 EndTime=2021-10-06T14:37:40
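
# e.g. with GNU date (absolute epoch values depend on the local timezone)
date -d "2021-10-06T14:32:37" +%s
date -d "2021-10-06T14:37:40" +%s

# elapsed walltime in seconds (5m03s = 303)
echo $(( $(date -d "2021-10-06T14:37:40" +%s) - $(date -d "2021-10-06T14:32:37" +%s) ))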
  
  
==== Slurm Config Tool ====

  let's start with this file and build up/out
  
<code>
#
# COMPUTE NODES
NodeName=n[110-111] CPUs=2 RealMemory=192 CoresPerSocket=12 ThreadsPerCore=1 State=UNKNOWN
#
#
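# a hypothetical partition over the test nodes (not part of the generated file)
PartitionName=test Nodes=n[110-111] Default=YES MaxTime=INFINITE State=UP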