**[[cluster:0|Back]]**
  
**Make sure munge/unmunge work between 1.3/2.4, and that the date is in sync (else you get error #16).**
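A quick sanity check for both conditions might look like this (the compute node name is just a placeholder):

<code>
# clocks must agree between controller and node, otherwise credentials fail to decode
date; ssh n110 date

# a credential generated locally must decode on the remote side
munge -n | ssh n110 unmunge
</code>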
  
===== Slurm Test Env =====
  
Getting a head start on our new login node plus two cpu+gpu compute node project. Hardware has been purchased but there is a long delivery time. Meanwhile it makes sense to set up a standalone Slurm scheduler, do some testing, and have it as a backup. Slurm will be running on ''greentail52'' with some compute nodes.
  
This page is just intended to keep documentation sources handy. Go to the **Users** page: [[cluster:208|Slurm Test Env]].
  
==== SLURM documentation ====
  
<code>
https://slurm.schedmd.com/slurm.conf.html
section: node configuration

The node range expression can contain one pair of square brackets with a sequence of comma-separated numbers and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or "lx[15,18,32-33]")
Features (hasGPU, hasRTX5000)
are intended to be used to filter nodes eligible to run jobs via the --constraint argument.
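(hypothetical usage of such a feature as a job filter; job.sh is a placeholder)
sbatch --constraint=hasRTX5000 job.sh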
...
https://slurm.schedmd.com/gres.html#GPU_Management
setting up gres.conf

give GPU jobs priority using the Multifactor Priority plugin:
https://slurm.schedmd.com/priority_multifactor.html#tres
PriorityWeightTRES=GRES/gpu=1000
example here: https://slurm.schedmd.com/SLUG19/Priority_and_Fair_Trees.pdf
requires fairshare, thus the database
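(illustrative slurm.conf fragment for the above; the weight value is only an example)
PriorityType=priority/multifactor
AccountingStorageTRES=gres/gpu
PriorityWeightTRES=GRES/gpu=1000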
  
https://slurm.schedmd.com/mc_support.html
...

</code>
  
  
==== MUNGE installation ====
  
<code>
...
</code>
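A rough sketch of a typical EL7 MUNGE setup, not necessarily the exact steps used here (package names are the EPEL rpms, host name is a placeholder):

<code>
yum install -y munge munge-libs munge-devel
/usr/sbin/create-munge-key                    # generates /etc/munge/munge.key on one host
scp -p /etc/munge/munge.key n110:/etc/munge/  # same key on every host, owner munge, mode 0400
systemctl enable --now munge
munge -n | unmunge                            # local round-trip test
</code>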
  
==== SLURM installation Updated ====
  
<code>
  
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

[root@cottontail2 slurm-22.05.2]# which gcc mpicc nvcc
/opt/ohpc/pub/compiler/gcc/9.4.0/bin/gcc
/opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1/bin/mpicc
/usr/local/cuda/bin/nvcc
  
./configure \
--prefix=/usr/local/slurm-22.05.2 \
--sysconfdir=/usr/local/slurm-22.05.2/etc \
--with-nvml=/usr/local/cuda
make
make install

export PATH=/usr/local/slurm/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/slurm/lib:$LD_LIBRARY_PATH

[root@cottontail2 slurm-22.05.2]# find /usr/local/slurm-22.05.2/ -name auth_munge.so
/usr/local/slurm-22.05.2/lib/slurm/auth_munge.so

</code>
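With slurmd built against NVML, GPU autodetection can be exercised on a GPU node. A minimal sketch (node name and gpu count are illustrative, not taken from this page):

<code>
# gres.conf on the gpu node -- let NVML enumerate the devices
AutoDetect=nvml

# slurm.conf still has to announce the gres type and count
GresTypes=gpu
NodeName=n110 Gres=gpu:4 ...
</code>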
==== SLURM installation ====
Configured and compiled on ''greentail52'' despite it not having any GPUs... only the NVIDIA management library (nvml) is needed.

<code>
# cuda 9.2 ...
# installer found /usr/local/cuda on ''greentail''

# just in case
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
which mpirun

# /usr/local/slurm is a symbolic link to slurm-21.08.1
./configure \
--prefix=/usr/local/slurm-21.08.1 \
--sysconfdir=/usr/local/slurm-21.08.1/etc \
| tee -a install.log
# skip # --with-nvml=/usr/local/n37-cuda-9.2 \
# skip # --with-hdf5=no \
# known hdf5 library problem when including --with-nvml
  
...
config.status: creating src/plugins/gpu/nvml/Makefile
  
====
Libraries have been installed in:
   /usr/local/slurm-21.08.1/lib/slurm

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'
====

# for now
export PATH=/usr/local/slurm/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/slurm/lib:$LD_LIBRARY_PATH
  
</code>
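For a first functional test the daemons can be run by hand in the foreground. A minimal sketch, assuming the install prefix above and a finished slurm.conf:

<code>
/usr/local/slurm/sbin/slurmctld -D -vvv     # controller, foreground, verbose
/usr/local/slurm/sbin/slurmd -D -vvv        # on each compute node
sinfo                                       # nodes should appear, e.g. in state idle
</code>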
==== General Accounting ====

<code>

From the job completions file, JOB #3, convert Start and End times to epoch seconds:
  
StartTime=2021-10-06T14:32:37 EndTime=2021-10-06T14:37:40
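# one way to do the conversion with GNU date (illustrative):
start=$(date -d "2021-10-06 14:32:37" +%s)
end=$(date -d "2021-10-06 14:37:40" +%s)
echo $((end - start))                       # 303 seconds of wall clock time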
...

</code>
  
  
==== Slurm Config Tool ====

Let's start with this file and build up/out.
  
<code>
...
#
# COMPUTE NODES
NodeName=n[110-111] CPUs=2 RealMemory=192 CoresPerSocket=12 ThreadsPerCore= State=UNKNOWN
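# aside: running "slurmd -C" on a compute node prints a NodeName line with the
# values the node actually reports, which can be pasted here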
#
#
</code>