**Make sure munge/
===== Slurm Test Env =====
Getting a head start on our new login node plus two cpu+gpu compute node project. Hardware has been purchased but there is a long delivery time. Meanwhile it makes sense to set up a standalone Slurm scheduler, do some testing, and have it as a backup. Slurm will be running on ''

This page is just intended to keep documentation sources handy. Go to the **Users** page [[cluster:
==== SLURM documentation ====
<code>
https://
section: node configuration
| + | |||
| + | The node range expression can contain one pair of square brackets with a sequence of comma-separated numbers and/or ranges of numbers separated by a " | ||
| + | |||
Features (hasGPU, hasRTX5000)
are intended to be used to filter nodes eligible to run jobs via the --constraint argument.
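# a minimal sketch of using features as job constraints (job script name is
# hypothetical, the feature names are the ones defined above):
#   sbatch --constraint=hasRTX5000 myjob.sh
#   sbatch --constraint="hasGPU&hasRTX5000" myjob.sh   # require both features
#   sinfo -o "%N %f"                                   # list nodes and their features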
https://
setting up gres.conf

give GPU jobs priority using the Multifactor Priority plugin:
https://
PriorityWeightTRES=GRES/
example here: https://
requires fairshare, thus the database
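# a rough sketch of the slurm.conf pieces involved (weights are made-up values):
#   PriorityType=priority/multifactor
#   PriorityWeightTRES=GRES/gpu=1000
#   PriorityWeightFairshare=10000
#   AccountingStorageType=accounting_storage/slurmdbd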
https://
</code>
==== MUNGE installation ====
<code>
</code>
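A minimal sketch for verifying the MUNGE setup once ''munged'' is running; ''n110'' is a hypothetical remote node that must hold the same ''munge.key''.

<code>
munge -n | unmunge            # encode and decode a credential locally
munge -n | ssh n110 unmunge   # decode on a remote node, proves the keys match
remunge                       # quick benchmark of the local munged
</code>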
==== SLURM installation ====
<code>
export PATH=/usr/local/cuda/
export LD_LIBRARY_PATH=/

[root@cottontail2 slurm-22.05.2]# which gcc mpicc nvcc
/opt/ohpc/
/opt/ohpc/pub/
/
./configure \
--prefix=/
--sysconfdir=/
--with-nvml=/
make
make install
export PATH=/
export LD_LIBRARY_PATH=/

[root@cottontail2 slurm-22.05.2]#
/
| + | |||
| + | </ | ||
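A quick sanity check after the install (a sketch, the commands are illustrative rather than captured output): confirm the new binaries are first in ''$PATH'' and that slurmd reports the node's hardware.

<code>
which sbatch sinfo slurmd   # should resolve to the new --prefix location
sinfo --version             # report the Slurm version just built
slurmd -C                   # print the CPUs, sockets, threads and memory slurmd detects
</code>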
| + | |||
| + | |||
==== SLURM installation ====

Configured and compiled on ''

<code>
# cuda 9.2 ...
# installer found /

# just in case
export PATH=/
export LD_LIBRARY_PATH=/
which mpirun

# /
./configure \
--prefix=/
--sysconfdir=/
| tee -a install.log
# skip # --with-nvml=/
# skip # -with-hdf5=no
# known hdf5 library problem when including --with-nvml
config.status:
====
Libraries have been installed in:
   /

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'
====
| + | |||
| + | # for now | ||
| export PATH=/ | export PATH=/ | ||
| export LD_LIBRARY_PATH=/ | export LD_LIBRARY_PATH=/ | ||
</code>
==== General Accounting ====
| + | |||
| + | < | ||
| + | |||
| + | From job completions file, JOB #3, convert Start and End times to epoch seconds | ||
StartTime=2021-10-06T14:
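# a sketch of the conversion (timestamps below are made up, GNU date assumed):
#   start=$(date -d "2021-10-06T14:00:00" +%s)
#   end=$(date -d "2021-10-06T14:30:00" +%s)
#   echo $(( end - start ))   # elapsed wall clock seconds for the job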
</code>
==== Slurm Config Tool ====

<code>
#
# COMPUTE NODES
NodeName=n[110-111] CPUs=2 RealMemory=192 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
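# a hypothetical matching partition entry (name and limits are illustrative only)
# PartitionName=test Nodes=n[110-111] Default=YES MaxTime=INFINITE State=UP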
#
#
</code>