\\
**[[cluster:0|Back]]**
**Make sure munge/unmunge work between 1.3/2.4, and that the date is in sync (else you get error #16)**
===== Slurm Test Env =====
Getting a head start on our new login node plus two cpu+gpu compute node project. Hardware has been purchased but there is a long delivery time. Meanwhile it makes sense to set up a standalone Slurm scheduler, do some testing, and have it as a backup. Slurm will be running on ''greentail52'' with some compute nodes.
This page is just intended to keep documentation sources handy. Go to the **Users** page [[cluster:208|Slurm Test Env]]
==== SLURM documentation ====
# main page
https://slurm.schedmd.com/
# Slurm Quick Start User Guide
https://slurm.schedmd.com/quickstart.html
https://slurm.schedmd.com/tutorials.html
# Slurm Quick Start Administrator Guide
https://slurm.schedmd.com/quickstart_admin.html
ldconfig -n /usr/lib64 to find libslurm.so
support for accounting will be built if the mysql development library is present
the host's name is "mcri" and the name "emcri" is used for private network communication
nodes can be in more than one partition
extensive sample configuration file is provided in etc/slurm.conf.example
at least these:
# Sample /etc/slurm.conf for mcr.llnl.gov
scontrol examples...
https://slurm.schedmd.com/slurm.conf.html
section: node configuration
The node range expression can contain one pair of square brackets with a sequence of comma-separated numbers and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or "lx[15,18,32-33]")
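Once Slurm is installed, ''scontrol show hostnames'' expands such expressions; without Slurm, bash brace expansion gives a quick preview (a sketch only, braces are not Slurm hostlist syntax):

```shell
# preview of what "lx[15,18,32-33]" expands to, using bash braces
echo lx{15,18} lx{32..33}
# -> lx15 lx18 lx32 lx33
```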
Features (hasGPU, hasRTX5000)
are intended to be used to filter nodes eligible to run jobs via the --constraint argument.
comma-delimited list of arbitrary strings indicative of some characteristic
associated with the node.
Feature=hasLocalscratch2T,
second node list for 5T if in same queue
GRES (countable, unlike the boolean Features)
A comma-delimited list of generic resources specifications for a node.
The format is: "<name>[:<type>][:no_consume]:<number>[K|M|G]".
"Gres=gpu:tesla:1,cpu:haswell:2"
section: partition configuration
DisableRootJobs=YES
Nodes=n[110-111],n[79-90]
https://slurm.schedmd.com/gres.html#GPU_Management
setting up gres.conf
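A gres.conf entry for GPU nodes might look like this sketch (node names, GPU type, and device files are assumptions):

```
# gres.conf fragment sketch; names and device paths hypothetical
NodeName=n[79-90] Name=gpu Type=rtx5000 File=/dev/nvidia[0-3]
```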
give GPU jobs priority using the Multifactor Priority plugin:
https://slurm.schedmd.com/priority_multifactor.html#tres
PriorityWeightTRES=GRES/gpu=1000
example here: https://slurm.schedmd.com/SLUG19/Priority_and_Fair_Trees.pdf
requires fairshare, thus the database
https://slurm.schedmd.com/mc_support.html
multi-core, multi-thread
--sockets-per-node=S Number of sockets in a node to dedicate to a job (minimum)
--cores-per-socket=C Number of cores in a socket to dedicate to a job (minimum)
--threads-per-core=T Minimum number of threads in a core to dedicate to a job.
-B S[:C[:T]] Combined shortcut option for --sockets-per-node, --cores-per-socket, --threads-per-core
Total cpus requested = (Nodes) x (S x C x T)
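Plugging hypothetical values into the formula:

```shell
# 2 nodes, 2 sockets/node, 12 cores/socket, 2 threads/core (values assumed)
NODES=2; S=2; C=12; T=2
echo $(( NODES * S * C * T ))
# -> 96 total cpus requested
```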
StateSaveLocation (useful for upgrades or downgrades)
upgrade once a year, head node first then nodes
to install the new version of Slurm to a unique directory and use a
symbolic link to point the directory in your PATH to the version of
Slurm you would like to use
(how does OpenHPC handle this?)
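The symlink approach can be sketched like this (directory names illustrative; the real layout would be /usr/local/slurm-* with /usr/local/slurm in PATH):

```shell
# demonstrate the version-switch pattern in a scratch directory
cd "$(mktemp -d)"
mkdir slurm-21.08.1 slurm-22.05.2
ln -s slurm-21.08.1 slurm      # current version
ln -sfn slurm-22.05.2 slurm    # point at the new version in one step
readlink slurm
# -> slurm-22.05.2
```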
MPI libraries with Slurm integration should be recompiled;
the libslurm.so version increases with major releases
beware while upgrading of the SlurmdTimeout and SlurmctldTimeout values
(raise them in slurm.conf and apply with ''scontrol reconfigure'')
https://slurm.schedmd.com/configurator.html
full version of config tool,
skip sockets (+cpu+physical+logical, core+mem, no backfill)
see below, first attempt
https://slurm.schedmd.com/configless_slurm.html
configless means more network traffic; stick to local config files kept in sync cluster-wide
https://slurm.schedmd.com/priority_multifactor.html
fair share (requires database), decay, reset monthly, favor small jobs
PriorityType=priority/multifactor
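If we go the multifactor route later, the relevant slurm.conf knobs might look like this sketch (weight values are illustrative; fairshare needs slurmdbd):

```
# slurm.conf sketch; weight values illustrative
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityUsageResetPeriod=MONTHLY
PriorityFavorSmall=YES
PriorityWeightFairshare=10000
PriorityWeightAge=1000
PriorityWeightJobSize=1000
PriorityWeightTRES=GRES/gpu=1000
```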
OpenHPC does have a slurm-slurmdbd-ohpc rpm; it's just the database service daemon, skip
https://slurm.schedmd.com/sched_config.html
The backfill scheduling plugin is loaded by default
SchedulerType=sched/backfill
https://slurm.schedmd.com/cons_res.html
the exclusive-use default policy in Slurm can result in inefficient utilization
SelectType=select/cons_tres (includes all cons_res options, adds gpu options)
set SlurmctldLogFile and SlurmdLogFile locations (else syslog)
https://slurm.schedmd.com/accounting.html
sacct (text file), sreport (database), settings below for minimal overhead
JobCompType=jobcomp/filetxt and JobCompLoc=/var/log/slurm/job_completions
logrotate: send a SIGUSR2 signal to the slurmctld daemon after moving the files
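A logrotate stanza for this might look like the sketch below (log path taken from the settings further down; rotation schedule assumed):

```
# /etc/logrotate.d/slurmctld sketch
/var/log/slurmctld.log {
    weekly
    rotate 4
    compress
    missingok
    postrotate
        /usr/bin/pkill -USR2 slurmctld
    endscript
}
```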
XSEDE Resources
What is XSEDE
https://portal.xsede.org/documentation-overview
Advanced Slurm
https://cvw.cac.cornell.edu/SLURM/default
==== MUNGE installation ====
download latest release https://dun.github.io/munge/
from https://github.com/dun/munge/releases/tag/munge-0.5.14
dun.gpg
munge-0.5.14.tar.xz
munge-0.5.14.tar.xz.asc
stage in tmp/ then build RPM file
https://github.com/dun/munge/wiki/Installation-Guide
rpmbuild -tb munge-0.5.14.tar.xz
# try on n78 first, as root
Wrote: /zfshomes/hmeij/rpmbuild/RPMS/x86_64/munge-0.5.14-1.el7.x86_64.rpm
Wrote: /zfshomes/hmeij/rpmbuild/RPMS/x86_64/munge-devel-0.5.14-1.el7.x86_64.rpm
Wrote: /zfshomes/hmeij/rpmbuild/RPMS/x86_64/munge-libs-0.5.14-1.el7.x86_64.rpm
# as root
cd /zfshomes/hmeij/rpmbuild/RPMS/x86_64/
rpm -ivh munge-0.5.14-1.el7.x86_64.rpm \
munge-devel-0.5.14-1.el7.x86_64.rpm munge-libs-0.5.14-1.el7.x86_64.rpm
# create a key on greentail52, copy to n78 the test node
[root@greentail52 ~]# sudo -u munge /usr/sbin/mungekey --verbose
mungekey: Info: Created "/etc/munge/munge.key" with 1024-bit key
[root@greentail52 ~]# ls -l /etc/munge/munge.key
-rw------- 1 munge munge 128 Oct 5 08:28 /etc/munge/munge.key
[root@greentail52 ~]# scp -p /etc/munge/munge.key n78:/etc/munge/
munge.key 100% 128 223.8KB/s 00:00
systemctl enable munge
systemctl start munge
munge -n
munge -n | unmunge
munge -n -t 10 | ssh n78 unmunge
# remote decode working?
[root@greentail52 ~]# munge -n -t 10 | ssh n78 unmunge
STATUS: Success (0)
ENCODE_HOST: greentail52 (192.168.102.251)
ENCODE_TIME: 2021-10-05 09:27:45 -0400 (1633440465)
DECODE_TIME: 2021-10-05 09:27:44 -0400 (1633440464)
TTL: 10
CIPHER: aes128 (4)
MAC: sha256 (5)
ZIP: none (0)
UID: root (0)
GID: root (0)
LENGTH: 0
# file locations
[root@greentail52 ~]# munged --help
-S, --socket=PATH Specify local socket [/run/munge/munge.socket.2]
--key-file=PATH Specify key file [/etc/munge/munge.key]
--log-file=PATH Specify log file [/var/log/munge/munged.log]
--pid-file=PATH Specify PID file [/run/munge/munged.pid]
--seed-file=PATH Specify PRNG seed file [/var/lib/munge/munged.seed]
==== SLURM installation Updated ====
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
[root@cottontail2 slurm-22.05.2]# which gcc mpicc nvcc
/opt/ohpc/pub/compiler/gcc/9.4.0/bin/gcc
/opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1/bin/mpicc
/usr/local/cuda/bin/nvcc
./configure \
--prefix=/usr/local/slurm-22.05.2 \
--sysconfdir=/usr/local/slurm-22.05.2/etc \
--with-nvml=/usr/local/cuda
make
make install
export PATH=/usr/local/slurm/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/slurm/lib:$LD_LIBRARY_PATH
[root@cottontail2 slurm-22.05.2]# find /usr/local/slurm-22.05.2/ -name auth_munge.so
/usr/local/slurm-22.05.2/lib/slurm/auth_munge.so
==== SLURM installation ====
Configured and compiled on ''greentail52'' despite not having gpus... only the NVIDIA management library (nvml) is needed
# cuda 9.2 ...
# installer found /usr/local/cuda on ''greentail''
# just in case
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
which mpirun
# /usr/local/slurm is symbolic link to slurm-21.08.1
./configure \
--prefix=/usr/local/slurm-21.08.1 \
--sysconfdir=/usr/local/slurm-21.08.1/etc \
| tee -a install.log
# skip # --with-nvml=/usr/local/n37-cuda-9.2 \
# skip # --with-hdf5=no \
# known hdf5 library problem when including --with-nvml
grep -i nvml install.log
config.status: creating src/plugins/gpu/nvml/Makefile
====
Libraries have been installed in:
/usr/local/slurm-21.08.1/lib/slurm
If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
- add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
during execution
- add LIBDIR to the 'LD_RUN_PATH' environment variable
during linking
- use the '-Wl,-rpath -Wl,LIBDIR' linker flag
- have your system administrator add LIBDIR to '/etc/ld.so.conf'
====
# for now
export PATH=/usr/local/slurm/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/slurm/lib:$LD_LIBRARY_PATH
==== General Accounting ====
From the job completions file, job #3, convert the Start and End times to epoch seconds
StartTime=2021-10-06T14:32:37 EndTime=2021-10-06T14:37:40
date --date='2021/10/06 14:32:37' +"%s"
1633545157
date --date='2021/10/06 14:37:40' +"%s"
1633545460
EndTime - StartTime = 1633545460-1633545157 = 303 seconds
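The two conversions can be combined into one elapsed-time calculation (GNU date):

```shell
# elapsed seconds between StartTime and EndTime of job #3
start=$(date --date='2021-10-06 14:32:37' +%s)
end=$(date --date='2021-10-06 14:37:40' +%s)
echo $(( end - start ))
# -> 303
```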
==== Slurm Config Tool ====
  * let's start with this file and build up/out
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=slurmcluster
SlurmctldHost=cottontail2
#SlurmctldHost=
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
Epilog=/share/apps/lsf/slurm-epilog.sh
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=67043328
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=lua
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=10000
#MaxStepCount=40000
#MaxTasksPerNode=512
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/linuxproc
Prolog=/share/apps/lsf/slurm-prolog.sh
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
SrunEpilog=/share/apps/lsf/slurm-epilog.sh
SrunProlog=/share/apps/lsf/slurm-prolog.sh
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskEpilog=/share/apps/lsf/slurm-epilog.sh
TaskPlugin=task/affinity
TaskProlog=/share/apps/lsf/slurm-prolog.sh
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=300
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/builtin
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
#
#
# JOB PRIORITY
#PriorityFlags=
PriorityType=priority/basic
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=0
#PriorityCalcPeriod=
#PriorityFavorSmall=YES
#PriorityMaxAge=14-0
#PriorityUsageResetPeriod=MONTHLY
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
#AccountingStoreFlags=
#JobCompHost=
JobCompLoc=/var/log/slurmjobs.txt
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/filetxt
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#DebugFlags=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=n[110-111] CPUs=2 RealMemory=192 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
#
#
# PARTITIONS
PartitionName=test Nodes=n[110-111] Default=YES MaxTime=INFINITE State=UP
#
#
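Once slurmd is up on the nodes, a minimal batch script can exercise the ''test'' partition (script contents are a sketch; submit with ''sbatch'' and check with ''squeue''/''sacct''):

```shell
#!/bin/bash
# minimal test job sketch for the partition defined above
#SBATCH --partition=test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --output=test-%j.out
hostname
```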
\\
**[[cluster:0|Back]]**