Slurm Test Env

Getting a head start on our new login node plus two cpu+gpu compute node project. Hardware has been purchased but there is a long delivery time. Meanwhile it makes sense to set up a standalone Slurm scheduler, do some testing, and have it as a backup. Slurm will be running on greentail52 with some compute nodes.

SLURM documentation

# main page
https://slurm.schedmd.com/

# Slurm Quick Start User Guide
https://slurm.schedmd.com/quickstart.html
https://slurm.schedmd.com/tutorials.html
  
# Slurm Quick Start Administrator Guide
https://slurm.schedmd.com/quickstart_admin.html

ldconfig -n /usr/lib64 to find libslurm.so
support for accounting will be built if the mysql development library is present
the host's name is "mcri" and the name "emcri" is used for private network communication
nodes can be in more than one partition 
extensive sample configuration file is provided in etc/slurm.conf.example
at least these:
# Sample /etc/slurm.conf for mcr.llnl.gov
scontrol examples...

https://slurm.schedmd.com/slurm.conf.html
section: node configuration
Features (e.g. hasGPU, hasRTX5000) are intended to be used to filter nodes
eligible to run jobs via the --constraint argument: a comma-delimited list of
arbitrary strings indicative of some characteristic associated with the node.
Feature=hasLocalscratch2T,
with a second node list for 5T scratch if in the same queue
GRES (boolean)
A comma-delimited list of generic resources specifications for a node. 
The format is: "<name>[:<type>][:no_consume]:<number>[K|M|G]". 
"Gres=gpu:tesla:1,cpu:haswell:2"
section: partition configuration
DisableRootJobs=YES
Nodes=n[110-111],n[79-90]
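Tying these node and partition notes together, a hypothetical slurm.conf fragment might look like the following; node names, features, and GRES counts are illustrative only, not the final cluster layout:

```
GresTypes=gpu
# GPU nodes, tagged so jobs can filter with --constraint or request --gres
NodeName=n[110-111] Gres=gpu:rtx5000:1 Feature=hasGPU,hasRTX5000 State=UNKNOWN
# CPU nodes with local scratch
NodeName=n[79-90]   Feature=hasLocalscratch2T State=UNKNOWN
PartitionName=test Nodes=n[110-111],n[79-90] DisableRootJobs=YES Default=YES State=UP
```

Jobs would then select nodes with e.g. `sbatch --constraint=hasRTX5000` or `sbatch --gres=gpu:rtx5000:1`.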

https://slurm.schedmd.com/gres.html#GPU_Management
setting up gres.conf
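A minimal sketch of what gres.conf could contain for the GPU nodes, assuming one RTX 5000 per node; the device path is illustrative:

```
# Hypothetical gres.conf entry; builds configured --with-nvml
# can use "AutoDetect=nvml" instead of explicit File= lines.
NodeName=n[110-111] Name=gpu Type=rtx5000 File=/dev/nvidia0
```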

https://slurm.schedmd.com/mc_support.html
multi-core, multi-thread
--sockets-per-node=S 	Number of sockets in a node to dedicate to a job (minimum)
--cores-per-socket=C 	Number of cores in a socket to dedicate to a job (minimum)
--threads-per-core=T 	Minimum number of threads in a core to dedicate to a job. 
-B S[:C[:T]] 	Combined shortcut option for --sockets-per-node, --cores-per-socket, --threads-per-core
Total cpus requested = (Nodes) x (S x C x T)
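A quick worked instance of this formula, with illustrative values:

```shell
# Total CPUs = Nodes x (Sockets x Cores x Threads)
nodes=2; S=2; C=4; T=1
echo $(( nodes * S * C * T ))   # minimum CPUs allocated by: srun -N 2 -B 2:4:1
```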

StateSaveLocation (useful for upgrades or downgrades)
upgrade once a year, head node first then nodes
to install the new version of Slurm to a unique directory and use a 
symbolic link to point the directory in your PATH to the version of 
Slurm you would like to use
(how does OpenHPC handle this?)
MPI libraries with Slurm integration should be recompiled,
since the libslurm.so version number increases with major releases
beware of the SlurmdTimeout and SlurmctldTimeout values while upgrading
(change them with scontrol)
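The symlink scheme above can be sketched as follows; /tmp/slurm-demo stands in for the real install root (e.g. /usr/local) and the version numbers are illustrative:

```shell
# install each release into its own directory
mkdir -p /tmp/slurm-demo/slurm-21.08.1 /tmp/slurm-demo/slurm-22.05.0
# current production version:
ln -sfn /tmp/slurm-demo/slurm-21.08.1 /tmp/slurm-demo/slurm
# upgrading = repointing the symlink; PATH entries keep working
ln -sfn /tmp/slurm-demo/slurm-22.05.0 /tmp/slurm-demo/slurm
readlink /tmp/slurm-demo/slurm
```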

https://slurm.schedmd.com/configurator.html
full version of config tool, 
skip sockets (+cpu+physical+logical, core+mem, no backfill)
see below, first attempt

https://slurm.schedmd.com/configless_slurm.html
configless means more network traffic; stick to local config files kept in sync cluster-wide

https://slurm.schedmd.com/priority_multifactor.html
fair share (requires database), decay, reset monthly, favor small jobs
PriorityType=priority/multifactor
OpenHPC does have a slurm-slurmdbd-ohpc rpm, but it's just a service daemon, skip
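Should multifactor priority be enabled later, a hypothetical slurm.conf fragment could look like this; the weights and periods are illustrative, and fair-share requires slurmdbd accounting:

```
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityUsageResetPeriod=MONTHLY
PriorityFavorSmall=YES
PriorityWeightFairshare=10000
```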

https://slurm.schedmd.com/sched_config.html
The backfill scheduling plugin is loaded by default
SchedulerType=sched/backfill

https://slurm.schedmd.com/cons_res.html
the exclusive-use default policy in Slurm can result in inefficient utilization
SelectType=select/cons_tres (includes all cons_res options, adds gpu options)
set SlurmctldLogFile and SlurmdLogFile locations (else syslog)

https://slurm.schedmd.com/accounting.html
sacct (text file), sreport (database), settings below for minimal overhead
JobCompType=jobcomp/filetxt and JobCompLoc=/var/log/slurm/job_completions
logrotate; send a SIGUSR2 signal to the slurmctld daemon after moving the files
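A hypothetical logrotate entry following that note; paths match the log locations used elsewhere on this page, the rotation schedule is illustrative:

```
/var/log/slurmctld.log /var/log/slurmd.log {
    weekly
    rotate 4
    compress
    missingok
    postrotate
        # signal slurmctld to reopen its log file after rotation
        /bin/kill -USR2 $(cat /var/run/slurmctld.pid) 2>/dev/null || true
    endscript
}
```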

XSEDE Resources

What is XSEDE
https://portal.xsede.org/documentation-overview

Advanced Slurm
https://cvw.cac.cornell.edu/SLURM/default

MUNGE installation

download latest release https://dun.github.io/munge/
from https://github.com/dun/munge/releases/tag/munge-0.5.14

dun.gpg
munge-0.5.14.tar.xz
munge-0.5.14.tar.xz.asc

stage in tmp/ then build RPM file
https://github.com/dun/munge/wiki/Installation-Guide

rpmbuild -tb munge-0.5.14.tar.xz

# try on n78 first, as root

Wrote: /zfshomes/hmeij/rpmbuild/RPMS/x86_64/munge-0.5.14-1.el7.x86_64.rpm
Wrote: /zfshomes/hmeij/rpmbuild/RPMS/x86_64/munge-devel-0.5.14-1.el7.x86_64.rpm
Wrote: /zfshomes/hmeij/rpmbuild/RPMS/x86_64/munge-libs-0.5.14-1.el7.x86_64.rpm

# as root
cd /zfshomes/hmeij/rpmbuild/RPMS/x86_64/
rpm -ivh munge-0.5.14-1.el7.x86_64.rpm \
 munge-devel-0.5.14-1.el7.x86_64.rpm munge-libs-0.5.14-1.el7.x86_64.rpm

# create a key on greentail52, copy to n78 the test node
[root@greentail52 ~]# sudo -u munge /usr/sbin/mungekey --verbose
mungekey: Info: Created "/etc/munge/munge.key" with 1024-bit key
[root@greentail52 ~]# ls -l /etc/munge/munge.key
-rw------- 1 munge munge 128 Oct  5 08:28 /etc/munge/munge.key
[root@greentail52 ~]# scp -p /etc/munge/munge.key n78:/etc/munge/
munge.key                                     100%  128   223.8KB/s   00:00 

systemctl enable munge
systemctl start munge

 munge -n
 munge -n | unmunge
 munge -n -t 10 | ssh n78 unmunge

# remote decode working?
[root@greentail52 ~]# munge -n -t 10 | ssh n78 unmunge
STATUS:          Success (0)
ENCODE_HOST:     greentail52 (192.168.102.251)
ENCODE_TIME:     2021-10-05 09:27:45 -0400 (1633440465)
DECODE_TIME:     2021-10-05 09:27:44 -0400 (1633440464)
TTL:             10
CIPHER:          aes128 (4)
MAC:             sha256 (5)
ZIP:             none (0)
UID:             root (0)
GID:             root (0)
LENGTH:          0

# file locations
[root@greentail52 ~]# munged --help 
  -S, --socket=PATH        Specify local socket [/run/munge/munge.socket.2]
  --key-file=PATH          Specify key file [/etc/munge/munge.key]
  --log-file=PATH          Specify log file [/var/log/munge/munged.log]
  --pid-file=PATH          Specify PID file [/run/munge/munged.pid]
  --seed-file=PATH         Specify PRNG seed file [/var/lib/munge/munged.seed]

SLURM installation

#source /share/apps/CENTOS7/amber/miniconda3/etc/profile.d/conda.sh
#export PATH=/share/apps/CENTOS7/amber/miniconda3/bin:$PATH
#export LD_LIBRARY_PATH=/share/apps/CENTOS7/amber/miniconda3/lib:$LD_LIBRARY_PATH
#which mpirun python conda

# cuda 9.2 ... configure finds /usr/local/cuda which points to n37-cuda-9.2
#export CUDAHOME=/usr/local/n37-cuda-9.2
#export PATH=/usr/local/n37-cuda-9.2/bin:$PATH
#export LD_LIBRARY_PATH=/usr/local/n37-cuda-9.2/lib64:$LD_LIBRARY_PATH
#which nvcc

# just in case
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
which mpirun

# /usr/local/slurm is link to slurm-21.08.1
./configure \
--prefix=/usr/local/slurm \
--sysconfdir=/usr/local/slurm/etc \
2>&1 | tee -a install.log
# not --with-nvml=/usr/local/n37-cuda-9.2
# not --with-hdf5=no


# known hdf5 library problem when including --with-nvml

grep -i nvml install.log
config.status: creating src/plugins/gpu/nvml/Makefile


export PATH=/usr/local/slurm/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/slurm/lib:$LD_LIBRARY_PATH

From the job completions file, JOB #3

StartTime=2021-10-06T14:32:37 EndTime=2021-10-06T14:37:40

date --date='2021/10/06 14:32:37' +"%s"
1633545157

date --date='2021/10/06 14:37:40' +"%s"
1633545460

EndTime - StartTime = 1633545460-1633545157 = 303 seconds
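The conversion above can be done in one short snippet (GNU date assumed):

```shell
# elapsed seconds between the StartTime and EndTime stamps of JOB #3
start=$(date --date='2021/10/06 14:32:37' +%s)
end=$(date --date='2021/10/06 14:37:40' +%s)
echo $(( end - start ))   # 303
```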

Full Version Slurm Config Tool

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=slurmcluster
SlurmctldHost=cottontail2
#SlurmctldHost=
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
Epilog=/share/apps/lsf/slurm-epilog.sh
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=67043328
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=lua
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=10000
#MaxStepCount=40000
#MaxTasksPerNode=512
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/linuxproc
Prolog=/share/apps/lsf/slurm-prolog.sh
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
SrunEpilog=/share/apps/lsf/slurm-epilog.sh
SrunProlog=/share/apps/lsf/slurm-prolog.sh
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskEpilog=/share/apps/lsf/slurm-epilog.sh
TaskPlugin=task/affinity
TaskProlog=/share/apps/lsf/slurm-prolog.sh
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=300
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/builtin
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
#
#
# JOB PRIORITY
#PriorityFlags=
PriorityType=priority/basic
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=0
#PriorityCalcPeriod=
#PriorityFavorSmall=YES
#PriorityMaxAge=14-0
#PriorityUsageResetPeriod=MONTHLY
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
#AccountingStoreFlags=
#JobCompHost=
JobCompLoc=/var/log/slurmjobs.txt
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/filetxt
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#DebugFlags=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=n[110-111] CPUs=2 RealMemory=192 CoresPerSocket=12 ThreadsPerCore=12 State=UNKNOWN
#
#
# PARTITIONS
PartitionName=test Nodes=n[110-111] Default=YES MaxTime=INFINITE State=UP
#
#



cluster/207.1634047666.txt.gz · Last modified: 2021/10/12 10:07 by hmeij07