\\ **[[cluster:0|Back]]**

**Make sure munge/unmunge work between 1.3/2.4, and that the date is in sync (else you get error #16)**

===== Slurm Test Env =====

Getting a head start on our new login node plus two cpu+gpu compute node project. The hardware has been purchased but delivery times are long. Meanwhile it makes sense to set up a standalone Slurm scheduler, do some testing, and have it as a backup. Slurm will be running on ''greentail52'' with some compute nodes. This page is just intended to keep documentation sources handy.

Go to the **Users** page [[cluster:208|Slurm Test Env]]

==== SLURM documentation ====

<code>
# main page
https://slurm.schedmd.com/

# Slurm Quick Start User Guide
https://slurm.schedmd.com/quickstart.html
https://slurm.schedmd.com/tutorials.html

# Slurm Quick Start Administrator Guide
https://slurm.schedmd.com/quickstart_admin.html
</code>

Notes from the Quick Start Administrator Guide:

  * ''ldconfig -n /usr/lib64'' to find libslurm.so
  * support for accounting will be built if the mysql development library is present
  * a host's name may be "mcri" while the name "emcri" is used for private network communication
  * nodes can be in more than one partition
  * an extensive sample configuration file is provided in ''etc/slurm.conf.example'' (at least the "# Sample /etc/slurm.conf for mcr.llnl.gov" example); scontrol examples...

Notes from https://slurm.schedmd.com/slurm.conf.html, section: node configuration

  * The node range expression can contain one pair of square brackets with a sequence of comma-separated numbers and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or "lx[15,18,32-33]")
  * Features (hasGPU, hasRTX5000) are intended to be used to filter nodes eligible to run jobs via the ''--constraint'' argument; a comma-delimited list of arbitrary strings indicative of some characteristic associated with the node. Feature=hasLocalscratch2T, second node list for 5T if in same queue
  * GRES: a comma-delimited list of generic resource specifications for a node. The format is: ''<name>[:<type>][:no_consume]:<count>[K|M|G]''
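The bracketed node range expressions above can be previewed with bash brace expansion; the syntax differs (braces, ''..'' and commas instead of brackets and ''-''), but the expansion is the same. A quick sketch:

```shell
# Slurm "lx[15,18,32-33]" covers these hosts; bash brace expansion
# previews them (braces and .. instead of brackets and -):
echo lx{15,18} lx{32..33}            # -> lx15 lx18 lx32 lx33

# Slurm "linux[0-64,128]" expands to 66 hostnames:
echo linux{0..64} linux128 | wc -w   # -> 66
```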
Example: "Gres=gpu:tesla:1,cpu:haswell:2"

https://slurm.schedmd.com/slurm.conf.html, section: partition configuration

  * DisableRootJobs=YES
  * Nodes=n[110-111],n[79-90]

https://slurm.schedmd.com/gres.html#GPU_Management — setting up gres.conf

Give GPU jobs priority using the Multifactor Priority plugin: https://slurm.schedmd.com/priority_multifactor.html#tres

  * PriorityWeightTRES=GRES/gpu=1000
  * example here: https://slurm.schedmd.com/SLUG19/Priority_and_Fair_Trees.pdf
  * requires fairshare, thus the database

https://slurm.schedmd.com/mc_support.html — multi-core, multi-thread support

<code>
--sockets-per-node=S    Number of sockets in a node to dedicate to a job (minimum)
--cores-per-socket=C    Number of cores in a socket to dedicate to a job (minimum)
--threads-per-core=T    Minimum number of threads in a core to dedicate to a job.
-B S[:C[:T]]            Combined shortcut option for --sockets-per-node,
                        --cores-per-socket, --threads-per-core

Total cpus requested = (Nodes) x (S x C x T)
</code>

  * StateSaveLocation (useful for upgrades or downgrades)
  * upgrade once a year, head node first, then nodes
  * install the new version of Slurm to a unique directory and use a symbolic link to point the directory in your PATH to the version of Slurm you would like to use (how does OpenHPC handle this?)
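The ''Total cpus requested'' formula can be sanity-checked with shell arithmetic; the node count and S:C:T values below are hypothetical, matching a ''-B 2:12:2'' style request:

```shell
# Hypothetical request: 2 nodes, -B 2:12:2 (sockets:cores:threads)
NODES=2 S=2 C=12 T=2
TOTAL=$(( NODES * S * C * T ))
echo "Total cpus requested = $TOTAL"   # -> Total cpus requested = 96
```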
  * MPI libraries with Slurm integration should be recompiled; the libslurm.so version increases with major releases
  * beware of the SlurmdTimeout and SlurmctldTimeout values while upgrading (scontrol can change them)

https://slurm.schedmd.com/configurator.html

  * full version of the config tool, skip sockets (+cpu+physical+logical, core+mem, no backfill); see below, first attempt

https://slurm.schedmd.com/configless_slurm.html

  * more traffic; stick to local files kept in sync cluster wide

https://slurm.schedmd.com/priority_multifactor.html

  * fair share (requires database), decay, reset monthly, favor small jobs
  * PriorityType=priority/multifactor
  * openhpc does have a slurm-slurmdbd-ohpc rpm; it's just a service daemon, skip

https://slurm.schedmd.com/sched_config.html

  * The backfill scheduling plugin is loaded by default: SchedulerType=sched/backfill

https://slurm.schedmd.com/cons_res.html

  * the exclusive-use default policy in Slurm can result in inefficient utilization
  * SelectType=select/cons_tres (includes all cons_res options, adds gpu options)
  * set SlurmctldLogFile and SlurmdLogFile locations (else syslog)

https://slurm.schedmd.com/accounting.html

  * sacct (text file), sreport (database); settings below for minimal overhead
  * JobCompType=jobcomp/filetxt and JobCompLoc=/var/log/slurm/job_completions
  * logrotate: send a SIGUSR2 signal to the slurmctld daemon after moving the files

XSEDE Resources

  * What is XSEDE: https://portal.xsede.org/documentation-overview
  * Advanced Slurm: https://cvw.cac.cornell.edu/SLURM/default

==== MUNGE installation ====

Download the latest release, https://dun.github.io/munge/, from https://github.com/dun/munge/releases/tag/munge-0.5.14

  * dun.gpg
  * munge-0.5.14.tar.xz
  * munge-0.5.14.tar.xz.asc

Stage in tmp/ then build the RPM files, per https://github.com/dun/munge/wiki/Installation-Guide

<code>
rpmbuild -tb munge-0.5.14.tar.xz   # try on n78 first, as root

Wrote: /zfshomes/hmeij/rpmbuild/RPMS/x86_64/munge-0.5.14-1.el7.x86_64.rpm
Wrote: /zfshomes/hmeij/rpmbuild/RPMS/x86_64/munge-devel-0.5.14-1.el7.x86_64.rpm
Wrote:
</code>
<code>
/zfshomes/hmeij/rpmbuild/RPMS/x86_64/munge-libs-0.5.14-1.el7.x86_64.rpm

# as root
cd /zfshomes/hmeij/rpmbuild/RPMS/x86_64/
rpm -ivh munge-0.5.14-1.el7.x86_64.rpm \
    munge-devel-0.5.14-1.el7.x86_64.rpm munge-libs-0.5.14-1.el7.x86_64.rpm
</code>

Create a key on ''greentail52'', copy to n78, the test node:

<code>
[root@greentail52 ~]# sudo -u munge /usr/sbin/mungekey --verbose
mungekey: Info: Created "/etc/munge/munge.key" with 1024-bit key

[root@greentail52 ~]# ls -l /etc/munge/munge.key
-rw------- 1 munge munge 128 Oct  5 08:28 /etc/munge/munge.key

[root@greentail52 ~]# scp -p /etc/munge/munge.key n78:/etc/munge/
munge.key                  100%  128   223.8KB/s   00:00

systemctl enable munge
systemctl start munge

munge -n
munge -n | unmunge
munge -n -t 10 | ssh n78 unmunge

# remote decode working?
[root@greentail52 ~]# munge -n -t 10 | ssh n78 unmunge
STATUS:          Success (0)
ENCODE_HOST:     greentail52 (192.168.102.251)
ENCODE_TIME:     2021-10-05 09:27:45 -0400 (1633440465)
DECODE_TIME:     2021-10-05 09:27:44 -0400 (1633440464)
TTL:             10
CIPHER:          aes128 (4)
MAC:             sha256 (5)
ZIP:             none (0)
UID:             root (0)
GID:             root (0)
LENGTH:          0
</code>

File locations:

<code>
[root@greentail52 ~]# munged --help
-S, --socket=PATH       Specify local socket [/run/munge/munge.socket.2]
    --key-file=PATH     Specify key file [/etc/munge/munge.key]
    --log-file=PATH     Specify log file [/var/log/munge/munged.log]
    --pid-file=PATH     Specify PID file [/run/munge/munged.pid]
    --seed-file=PATH    Specify PRNG seed file [/var/lib/munge/munged.seed]
</code>

==== SLURM installation Updated ====

<code>
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

[root@cottontail2 slurm-22.05.2]# which gcc mpicc nvcc
/opt/ohpc/pub/compiler/gcc/9.4.0/bin/gcc
/opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1/bin/mpicc
/usr/local/cuda/bin/nvcc

./configure \
--prefix=/usr/local/slurm-22.05.2 \
--sysconfdir=/usr/local/slurm-22.05.2/etc \
--with-nvml=/usr/local/cuda

make
make install

export PATH=/usr/local/slurm/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/slurm/lib:$LD_LIBRARY_PATH
</code>
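Per the warning at the top of this page, munge credentials fail to decode when clocks drift between nodes (the output above already shows DECODE_TIME one second behind ENCODE_TIME). A sketch for spotting skew from two epoch timestamps; the 5-second tolerance and the ''n78'' node name in the usage comment are assumptions, not munge settings:

```shell
# Flag clock skew between two epoch timestamps beyond a tolerance.
max_skew=5   # assumed tolerance in seconds, not a munge parameter
check_skew() {
  local t1=$1 t2=$2 diff
  diff=$(( t1 > t2 ? t1 - t2 : t2 - t1 ))
  if [ "$diff" -le "$max_skew" ]; then echo OK; else echo "SKEW ${diff}s"; fi
}

# a live check against a node would be (requires ssh):
#   check_skew "$(date +%s)" "$(ssh n78 date +%s)"
check_skew 1633440465 1633440464   # the ENCODE/DECODE times above -> OK
```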
<code>
[root@cottontail2 slurm-22.05.2]# find /usr/local/slurm-22.05.2/ -name auth_munge.so
/usr/local/slurm-22.05.2/lib/slurm/auth_munge.so
</code>

==== SLURM installation ====

Configured and compiled on ''greentail52'' despite it not having gpus... only the library manager (nvml) is needed.

<code>
# cuda 9.2 ...
# installer found /usr/local/cuda on greentail

# just in case
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
which mpirun

# /usr/local/slurm is symbolic link to slurm-21.08.1
./configure \
--prefix=/usr/local/slurm-21.08.1 \
--sysconfdir=/usr/local/slurm-21.08.1/etc \
| tee -a install.log
# skip: # --with-nvml=/usr/local/n37-cuda-9.2 \
# skip: # --with-hdf5=no \
# known hdf5 library problem when including --with-nvml

grep -i nvml install.log
config.status: creating src/plugins/gpu/nvml/Makefile
</code>

Libtool notice at the end of the install:

<code>
====
Libraries have been installed in:
   /usr/local/slurm-21.08.1/lib/slurm

If you ever happen to want to link against installed libraries in a given
directory, LIBDIR, you must either use libtool, and specify the full pathname
of the library, or use the '-LLIBDIR' flag during linking and do at least one
of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'
====
</code>

<code>
# for now
export PATH=/usr/local/slurm/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/slurm/lib:$LD_LIBRARY_PATH
</code>

==== General Accounting ====

From the job completions file, JOB #3, convert Start and End times to epoch seconds:

<code>
StartTime=2021-10-06T14:32:37 EndTime=2021-10-06T14:37:40

date --date='2021/10/06 14:32:37' +"%s"
1633545157
date --date='2021/10/06 14:37:40' +"%s"
1633545460

# EndTime - StartTime = 1633545460 - 1633545157 = 303 seconds
</code>

==== Slurm Config Tool ====

  * let's start with this file and build up/out

# slurm.conf file generated by
configurator.html.

<code>
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=slurmcluster
SlurmctldHost=cottontail2
#SlurmctldHost=
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
Epilog=/share/apps/lsf/slurm-epilog.sh
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=67043328
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=lua
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=10000
#MaxStepCount=40000
#MaxTasksPerNode=512
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/linuxproc
Prolog=/share/apps/lsf/slurm-prolog.sh
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
SrunEpilog=/share/apps/lsf/slurm-epilog.sh
SrunProlog=/share/apps/lsf/slurm-prolog.sh
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskEpilog=/share/apps/lsf/slurm-epilog.sh
TaskPlugin=task/affinity
TaskProlog=/share/apps/lsf/slurm-prolog.sh
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=300
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/builtin
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
#
#
# JOB PRIORITY
#PriorityFlags=
PriorityType=priority/basic
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=0
#PriorityCalcPeriod=
#PriorityFavorSmall=YES
#PriorityMaxAge=14-0
#PriorityUsageResetPeriod=MONTHLY
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
#AccountingStoreFlags=
#JobCompHost=
JobCompLoc=/var/log/slurmjobs.txt
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/filetxt
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#DebugFlags=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=n[110-111] CPUs=2 RealMemory=192 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
#
#
# PARTITIONS
PartitionName=test Nodes=n[110-111] Default=YES MaxTime=INFINITE State=UP
#
#
</code>

\\ **[[cluster:0|Back]]**