Getting Started with Slurm Guide

  • The following resources are now (7/1/2022) managed by Slurm on our new head/login node
  • You must ssh directly to this server (like you do connecting to greentail52) via VPN
    • ssh
  • If the environment is set up correctly, 'which sinfo' should return /usr/local/slurm/bin/sinfo
[hmeij@cottontail2 ~]$ which sinfo
/usr/local/slurm/bin/sinfo

[hmeij@cottontail2 ~]$ sinfo
mwgpu        up   infinite      2   idle n[34-35]
tinymem      up   infinite      5   idle n[55-59]
mw128        up   infinite      5   idle n[73-77]
amber128     up   infinite      1   idle n78
exx96        up   infinite      5   idle n[86-90]
test*        up   infinite      2   idle n[100-101]
  • July 2022 is designated testing period
  • August 2022 is designated migration period
  • Queues hp12 and mwgpu (centos6) will be serviced by Openlava, not Slurm
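The 'which sinfo' check from the list above can be scripted, for instance in a login profile. This is only a minimal sketch: check_cmd_path is a hypothetical helper name, and the expected path is simply the one quoted in this guide.

```shell
#!/bin/bash
# Sketch: confirm that a command resolves to the expected location.
# check_cmd_path is a hypothetical helper, not part of Slurm.
check_cmd_path() {
    local cmd="$1" expected="$2" found
    found="$(command -v "$cmd" 2>/dev/null)" || found=""
    if [ "$found" = "$expected" ]; then
        echo "ok: $cmd -> $found"
        return 0
    fi
    echo "mismatch: $cmd -> ${found:-not found} (expected $expected)" >&2
    return 1
}

# Expected path taken from this guide; "|| true" keeps a mismatch from
# aborting the calling shell.
check_cmd_path sinfo /usr/local/slurm/bin/sinfo || true
```

A mismatch usually means PATH does not yet include the Slurm bin directory on the host you are logged in to.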

Quick Start Slurm Guide

Jump to the Rocky8/CentOS7 script templates further down this page.

There is also detailed information on Amber20/Amber22 on this page with script examples.

  • Tada new head node

Basic Commands

# sorta like bqueues
 sinfo -l

# more node info
 sinfo -lN

# sorta like bsub

# sorta like bjobs

# sorta like bhosts -l
 scontrol show node n78

# sorta like bstop/bresume
 scontrol suspend 1000001
 scontrol resume 1000001

# sorta like bhist -l
 scontrol show job 1000002

# sorta like bkill
 scancel 1000003
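For users coming from Openlava, the command pairs above can be kept as a small lookup function. This is a sketch, not a wrapper that runs anything: lsf2slurm is a hypothetical name, and the bsub and bjobs entries (sbatch, squeue) are the usual Slurm submit and queue-listing commands, added here as an assumption.

```shell
#!/bin/bash
# Sketch: print the rough Slurm counterpart of an Openlava/LSF command.
# lsf2slurm is a hypothetical helper; it only prints, it runs nothing.
lsf2slurm() {
    case "$1" in
        bqueues) echo "sinfo -l" ;;
        bsub)    echo "sbatch" ;;   # assumption: standard Slurm submit command
        bjobs)   echo "squeue" ;;   # assumption: standard Slurm queue listing
        bhosts)  echo "scontrol show node" ;;
        bstop)   echo "scontrol suspend" ;;
        bresume) echo "scontrol resume" ;;
        bhist)   echo "scontrol show job" ;;
        bkill)   echo "scancel" ;;
        *)       echo "no mapping for: $1" >&2; return 1 ;;
    esac
}

lsf2slurm bkill    # prints: scancel
```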



You must request resources, for example the number of cpu cores or which gpu model to use. If you do not request resources, Slurm will assume you need all of the node's resources and thus prevent other jobs from running on that node.


Some common examples are:

NODE control
#SBATCH -N 1     # default, nr of nodes

CPU control
#SBATCH -n 8     # tasks=S*C*T
#SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH --mem=250           # needed to override oversubscribe
#SBATCH --ntasks-per-node=1 # perhaps needed to override oversubscribe
#SBATCH --cpus-per-task=1   # needed to override oversubscribe

GPU control
#SBATCH --cpus-per-gpu=1                  # needed to override oversubscribe
#SBATCH --mem-per-gpu=7168                # needed to override oversubscribe
#SBATCH --gres=gpu:geforce_gtx_1080_ti:1  # n[78], amber128
#SBATCH --gres=gpu:geforce_rtx_2080_s:1   # n[79-90], exx96
#SBATCH --gres=gpu:quadro_rtx_5000:1      # n[100-101], test
#SBATCH --gres=gpu:tesla_k20m:1           # n[33-37], mwgpu

Partition control
#SBATCH --partition=mw128
#SBATCH --nodelist=n74

Globbing queues, based on Priority/Weight (view output of 'sinfo -lN')
srun --partition=exx96,amber128,mwgpu  --mem=1024  --gpus=1   sleep 60 &
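Putting the CPU options above together, a small generator can print a job script that always carries an explicit task count and memory request. This is a sketch under assumptions: make_cpu_job and the job name "sketch" are made up for illustration, and the header deliberately contains no blank lines, in keeping with the warning in the templates on this page.

```shell
#!/bin/bash
# Sketch: print a minimal CPU-only batch script with explicit resources.
# make_cpu_job is a hypothetical helper.
# Arguments: partition, number of tasks, memory in MB, then the command.
make_cpu_job() {
    local partition="$1" ntasks="$2" mem_mb="$3"
    shift 3
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=sketch
#SBATCH -N 1
#SBATCH -n ${ntasks}
#SBATCH --mem=${mem_mb}
#SBATCH --partition=${partition}
$*
EOF
}

# Print a one-task, 1024 MB job for the mw128 queue.
make_cpu_job mw128 1 1024 sleep 60
```

To submit, redirect the output to a file first, for example "make_cpu_job mw128 1 1024 sleep 60 > job.sh" followed by "sbatch job.sh".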

Pending Jobs

I keep having to inform users that even with -n 1 and one cpu requested, a job can still end up in pending state because the user forgot to reserve memory, so silly Slurm assumes the job needs all of the node's memory. Here is my template:

FirstName, your jobs are pending because you did not request memory 
and if not then slurm assumes you need all memory, silly. 
Command "scontrol show job JOBID" will reveal ...

JobId=1062052 JobName=3a_avgHbond_CPU
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:1:1
   TRES=cpu=1,mem=191047M,node=1,billing=1    <---------

I looked (command "ssh n?? top -u username -b -n 1", look for the VIRT value) 
and you need less than 1G per job so with --mem=1024 and n=1 and cpu=1 
you should be able to load 48 jobs onto n100. 
Consult output of command "sinfo -lN"
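The arithmetic behind the "48 jobs" figure can be sketched as follows: a node fits as many jobs as the smaller of cores divided by cpus per job and memory divided by memory per job. The helper names are hypothetical, the 48 cores for n100 are an assumption inferred from the template above, and the parser only handles memory reported with an M suffix.

```shell
#!/bin/bash
# Sketch: how many identical jobs fit on one node.
# tres_mem_mb and jobs_per_node are hypothetical helpers.

# Extract the memory figure (MB) from a TRES line of "scontrol show job".
tres_mem_mb() {
    sed -n 's/.*mem=\([0-9][0-9]*\)M.*/\1/p'
}

# Arguments: node cores, node memory MB, cpus per job, memory MB per job.
jobs_per_node() {
    local cores="$1" mem_mb="$2" cpus_per_job="$3" mem_per_job_mb="$4"
    local by_cpu=$(( cores / cpus_per_job ))
    local by_mem=$(( mem_mb / mem_per_job_mb ))
    if [ "$by_cpu" -lt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

node_mem="$(echo 'TRES=cpu=1,mem=191047M,node=1,billing=1' | tres_mem_mb)"

# With --mem=1024: memory allows 186 jobs, cores allow 48 (assumed core count).
jobs_per_node 48 "$node_mem" 1 1024          # prints 48

# Without a memory request each job is charged all 191047 MB.
jobs_per_node 48 "$node_mem" 1 "$node_mem"   # prints 1
```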


Slurm has a builtin MPI flavor. I suggest you do not rely on it. The documentation states that on major release upgrades the library is not backwards compatible. All software using this library would need to be recompiled.

There is a handy parallel job launcher which may be of use, called srun. srun commands can be embedded in a job submission script, but srun can also be used interactively to test commands. The submitted job will have a single JOBID and launch multiple tasks.

$ srun --partition=mwgpu -n 4 -B 1:4:1 --mem=1024 sleep 60 &
$ squeue
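To see the "one job, several tasks" behavior, each task can report its own rank. A minimal sketch, assuming the standard variables SLURM_JOB_ID, SLURM_PROCID and SLURM_NTASKS that srun sets for every task; task_banner is a hypothetical name, and the defaults only exist so the function also runs outside Slurm.

```shell
#!/bin/bash
# Sketch: print which task of which job this process is.
# Outside of Slurm the variables are unset, so defaults are used.
task_banner() {
    echo "job ${SLURM_JOB_ID:-none} task ${SLURM_PROCID:-0} of ${SLURM_NTASKS:-1} on $(uname -n)"
}

task_banner
```

Saved as an executable script (say task.sh, a made-up name) and launched with "srun --partition=mwgpu -n 4 --mem=1024 ./task.sh", this should print four lines that share one job ID and differ in task number.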

For more details on srun consult

MPI Stacks

For MPI development and runtime support, OpenHPC provides pre-packaged builds for a variety of MPI families and transport layers. OpenHPC 2.x introduces the use of two related transport layers for the MPICH and OpenMPI builds that support a variety of underlying fabrics: UCX (Unified Communication X) and OFI (OpenFabrics interfaces). Both versions support Ethernet, Infiniband and Omni-Path. We do not use the latter two fabrics (although we do have some Infiniband switches, we do not custom compile for them).

  • openmpi4-gnu9-ohpc # ofi & ucx
  • mpich-ofi-gnu9-ohpc # ofi only
  • mpich-ucx-gnu9-ohpc # ucx only

Contents of these packages can be found at /zfshomes/hmeij/openhpc


OpenHPC also provides compatible builds for use with the compilers and MPI stack included in newer versions of the Intel® OneAPI HPC Toolkit (using the classic compiler variants).

  • intel-oneapi-toolkit-release-ohpc
  • intel-compilers-devel-ohpc
  • intel-mpi-devel-ohpc

Contents of these packages can be found at /zfshomes/hmeij/openhpc
Consult the section below.


Switching to Intel's OneAPI toolchain requires swapping the gnu9 compiler module for Intel's.

[hmeij@cottontail2 ~]$ module load intel/2022.0.2
Loading compiler version 2022.0.2
Loading tbb version 2021.5.1
Loading compiler-rt version 2022.0.2
Loading oclfpga version 2022.0.2
  Load "debugger" to debug DPC++ applications with the gdb-oneapi debugger.
  Load "dpl" for additional DPC++ APIs:
Loading mkl version 2022.0.2

Lmod has detected the following error: 
You can only have one compiler module loaded at a time.
You already have gnu9 loaded.
To correct the situation, please execute the following command:

  $ module swap gnu9 intel/2022.0.2

# after the swap we observe 

[hmeij@cottontail2 ~]$ which icc icx mpicc ifort ifx

# more on OneAPI and its variety of compilers

# debugger module
[hmeij@cottontail2 ~]$ module load debugger
Loading debugger version 2021.5.0
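Since Lmod refuses a second compiler module, a script can decide between "module load" and "module swap" from the current module listing. This is a sketch that only prints the command: compiler_switch_cmd is a hypothetical helper, and it assumes gnu9 is the only other compiler family that can be loaded.

```shell
#!/bin/bash
# Sketch: read "module list" text on stdin and print the command needed to
# switch to the requested compiler module. Prints only, changes nothing.
compiler_switch_cmd() {
    local target="$1" listing
    listing="$(cat)"
    case "$listing" in
        *gnu9/*) echo "module swap gnu9 $target" ;;
        *)       echo "module load $target" ;;
    esac
}

# Sample listing line copied from the default environment on this page.
printf '%s\n' '  1) autotools   3) gnu9/9.4.0    5) ucx/1.11.2' |
    compiler_switch_cmd intel/2022.0.2
# prints: module swap gnu9 intel/2022.0.2
```

On the cluster the input would come from something like "module list 2>&1 | compiler_switch_cmd intel/2022.0.2"; the redirect is there because Lmod commonly writes its listing to stderr.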


You can continue to rely on PATH/LD_LIBRARY_PATH settings to control the environment. This also implies your job should run under Openlava or Slurm with proper scheduler tags. With the new head node deployment of OpenHPC we'll introduce modules to control the environment for newly installed software.

The default developer environment set up at login is but a few modules.

[hmeij@cottontail2 ~]$ module list

Currently Loaded Modules:
  1) autotools   3) gnu9/9.4.0    5) ucx/1.11.2         7) openmpi4/4.1.1
  2) prun/2.2    4) hwloc/2.5.0   6) libfabric/1.13.0   8) ohpc

[hmeij@cottontail2 ~]$ which gcc mpicc nvcc
/usr/local/cuda/bin/nvcc # cuda 11.6

All modules available to you can be queried

[hmeij@cottontail2 ~]$ module avail

------------------- /opt/ohpc/pub/moduledeps/gnu9-openmpi4 -------------
   adios/1.13.1     netcdf-cxx/4.3.1        py3-scipy/1.5.1
   boost/1.76.0     netcdf-fortran/4.5.3    scalapack/2.1.0
   dimemas/5.4.2    netcdf/4.7.4            scalasca/2.5
   example2/1.0     omb/5.8                 scorep/6.0
   extrae/3.7.0     opencoarrays/2.9.2      sionlib/1.7.4
   fftw/3.3.8       petsc/3.16.1            slepc/3.16.0
   hypre/2.18.1     phdf5/1.10.8            superlu_dist/6.4.0
   imb/2019.6       pnetcdf/1.12.2          tau/2.29
   mfem/4.3         ptscotch/6.0.6          trilinos/13.2.0
   mumps/5.2.1      py3-mpi4py/3.0.3

------------------------- /opt/ohpc/pub/moduledeps/gnu9 ----------------
   R/4.1.2          mpich/3.4.2-ofi         plasma/2.8.0
   gsl/2.7          mpich/3.4.2-ucx  (D)    py3-numpy/1.19.5
   hdf5/1.10.8      mvapich2/2.3.6          scotch/6.0.6
   impi/2021.5.1    openblas/0.3.7          superlu/5.2.1
   likwid/5.0.1     openmpi4/4.1.1   (L)
   metis/5.1.0      pdtoolkit/3.25.1

--------------------------- /opt/ohpc/pub/modulefiles -------------------
   EasyBuild/4.5.0          hwloc/2.5.0      (L)    prun/2.2          (L)
   autotools         (L)    intel/2022.0.2          singularity/3.7.1
   charliecloud/0.15        libfabric/1.13.0 (L)    ucx/1.11.2        (L)
   cmake/3.21.3             ohpc             (L)    valgrind/3.18.1
   example1/1.0             os
   gnu9/9.4.0        (L)    papi/5.7.0

----------------------- /share/apps/CENTOS8/ohpc/modulefiles ------------
   amber/20    cuda/11.6    hello-mpi/1.0    hello/1.0    miniconda3/py39

   D:  Default Module
   L:  Module is loaded

If the avail list is too long consider trying:

"module --default avail" or "ml -d av" to just list the default modules.
"module overview" or "ml ov" to display the number of modules for each name.

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

In addition you can request and query EasyBuild modules and toolchains

# "use" them
[hmeij@cottontail2 ~]$ module use /sanscratch/CENTOS8/easybuild/ohpc/modules/all
[hmeij@cottontail2 ~]$ module avail

-------------------- /sanscratch/CENTOS8/easybuild/ohpc/modules/all ----------------------
   Autoconf/2.69-GCCcore-10.2.0                  UnZip/6.0-GCCcore-10.2.0
   Automake/1.16.2-GCCcore-10.2.0                XZ/5.2.5-GCCcore-10.2.0
   Autotools/20200321-GCCcore-10.2.0             binutils/2.35-GCCcore-10.2.0
   Bison/3.5.3                                   binutils/2.35                       (D)
   Bison/3.7.1-GCCcore-10.2.0                    bzip2/1.0.8-GCCcore-10.2.0
   Bison/3.7.1                            (D)    cURL/7.72.0-GCCcore-10.2.0
   Boost.Python/1.74.0-GCC-10.2.0                expat/2.2.9-GCCcore-10.2.0
   Boost/1.74.0-GCC-10.2.0                       flex/2.6.4-GCCcore-10.2.0
   CMake/3.18.4-GCCcore-10.2.0                   flex/2.6.4                          (D)
   CUDA/11.1.1-GCC-10.2.0                        fosscuda/2020b
   CUDAcore/11.1.1                               gcccuda/2020b
   Check/0.15.2-GCCcore-10.2.0                   gettext/0.21
   DB/18.1.40-GCCcore-10.2.0                     gompic/2020b
   Eigen/3.3.8-GCCcore-10.2.0                    groff/1.22.4-GCCcore-10.2.0
   FFTW/3.3.8-gompic-2020b                       help2man/1.47.16-GCCcore-10.2.0
   GCC/10.2.0                                    hwloc/2.2.0-GCCcore-10.2.0
   GCCcore/10.2.0                                hypothesis/5.41.2-GCCcore-10.2.0
   GDRCopy/2.1-GCCcore-10.2.0-CUDA-11.1.1        libarchive/3.4.3-GCCcore-10.2.0
   GMP/6.2.0-GCCcore-10.2.0                      libevent/2.1.12-GCCcore-10.2.0
   M4/1.4.18-GCCcore-10.2.0                      libfabric/1.11.0-GCCcore-10.2.0
   M4/1.4.18                              (D)    libffi/3.3-GCCcore-10.2.0
   Mako/1.1.3-GCCcore-10.2.0                     libpciaccess/0.16-GCCcore-10.2.0
   OpenBLAS/0.3.12-GCC-10.2.0                    libreadline/8.0-GCCcore-10.2.0
   OpenMPI/4.0.5-gcccuda-2020b                   libtool/2.4.6-GCCcore-10.2.0
   PMIx/3.1.5-GCCcore-10.2.0                     libxml2/2.9.10-GCCcore-10.2.0
   Perl/5.32.0-GCCcore-10.2.0-minimal            makeinfo/6.7-GCCcore-10.2.0-minimal
   Perl/5.32.0-GCCcore-10.2.0             (D)    ncurses/6.2-GCCcore-10.2.0
   PyCUDA/2020.1-fosscuda-2020b                  ncurses/6.2                         (D)
   Python/2.7.18-GCCcore-10.2.0                  numactl/2.0.13-GCCcore-10.2.0
   Python/3.8.6-GCCcore-10.2.0            (D)    pkg-config/0.29.2-GCCcore-10.2.0
   SQLite/3.33.0-GCCcore-10.2.0                  pybind11/2.6.0-GCCcore-10.2.0
   ScaLAPACK/2.1.0-gompic-2020b                  xorg-macros/1.19.2-GCCcore-10.2.0
   SciPy-bundle/2020.11-fosscuda-2020b           zlib/1.2.11-GCCcore-10.2.0
   Tcl/8.6.10-GCCcore-10.2.0                     zlib/1.2.11                         (D)


There are two container flavors in OpenHPC v2.4 which can be loaded via module


How to launch docker containers with charliecloud (NGC catalog)

Nvidia NGC Containers: We built libnvidia-container to make it easy to run CUDA applications inside containers

Slurm's commands salloc, srun and sbatch (version 21.08+) have the '--container' parameter.

Rocky8 Slurm Template

In this section is posted a Slurm submit job template that runs Amber20's pmemd.cuda and pmemd.MPI. Depending on the program invoked, certain parameters need to be changed.

For pmemd.cuda we only need one gpu, one cpu core and one thread (-n 1, -B 1:1:1) and some memory (nothing larger than gpu memory is needed). Specify the gpu model, load the module, and invoke the program, as depicted below.

For pmemd.MPI we do not specify any gpu parameters. Request 8 cpu cores (-n 8, -B 2:4:1), optionally request memory, load the module, and invoke the program with a machinefile specified.

Both modes run on one node (-N 1)
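The templates pair -n with -B following the tasks=S*C*T convention used on this page. A quick way to check that a chosen pair is consistent is to multiply the three fields; sct_tasks is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: compute S*C*T from a -B style "sockets:cores:threads" string.
sct_tasks() {
    local s c t rest
    s="${1%%:*}"        # text before the first colon
    rest="${1#*:}"      # text after the first colon
    c="${rest%%:*}"
    t="${rest#*:}"
    echo $(( s * c * t ))
}

sct_tasks 1:1:1   # pmemd.cuda case, prints 1 (matches -n 1)
sct_tasks 2:4:1   # pmemd.MPI case, prints 8 (matches -n 8)
```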

  • /zfshomes/hmeij/slurm/run.rocky for tinymem, mw128, amber128, test queues
#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#SBATCH --job-name="test"
#SBATCH --output=out   # or both in default file
#SBATCH --error=err    # slurm-$SLURM_JOBID.out
#SBATCH --mail-type=END
# NODE control
#SBATCH -N 1     # default, nodes
# CPU control
#SBATCH -n 1     # tasks=S*C*T
#SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
# GPU control
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168 
###SBATCH --gres=gpu:geforce_gtx_1080_ti:1  # n78
#SBATCH --gres=gpu:quadro_rtx_5000:1  # n[100-101]
# Node control
###SBATCH --partition=mw128
###SBATCH --nodelist=n74

# unique job scratch dirs

### AMBER20 works via slurm's imaged nodes, test and amber128  queues
#source /share/apps/CENTOS8/ohpc/software/amber/20/
# OR #
module load amber/20
# check
which nvcc gcc mpicc pmemd.cuda

# stage the data
cp -r ~/sharptail/* .

# set if needed. try stacking on same gpu, max=4
###export CUDA_VISIBLE_DEVICES=`gpu-free | sed 's/,/\n/g' | shuf | head -1`

# for amber gpu, select gpu model
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
-np  1 \
pmemd.cuda \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber cpu, select partition
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
#-np  8 \
#pmemd.MPI \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/


          Amber 20 PMEMD                              2020

| PMEMD implementation of SANDER, Release 18

|  Compiled date/time: Wed Apr  6 09:56:06 2022
| Run on 06/30/2022 at 10:09:57

|   Executable path: pmemd.cuda
| Working directory: /home/localscratch/1000608
|          Hostname: n101


|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                    Version 18.0.0


|         ns/day =      11.75   seconds/ns =    7355.31
|  Total wall time:         589    seconds     0.16 hours
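When comparing queues it helps to pull the ns/day figure out of each mdout file. A minimal sketch, assuming the timing line has the layout shown above; ns_per_day is a hypothetical helper, and if a file contains several such lines it reports the last one.

```shell
#!/bin/bash
# Sketch: print the ns/day value from Amber mdout text (file args or stdin).
# Exits non-zero when no ns/day line is found.
ns_per_day() {
    awk '$2 == "ns/day" && $3 == "=" { value = $4 }
         END { if (value != "") print value; else exit 1 }' "$@"
}

# Sample line copied from the output above.
printf '%s\n' '|         ns/day =      11.75   seconds/ns =    7355.31' | ns_per_day
# prints: 11.75
```

With the template above, the call would look like "ns_per_day ~/tmp/mdout.JOBID" after the job has copied its output back.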

CentOS7 Slurm Template

In this job template I have it set up to run pmemd.MPI, but it could also invoke pmemd.cuda with proper parameter settings. On queues mwgpu and exx96, amber[16,20] are local disk CentOS7 software installations. Amber16 will not run on Rocky8 (tried it but forgot the error message; we can expect problems like this, hence testing!).

Note also that we're running mwgpu's K20 cuda version 9.2 on exx96 queue (default cuda version 10.2). Not proper but it works. Hence this script will run on both queues. Oh, now I remember, it is that amber16 was compiled with cuda 9.2 drivers which are supported in cuda 10.x but not in cuda 11.x. So Amber 16, if needed, would need to be compiled in Rocky8 environment (and may work like amber20 module).

  • /zfshomes/hmeij/slurm/run.centos for mwgpu, exx96 queues
#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#SBATCH --job-name="test"
#SBATCH --output=out   # or both in default file
#SBATCH --error=err    # slurm-$SLURM_JOBID.out
##SBATCH --mail-type=END
# NODE control
#SBATCH -N 1     # default, nodes
# CPU control
#SBATCH -n 8     # tasks=S*C*T
###SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
# GPU control
###SBATCH --cpus-per-gpu=1
###SBATCH --mem-per-gpu=7168
###SBATCH --gres=gpu:tesla_k20m:1  # n[33-37]
###SBATCH --gres=gpu:geforce_rtx_2080_s:1  # n[79-90]
# Node control
#SBATCH --partition=exx96
#SBATCH --nodelist=n88

# may or may not be needed, centos7 login env
source $HOME/.bashrc  
which ifort           # should be the parallel studio 2016 version

# unique job scratch dirs

# amber20/cuda 9.2/openmpi good for n33-n37 and n79-n90
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/n37-cuda-9.2
export PATH=/usr/local/n37-cuda-9.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/n37-cuda-9.2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="/usr/local/n37-cuda-9.2/lib:${LD_LIBRARY_PATH}"
export PATH=/share/apps/CENTOS7/python/3.8.3/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/python/3.8.3/lib:$LD_LIBRARY_PATH
which nvcc mpirun python

###source /usr/local/amber16/ # works via slurm's mwgpu
source /usr/local/amber20/ # works via slurm's exx96
# stage the data
cp -r ~/sharptail/* .

# not quite proper, may cause problems, look at run.rocky
# if it needs to be set (not needed with slurm)
###export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`

# for amber gpu, select gpu model
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
#-np  1 \
#pmemd.cuda \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber cpu, select partition
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
-np  8 \
pmemd.MPI \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/

Output, takes forever to run…

[hmeij@cottontail2 slurm]$ ssh n88 top -u hmeij -b -n 1
top - 10:45:50 up 274 days,  1:40,  0 users,  load average: 2.69, 2.65, 2.67
Tasks: 516 total,   4 running, 512 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.1 us,  2.5 sy,  0.0 ni, 93.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 97336152 total,   368204 free,  3372968 used, 93594976 buff/cache
KiB Swap: 10485756 total, 10474748 free,    11008 used. 91700584 avail Mem 

416795 hmeij     20   0  113196   1572   1300 S   0.0  0.0   0:00.00 slurm_scr+
416798 hmeij     20   0  286668   6116   3452 S   0.0  0.0   0:00.06 mpirun
416810 hmeij     20   0  149548   4008   3048 S   0.0  0.0   0:09.06 pmemd.MPI
416811 hmeij     20   0  149548   4016   3048 S   0.0  0.0   0:08.92 pmemd.MPI
416812 hmeij     20   0  149548   4020   3048 S   0.0  0.0   0:08.92 pmemd.MPI
416813 hmeij     20   0  149548   4012   3048 S   0.0  0.0   0:08.94 pmemd.MPI
416814 hmeij     20   0  149548   4016   3052 S   0.0  0.0   0:08.83 pmemd.MPI
416815 hmeij     20   0  149548   4008   3048 S   0.0  0.0   0:08.96 pmemd.MPI
416816 hmeij     20   0  149548   4024   3052 S   0.0  0.0   0:08.91 pmemd.MPI
416817 hmeij     20   0  149548   4012   3048 S   0.0  0.0   0:08.91 pmemd.MPI
417748 hmeij     20   0  166964   2412   1092 S   0.0  0.0   0:00.00 sshd
417749 hmeij     20   0  162292   2512   1524 R   0.0  0.0   0:00.03 top


July 2022 is for testing… lots to learn!

Kudos to Abhilash and Colin for working our way through all this.


cluster/218.txt · Last modified: 2023/10/14 15:24 by hmeij07