Getting Started with Slurm Guide

  • The following resources are now (7/1/2022) managed by Slurm on our new head/login node cottontail2.wesleyan.edu
  • You must ssh directly to this server (as you do when connecting to greentail52) via VPN
    • ssh username@cottontail2.wesleyan.edu
  • If the environment is set up correctly, 'which sinfo' should return /usr/local/slurm/bin/sinfo
[hmeij@cottontail2 ~]$ which sinfo
/usr/local/slurm/bin/sinfo

[hmeij@cottontail2 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
mwgpu        up   infinite      2   idle n[34-35]
tinymem      up   infinite      5   idle n[55-59]
mw128        up   infinite      5   idle n[73-77]
amber128     up   infinite      1   idle n78
exx96        up   infinite      5   idle n[86-90]
test*        up   infinite      2   idle n[100-101]
  • July 2022 is designated testing period
  • August 2022 is designated migration period
  • Queues hp12 and mwgpu (centos6) will be serviced by Openlava, not Slurm

Basic Commands

# sorta like bqueues
 sinfo -l

# more node info
 sinfo -lN

# sorta like bsub
 sbatch run.sh

# sorta like bjobs
 squeue

# sorta like bhosts -l
 scontrol show node n78

# sorta like bhist -l
 scontrol show job 1000002

# sorta like bkill
 scancel 1000003
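
The run.sh passed to sbatch above can be as simple as the minimal sketch below (the job name, partition, and command are placeholders; see the Resources section and the job templates further down for real settings):

#!/bin/bash
# minimal submit script sketch (hypothetical values)
#SBATCH --job-name=hello
#SBATCH --output=hello.out    # stdout and stderr
#SBATCH -N 1                  # one node
#SBATCH -n 1                  # one task
#SBATCH --partition=test      # pick a partition from 'sinfo'

echo "running on $(hostname) as job $SLURM_JOB_ID"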

Documentation

Resources

You must request resources, for example the number of CPU cores or which GPU model to use. If you do not request resources, Slurm assumes you need the entire node and will prevent other jobs from running on it.

Some common examples are:

NODE control
#SBATCH -N 1     # default, nr of nodes

CPU control
#SBATCH -n 8     # tasks=S*C*T
#SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core

GPU control
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168
#SBATCH --gres=gpu:geforce_gtx_1080_ti:1  # n[78], amber128
#SBATCH --gres=gpu:geforce_rtx_2080_s:1   # n[79-90], exx96
#SBATCH --gres=gpu:quadro_rtx_5000:1      # n[100-101], test
#SBATCH --gres=gpu:tesla_k20m:1           # n[33-37], mwgpu

Partition control
#SBATCH --partition=mw128
#SBATCH --nodelist=n74

Globbing queues, based on Priority/Weight (view the output of 'sinfo -lN')
srun --partition=exx96,amber128,mwgpu  --mem=1024  --gpus=1   sleep 60 &
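
Putting several of these directives together, a complete submit script header might look like the following sketch (the partition and memory values are illustrative only; see the job templates at the end of this page for working examples):

#!/bin/bash
# resource request sketch (illustrative values)
#SBATCH --job-name=resources-demo
#SBATCH -N 1                      # one node
#SBATCH -n 8                      # eight tasks
#SBATCH -B 2:4:1                  # sockets:cores:threads per node
#SBATCH --partition=mw128         # choose a partition from 'sinfo'
#SBATCH --mem=1024                # memory per node (MB)

./my_program                      # placeholder for your workload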

MPI

Slurm has a built-in MPI flavor. I suggest you do not rely on it: the documentation states that on major release upgrades the libslurm.so library is not backwards compatible, so all software linked against it would need to be recompiled.

There is a handy parallel job launcher called srun which may be of use. srun commands can be embedded in a job submission script, but srun can also be used interactively to test commands. The submitted job will have a single job ID and launch multiple tasks.

$ srun --partition=mwgpu -n 4 -B 1:4:1 --mem=1024 sleep 60 &
$ squeue

For more details on srun consult https://slurm.schedmd.com/srun.html

MPI Stacks

For MPI development and runtime support, OpenHPC provides pre-packaged builds for a variety of MPI families and transport layers. OpenHPC 2.x introduces the use of two related transport layers for the MPICH and OpenMPI builds that support a variety of underlying fabrics: UCX (Unified Communication X) and OFI (OpenFabrics Interfaces). Both support Ethernet, Infiniband and Omni-Path. We do not use the latter two fabrics (we do have some Infiniband switches, but we do not custom compile for them).

  • openmpi4-gnu9-ohpc # ofi & ucx
  • mpich-ofi-gnu9-ohpc # ofi only
  • mpich-ucx-gnu9-ohpc # ucx only

Contents of these packages can be found at /zfshomes/hmeij/openhpc
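
As a quick sanity check of one of these stacks, a small MPI hello world can be compiled and launched as in the sketch below (file names are arbitrary; the default gnu9 + openmpi4 modules loaded at login are assumed):

# write a tiny MPI program
cat > hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# compile with the openmpi4-gnu9 wrapper and launch 4 tasks on the test partition
mpicc -o hello hello.c
srun --partition=test -n 4 --mem=1024 ./hello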

OneAPI

OpenHPC also provides compatible builds for use with the compilers and MPI stack included in newer versions of the Intel® OneAPI HPC Toolkit (using the classic compiler variants).

  • intel-oneapi-toolkit-release-ohpc
  • intel-compilers-devel-ohpc
  • intel-mpi-devel-ohpc

Contents of these packages can be found at /zfshomes/hmeij/openhpc
Consult the section below.

intel/2022.0.2

Switching to Intel's OneAPI toolchain requires swapping the gnu9 compiler module for the intel module.


[hmeij@cottontail2 ~]$ module load intel/2022.0.2
Loading compiler version 2022.0.2
Loading tbb version 2021.5.1
Loading compiler-rt version 2022.0.2
Loading oclfpga version 2022.0.2
  Load "debugger" to debug DPC++ applications with the gdb-oneapi debugger.
  Load "dpl" for additional DPC++ APIs: https://github.com/oneapi-src/oneDPL
Loading mkl version 2022.0.2


Lmod has detected the following error: 
You can only have one compiler module loaded at a time.
You already have gnu9 loaded.
To correct the situation, please execute the following command:

  $ module swap gnu9 intel/2022.0.2

# after the swap we observe 

[hmeij@cottontail2 ~]$ which icc icx mpicc ifort ifx
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/icc
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/icx
/opt/ohpc/pub/mpi/openmpi4-intel/4.1.1/bin/mpicc
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/ifx

# more on OneAPI and its variety of compilers
https://dokuwiki.wesleyan.edu/doku.php?id=cluster:203

# debugger module
[hmeij@cottontail2 ~]$ module load debugger
Loading debugger version 2021.5.0
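
After the swap, a quick compile test might look like the sketch below (the source file is a throwaway example; icc/ifort are the classic compilers, icx/ifx the newer LLVM-based ones):

# assumes 'module swap gnu9 intel/2022.0.2' has been done
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("built with the Intel classic compiler\n"); return 0; }
EOF

icc -O2 -o hello hello.c && ./hello
# the MPI wrapper now points at the openmpi4-intel build
which mpicc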

Modules

You can continue to rely on PATH/LD_LIBRARY_PATH settings to control the environment. This also implies your job should run under either Openlava or Slurm with the proper scheduler tags. With the new OpenHPC head node deployment we will introduce modules to control the environment for newly installed software.

The default developer environment set up at login consists of just a few modules.

[hmeij@cottontail2 ~]$ module list

Currently Loaded Modules:
  1) autotools   3) gnu9/9.4.0    5) ucx/1.11.2         7) openmpi4/4.1.1
  2) prun/2.2    4) hwloc/2.5.0   6) libfabric/1.13.0   8) ohpc

[hmeij@cottontail2 ~]$ which gcc mpicc nvcc
/opt/ohpc/pub/compiler/gcc/9.4.0/bin/gcc
/opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1/bin/mpicc
/usr/local/cuda/bin/nvcc # cuda 11.6

All modules available to you can be queried:

[hmeij@cottontail2 ~]$ module avail

------------------- /opt/ohpc/pub/moduledeps/gnu9-openmpi4 -------------
   adios/1.13.1     netcdf-cxx/4.3.1        py3-scipy/1.5.1
   boost/1.76.0     netcdf-fortran/4.5.3    scalapack/2.1.0
   dimemas/5.4.2    netcdf/4.7.4            scalasca/2.5
   example2/1.0     omb/5.8                 scorep/6.0
   extrae/3.7.0     opencoarrays/2.9.2      sionlib/1.7.4
   fftw/3.3.8       petsc/3.16.1            slepc/3.16.0
   hypre/2.18.1     phdf5/1.10.8            superlu_dist/6.4.0
   imb/2019.6       pnetcdf/1.12.2          tau/2.29
   mfem/4.3         ptscotch/6.0.6          trilinos/13.2.0
   mumps/5.2.1      py3-mpi4py/3.0.3

------------------------- /opt/ohpc/pub/moduledeps/gnu9 ----------------
   R/4.1.2          mpich/3.4.2-ofi         plasma/2.8.0
   gsl/2.7          mpich/3.4.2-ucx  (D)    py3-numpy/1.19.5
   hdf5/1.10.8      mvapich2/2.3.6          scotch/6.0.6
   impi/2021.5.1    openblas/0.3.7          superlu/5.2.1
   likwid/5.0.1     openmpi4/4.1.1   (L)
   metis/5.1.0      pdtoolkit/3.25.1

--------------------------- /opt/ohpc/pub/modulefiles -------------------
   EasyBuild/4.5.0          hwloc/2.5.0      (L)    prun/2.2          (L)
   autotools         (L)    intel/2022.0.2          singularity/3.7.1
   charliecloud/0.15        libfabric/1.13.0 (L)    ucx/1.11.2        (L)
   cmake/3.21.3             ohpc             (L)    valgrind/3.18.1
   example1/1.0             os
   gnu9/9.4.0        (L)    papi/5.7.0

----------------------- /share/apps/CENTOS8/ohpc/modulefiles ------------
   amber/20    cuda/11.6    hello-mpi/1.0    hello/1.0    miniconda3/py39

  Where:
   D:  Default Module
   L:  Module is loaded

If the avail list is too long consider trying:

"module --default avail" or "ml -d av" to just list the default modules.
"module overview" or "ml ov" to display the number of modules for each name.

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

In addition, you can request and query EasyBuild modules and toolchains:

# "use" them
[hmeij@cottontail2 ~]$ module use /sanscratch/CENTOS8/easybuild/ohpc/modules/all
[hmeij@cottontail2 ~]$ module avail

-------------------- /sanscratch/CENTOS8/easybuild/ohpc/modules/all ----------------------
   Autoconf/2.69-GCCcore-10.2.0                  UnZip/6.0-GCCcore-10.2.0
   Automake/1.16.2-GCCcore-10.2.0                XZ/5.2.5-GCCcore-10.2.0
   Autotools/20200321-GCCcore-10.2.0             binutils/2.35-GCCcore-10.2.0
   Bison/3.5.3                                   binutils/2.35                       (D)
   Bison/3.7.1-GCCcore-10.2.0                    bzip2/1.0.8-GCCcore-10.2.0
   Bison/3.7.1                            (D)    cURL/7.72.0-GCCcore-10.2.0
   Boost.Python/1.74.0-GCC-10.2.0                expat/2.2.9-GCCcore-10.2.0
   Boost/1.74.0-GCC-10.2.0                       flex/2.6.4-GCCcore-10.2.0
   CMake/3.18.4-GCCcore-10.2.0                   flex/2.6.4                          (D)
   CUDA/11.1.1-GCC-10.2.0                        fosscuda/2020b
   CUDAcore/11.1.1                               gcccuda/2020b
   Check/0.15.2-GCCcore-10.2.0                   gettext/0.21
   DB/18.1.40-GCCcore-10.2.0                     gompic/2020b
   Eigen/3.3.8-GCCcore-10.2.0                    groff/1.22.4-GCCcore-10.2.0
   FFTW/3.3.8-gompic-2020b                       help2man/1.47.16-GCCcore-10.2.0
   GCC/10.2.0                                    hwloc/2.2.0-GCCcore-10.2.0
   GCCcore/10.2.0                                hypothesis/5.41.2-GCCcore-10.2.0
   GDRCopy/2.1-GCCcore-10.2.0-CUDA-11.1.1        libarchive/3.4.3-GCCcore-10.2.0
   GMP/6.2.0-GCCcore-10.2.0                      libevent/2.1.12-GCCcore-10.2.0
   M4/1.4.18-GCCcore-10.2.0                      libfabric/1.11.0-GCCcore-10.2.0
   M4/1.4.18                              (D)    libffi/3.3-GCCcore-10.2.0
   Mako/1.1.3-GCCcore-10.2.0                     libpciaccess/0.16-GCCcore-10.2.0
   OpenBLAS/0.3.12-GCC-10.2.0                    libreadline/8.0-GCCcore-10.2.0
   OpenMPI/4.0.5-gcccuda-2020b                   libtool/2.4.6-GCCcore-10.2.0
   PMIx/3.1.5-GCCcore-10.2.0                     libxml2/2.9.10-GCCcore-10.2.0
   Perl/5.32.0-GCCcore-10.2.0-minimal            makeinfo/6.7-GCCcore-10.2.0-minimal
   Perl/5.32.0-GCCcore-10.2.0             (D)    ncurses/6.2-GCCcore-10.2.0
   PyCUDA/2020.1-fosscuda-2020b                  ncurses/6.2                         (D)
   Python/2.7.18-GCCcore-10.2.0                  numactl/2.0.13-GCCcore-10.2.0
   Python/3.8.6-GCCcore-10.2.0            (D)    pkg-config/0.29.2-GCCcore-10.2.0
   SQLite/3.33.0-GCCcore-10.2.0                  pybind11/2.6.0-GCCcore-10.2.0
   ScaLAPACK/2.1.0-gompic-2020b                  xorg-macros/1.19.2-GCCcore-10.2.0
   SciPy-bundle/2020.11-fosscuda-2020b           zlib/1.2.11-GCCcore-10.2.0
   Tcl/8.6.10-GCCcore-10.2.0                     zlib/1.2.11                         (D)
   UCX/1.9.0-GCCcore-10.2.0-CUDA-11.1.1
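
Once the EasyBuild tree is on the module path, its modules load like any other. A minimal sketch using entries from the list above:

module use /sanscratch/CENTOS8/easybuild/ohpc/modules/all
module load Python/3.8.6-GCCcore-10.2.0
which python3
python3 --version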

Containers

There are two container flavors in OpenHPC v2.4, which can be loaded via module:

   singularity/3.7.1
   charliecloud/0.15 
   

How to launch docker containers with charliecloud (NGC catalog)

Nvidia NGC Containers: We built libnvidia-container to make it easy to run CUDA applications inside containers

Slurm's commands salloc, srun and sbatch (version 21.08+) have the '--container' parameter.
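
As a sketch of the Singularity route (the image name and commands are illustrative; the NGC link above covers the charliecloud route):

module load singularity/3.7.1
# pull a Docker image from a registry and convert it to a SIF file
singularity pull docker://ubuntu:20.04
# run a command inside the container; add --nv on a GPU node to expose the GPUs
singularity exec ubuntu_20.04.sif cat /etc/os-release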

Rocky8 Slurm Template

This section posts a Slurm submit job template that runs Amber20's pmemd.cuda or pmemd.MPI. Depending on the program invoked, certain parameters need to be changed.

For pmemd.cuda we only need one GPU, one CPU core and thread (-n 1, -B 1:1:1), and some memory (nothing larger than the GPU's memory is needed); we specify the GPU model, load the module, and invoke the program, as depicted below.

For pmemd.MPI we do not specify any GPU parameters; we request 8 CPU cores (-n 8, -B 2:4:1), optionally request memory, load the module, and invoke the program with a machinefile specified.

Both modes run on one node (-N 1).

  • /zfshomes/hmeij/slurm/run.rocky
#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#
# GENERAL
#SBATCH --job-name="test"
#SBATCH --output=out   # or both in default file
#SBATCH --error=err    # slurm-$SLURM_JOBID.out
#SBATCH --mail-type=END
#SBATCH --mail-user=hmeij@wesleyan.edu
#
# NODE control
#SBATCH -N 1     # default, nodes
#
# CPU control
#SBATCH -n 1     # tasks=S*C*T
#SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168 
#
# GPU control
###SBATCH --gres=gpu:geforce_gtx_1080_ti:1  # n78
#SBATCH --gres=gpu:quadro_rtx_5000:1  # n[100-101]
#
# Node control
###SBATCH --partition=mw128
###SBATCH --nodelist=n74


# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH

### AMBER20
#source /share/apps/CENTOS8/ohpc/software/amber/20/amber.sh
# OR #
module load amber/20
# check
which nvcc gcc mpicc pmemd.cuda


# stage the data
cp -r ~/sharptail/* .

# set if needed. try stacking on same gpu, max=4
###export CUDA_VISIBLE_DEVICES=0
###export CUDA_VISIBLE_DEVICES=`gpu-free | sed 's/,/\n/g' | shuf | head -1`


# for amber gpu, select gpu model
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
-np  1 \
pmemd.cuda \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber cpu, select partition
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
#-np  8 \
#pmemd.MPI \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/

Output


          -------------------------------------------------------
          Amber 20 PMEMD                              2020
          -------------------------------------------------------

| PMEMD implementation of SANDER, Release 18

|  Compiled date/time: Wed Apr  6 09:56:06 2022
| Run on 06/30/2022 at 10:09:57

|   Executable path: pmemd.cuda
| Working directory: /home/localscratch/1000608
|          Hostname: n101

...snip...

|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                    Version 18.0.0

...snip...


|         ns/day =      11.75   seconds/ns =    7355.31
|  Total wall time:         589    seconds     0.16 hours

CentOS7 Slurm Template

In this job template I have it set up to run pmemd.MPI, but it could also invoke pmemd.cuda with the proper parameter settings. I can also toggle between amber16 and amber20, which on queues mwgpu and exx96 are local-disk CentOS7 software installations. Amber16 will not run on Rocky8 (tried it but forgot the error message… we can expect problems like this, hence testing!).

Note also that we're running mwgpu's K20 CUDA version 9.2 on the exx96 queue (default CUDA version 10.2). Not proper, but it works, so this script will run on both queues. Oh, now I remember: amber16 was compiled with CUDA 9.2 drivers, which are supported in CUDA 10+ but not in CUDA 11+. So Amber16, if needed, would have to be recompiled in the Rocky8 environment (that may work, like amber20).

  • /zfshomes/hmeij/slurm/run.centos
#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#
# GENERAL
#SBATCH --job-name="test"
#SBATCH --output=out   # or both in default file
#SBATCH --error=err    # slurm-$SLURM_JOBID.out
##SBATCH --mail-type=END
##SBATCH --mail-user=hmeij@wesleyan.edu
#
# NODE control
#SBATCH -N 1     # default, nodes
#
# CPU control
#SBATCH -n 8     # tasks=S*C*T
###SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
#
# GPU control
###SBATCH --gres=gpu:tesla_k20m:1  # n[33-37]
###SBATCH --gres=gpu:geforce_rtx_2080_s:1  # n[79-90]
###SBATCH --cpus-per-gpu=1
###SBATCH --mem-per-gpu=7168
#
# Node control
#SBATCH --partition=exx96
#SBATCH --nodelist=n88


# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH

# amber20/cuda 9.2/openmpi good for n33-n37 and n79-n90
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/n37-cuda-9.2
export PATH=/usr/local/n37-cuda-9.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/n37-cuda-9.2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="/usr/local/n37-cuda-9.2/lib:${LD_LIBRARY_PATH}"
export PATH=/share/apps/CENTOS7/python/3.8.3/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/python/3.8.3/lib:$LD_LIBRARY_PATH
which nvcc mpirun python


###source /usr/local/amber16/amber.sh
source /usr/local/amber20/amber.sh
# stage the data
cp -r ~/sharptail/* .

# not quite proper, may cause problems, look at run.rocky
# if it needs to be set (not needed with slurm)
###export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`
###export CUDA_VISIBLE_DEVICES=0


# for amber gpu, select gpu model
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
#-np  1 \
#pmemd.cuda \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber cpu, select partition
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
-np  8 \
pmemd.MPI \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/

Output (takes forever to run…)

[hmeij@cottontail2 slurm]$ ssh n88 top -u hmeij -b -n 1
top - 10:45:50 up 274 days,  1:40,  0 users,  load average: 2.69, 2.65, 2.67
Tasks: 516 total,   4 running, 512 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.1 us,  2.5 sy,  0.0 ni, 93.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 97336152 total,   368204 free,  3372968 used, 93594976 buff/cache
KiB Swap: 10485756 total, 10474748 free,    11008 used. 91700584 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
416795 hmeij     20   0  113196   1572   1300 S   0.0  0.0   0:00.00 slurm_scr+
416798 hmeij     20   0  286668   6116   3452 S   0.0  0.0   0:00.06 mpirun
416810 hmeij     20   0  149548   4008   3048 S   0.0  0.0   0:09.06 pmemd.MPI
416811 hmeij     20   0  149548   4016   3048 S   0.0  0.0   0:08.92 pmemd.MPI
416812 hmeij     20   0  149548   4020   3048 S   0.0  0.0   0:08.92 pmemd.MPI
416813 hmeij     20   0  149548   4012   3048 S   0.0  0.0   0:08.94 pmemd.MPI
416814 hmeij     20   0  149548   4016   3052 S   0.0  0.0   0:08.83 pmemd.MPI
416815 hmeij     20   0  149548   4008   3048 S   0.0  0.0   0:08.96 pmemd.MPI
416816 hmeij     20   0  149548   4024   3052 S   0.0  0.0   0:08.91 pmemd.MPI
416817 hmeij     20   0  149548   4012   3048 S   0.0  0.0   0:08.91 pmemd.MPI
417748 hmeij     20   0  166964   2412   1092 S   0.0  0.0   0:00.00 sshd
417749 hmeij     20   0  162292   2512   1524 R   0.0  0.0   0:00.03 top

Testing!

July 2022 is for testing… lots to learn!

Kudos to Abhilash for working our way through all this.

