greentail52) via VPN

[hmeij@cottontail2 ~]$ which sinfo
/usr/local/slurm/bin/sinfo

[hmeij@cottontail2 ~]$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
mwgpu      up     infinite       2   idle  n[34-35]
tinymem    up     infinite       5   idle  n[55-59]
mw128      up     infinite       5   idle  n[73-77]
amber128   up     infinite       1   idle  n78
exx96      up     infinite       5   idle  n[86-90]
test*      up     infinite       2   idle  n[100-101]
Queues hp12 and mwgpu (centos6) will be serviced by Openlava, not Slurm.

Jump to the Rocky8/CentOS7 script templates listed in the menu of this page, top right. There is also detailed information on Amber20/Amber22 on this page, with script examples.
# sorta like bqueues
sinfo -l
# more node info
sinfo -lN
# sorta like bsub
sbatch run.sh
# sorta like bjobs
squeue
# sorta like bhosts -l
scontrol show node n78
# sorta like bstop/bresume
scontrol suspend job 1000001
scontrol resume job 1000001
# sorta like bhist -l
scontrol show job 1000002
# sorta like bkill
scancel 1000003
man slurm.conf
man sbatch
You must request resources, for example the number of CPU cores or which GPU model to use. If you do not request resources, Slurm assumes you need all of the node's resources and will thus prevent other jobs from running on that node.
Details
Some common examples are:
NODE control
#SBATCH -N 1                 # default, nr of nodes

CPU control
#SBATCH -n 8                 # tasks=S*C*T
#SBATCH -B 2:4:1             # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH --mem=250            # needed to override oversubscribe
#SBATCH --ntasks-per-node=1  # perhaps needed to override oversubscribe
#SBATCH --cpus-per-task=1    # needed to override oversubscribe

GPU control
#SBATCH --cpus-per-gpu=1     # needed to override oversubscribe
#SBATCH --mem-per-gpu=7168   # needed to override oversubscribe
#SBATCH --gres=gpu:geforce_gtx_1080_ti:1   # n[78], amber128
#SBATCH --gres=gpu:geforce_rtx_2080_s:1    # n[79-90], exx96
#SBATCH --gres=gpu:quadro_rtx_5000:1       # n[100-101], test
#SBATCH --gres=gpu:tesla_k20m:1            # n[33-37], mwgpu

Partition control
#SBATCH --partition=mw128
#SBATCH --nodelist=n74

Globbing queues, based on Priority/Weight (view output of 'sinfo -lN')
srun --partition=exx96,amber128,mwgpu --mem=1024 --gpus=1 sleep 60 &
Pending Jobs
I keep having to inform users that with -n 1 and -c 1 (one cpu per task) your job can still go into pending state because the user forgot to reserve memory … silly Slurm then assumes your job needs all of the node's memory. Here is my template:
FirstName, your jobs are pending because you did not request memory and if not then slurm assumes you need all memory, silly.

Command "scontrol show job JOBID" will reveal ...

JobId=1062052 JobName=3a_avgHbond_CPU
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1
ReqB:S:C:T=0:0:1:1
TRES=cpu=1,mem=191047M,node=1,billing=1   <---------

I looked (command "ssh n?? top -u username -b -n 1", look for the VIRT value) and you need less than 1G per job,
so with --mem=1024 and n=1 and cpu=1 you should be able to load 48 jobs onto n100.
Consult output of command "sinfo -lN"
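The fix is simply an explicit memory request in the job header. Here is a minimal sketch under the assumptions of the email above (the partition and the 1 GB figure are illustrative; size --mem to what VIRT shows for your program):

#!/bin/bash
#SBATCH -N 1               # one node
#SBATCH -n 1               # one task
#SBATCH -c 1               # one cpu per task
#SBATCH --mem=1024         # 1 GB; without this Slurm reserves all of the node's memory
#SBATCH --partition=test   # illustrative partition

./my_program               # placeholder for the actual workload

With --mem=1024 roughly 48 such jobs can then pack onto a node like n100 instead of one job holding the whole node.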
Slurm has a built-in MPI flavor. I suggest you do not rely on it: the documentation states that on major release upgrades the libslurm.so library is not backwards compatible, so all software linked against it would need to be recompiled.
There is a handy parallel job launcher called srun which may be of use. srun commands can be embedded in a job submission script, but srun can also be used interactively to test commands out. The submitted job will have a single JOBID and launch multiple tasks.
$ srun --partition=mwgpu -n 4 -B 1:4:1 --mem=1024 sleep 60 &
$ squeue
For more details on srun consult https://slurm.schedmd.com/srun.html
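To illustrate the embedded use mentioned above, here is a minimal sketch of srun inside a submission script; each srun call becomes a job step under the same job id (the partition, task count and memory values are illustrative):

#!/bin/bash
#SBATCH --job-name=srun_steps
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --mem=1024
#SBATCH --partition=mwgpu

# each srun below runs as a job step of this single job id
srun -n 4 hostname
srun -n 4 sleep 60

Afterwards "squeue -s" (or "sacct -j JOBID") shows the individual steps under that one job.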
For MPI development and runtime support, OpenHPC provides pre-packaged builds for a variety of MPI families and transport layers. OpenHPC 2.x introduces the use of two related transport layers for the MPICH and OpenMPI builds that support a variety of underlying fabrics: UCX (Unified Communication X) and OFI (OpenFabrics Interfaces). Both versions support Ethernet, InfiniBand and Omni-Path. We do not use the latter two fabrics (we do have some InfiniBand switches but do not custom compile for them).
Contents of these packages can be found at /zfshomes/hmeij/openhpc
OpenHPC also provides compatible builds for use with the compilers and MPI stack included in newer versions of the Intel® OneAPI HPC Toolkit (using the classic compiler variants).
Contents of these packages can be found at /zfshomes/hmeij/openhpc
Consult the section below.
Switching to Intel's OneAPI toolchain requires swapping out the gnu9 compiler module for the intel module.
[hmeij@cottontail2 ~]$ module load intel/2022.0.2
Loading compiler version 2022.0.2
Loading tbb version 2021.5.1
Loading compiler-rt version 2022.0.2
Loading oclfpga version 2022.0.2
  Load "debugger" to debug DPC++ applications with the gdb-oneapi debugger.
  Load "dpl" for additional DPC++ APIs: https://github.com/oneapi-src/oneDPL
Loading mkl version 2022.0.2

Lmod has detected the following error:  You can only have one compiler module loaded at a time.
You already have gnu9 loaded. To correct the situation, please execute the following command:

  $ module swap gnu9 intel/2022.0.2

# after the swap we observe

[hmeij@cottontail2 ~]$ which icc icx mpicc ifort ifx
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/icc
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/icx
/opt/ohpc/pub/mpi/openmpi4-intel/4.1.1/bin/mpicc
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/ifx

# more on OneAPI and its variety of compilers
# https://dokuwiki.wesleyan.edu/doku.php?id=cluster:203

# debugger module
[hmeij@cottontail2 ~]$ module load debugger
Loading debugger version 2021.5.0
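After the swap, the openmpi4-intel mpicc wrapper drives the Intel compilers. A quick sanity check might look like the sketch below (hello.c is a placeholder for any small MPI test program of your own):

module swap gnu9 intel/2022.0.2
mpicc --showme                 # print the underlying compiler invocation of the Open MPI wrapper
mpicc -o hello_intel hello.c   # hello.c stands in for your own MPI source file
ifort --version                # confirm the Intel Fortran classic compiler is on PATH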
You can continue to rely on PATH/LD_LIBRARY_PATH settings to control the environment. This also implies your job should run under Openlava or Slurm with the proper scheduler tags. With the new head node deployment of OpenHPC we'll introduce modules to control the environment for newly installed software.

The default developer environment set up at login consists of but a few modules.
[hmeij@cottontail2 ~]$ module list

Currently Loaded Modules:
  1) autotools   3) gnu9/9.4.0    5) ucx/1.11.2         7) openmpi4/4.1.1
  2) prun/2.2    4) hwloc/2.5.0   6) libfabric/1.13.0   8) ohpc

[hmeij@cottontail2 ~]$ which gcc mpicc nvcc
/opt/ohpc/pub/compiler/gcc/9.4.0/bin/gcc
/opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1/bin/mpicc
/usr/local/cuda/bin/nvcc   # cuda 11.6
All modules available to you can be queried
[hmeij@cottontail2 ~]$ module avail

------------------- /opt/ohpc/pub/moduledeps/gnu9-openmpi4 -------------------
   adios/1.13.1    netcdf-cxx/4.3.1       py3-scipy/1.5.1
   boost/1.76.0    netcdf-fortran/4.5.3   scalapack/2.1.0
   dimemas/5.4.2   netcdf/4.7.4           scalasca/2.5
   example2/1.0    omb/5.8                scorep/6.0
   extrae/3.7.0    opencoarrays/2.9.2     sionlib/1.7.4
   fftw/3.3.8      petsc/3.16.1           slepc/3.16.0
   hypre/2.18.1    phdf5/1.10.8           superlu_dist/6.4.0
   imb/2019.6      pnetcdf/1.12.2         tau/2.29
   mfem/4.3        ptscotch/6.0.6         trilinos/13.2.0
   mumps/5.2.1     py3-mpi4py/3.0.3

------------------------- /opt/ohpc/pub/moduledeps/gnu9 ----------------------
   R/4.1.2         mpich/3.4.2-ofi       plasma/2.8.0
   gsl/2.7         mpich/3.4.2-ucx (D)   py3-numpy/1.19.5
   hdf5/1.10.8     mvapich2/2.3.6        scotch/6.0.6
   impi/2021.5.1   openblas/0.3.7        superlu/5.2.1
   likwid/5.0.1    openmpi4/4.1.1  (L)
   metis/5.1.0     pdtoolkit/3.25.1

--------------------------- /opt/ohpc/pub/modulefiles ------------------------
   EasyBuild/4.5.0     hwloc/2.5.0      (L)   prun/2.2          (L)
   autotools       (L) intel/2022.0.2         singularity/3.7.1
   charliecloud/0.15   libfabric/1.13.0 (L)   ucx/1.11.2        (L)
   cmake/3.21.3        ohpc             (L)   valgrind/3.18.1
   example1/1.0        os
   gnu9/9.4.0      (L) papi/5.7.0

----------------------- /share/apps/CENTOS8/ohpc/modulefiles -----------------
   amber/20   cuda/11.6   hello-mpi/1.0   hello/1.0   miniconda3/py39

  Where:
   D:  Default Module
   L:  Module is loaded

If the avail list is too long consider trying:
"module --default avail" or "ml -d av" to just list the default modules.
"module overview" or "ml ov" to display the number of modules for each name.
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
In addition you can request and query EasyBuild modules and toolchains.
# "use" them [hmeij@cottontail2 ~]$ module use /sanscratch/CENTOS8/easybuild/ohpc/modules/all [hmeij@cottontail2 ~]$ module avail -------------------- /sanscratch/CENTOS8/easybuild/ohpc/modules/all ---------------------- Autoconf/2.69-GCCcore-10.2.0 UnZip/6.0-GCCcore-10.2.0 Automake/1.16.2-GCCcore-10.2.0 XZ/5.2.5-GCCcore-10.2.0 Autotools/20200321-GCCcore-10.2.0 binutils/2.35-GCCcore-10.2.0 Bison/3.5.3 binutils/2.35 (D) Bison/3.7.1-GCCcore-10.2.0 bzip2/1.0.8-GCCcore-10.2.0 Bison/3.7.1 (D) cURL/7.72.0-GCCcore-10.2.0 Boost.Python/1.74.0-GCC-10.2.0 expat/2.2.9-GCCcore-10.2.0 Boost/1.74.0-GCC-10.2.0 flex/2.6.4-GCCcore-10.2.0 CMake/3.18.4-GCCcore-10.2.0 flex/2.6.4 (D) CUDA/11.1.1-GCC-10.2.0 fosscuda/2020b CUDAcore/11.1.1 gcccuda/2020b Check/0.15.2-GCCcore-10.2.0 gettext/0.21 DB/18.1.40-GCCcore-10.2.0 gompic/2020b Eigen/3.3.8-GCCcore-10.2.0 groff/1.22.4-GCCcore-10.2.0 FFTW/3.3.8-gompic-2020b help2man/1.47.16-GCCcore-10.2.0 GCC/10.2.0 hwloc/2.2.0-GCCcore-10.2.0 GCCcore/10.2.0 hypothesis/5.41.2-GCCcore-10.2.0 GDRCopy/2.1-GCCcore-10.2.0-CUDA-11.1.1 libarchive/3.4.3-GCCcore-10.2.0 GMP/6.2.0-GCCcore-10.2.0 libevent/2.1.12-GCCcore-10.2.0 M4/1.4.18-GCCcore-10.2.0 libfabric/1.11.0-GCCcore-10.2.0 M4/1.4.18 (D) libffi/3.3-GCCcore-10.2.0 Mako/1.1.3-GCCcore-10.2.0 libpciaccess/0.16-GCCcore-10.2.0 OpenBLAS/0.3.12-GCC-10.2.0 libreadline/8.0-GCCcore-10.2.0 OpenMPI/4.0.5-gcccuda-2020b libtool/2.4.6-GCCcore-10.2.0 PMIx/3.1.5-GCCcore-10.2.0 libxml2/2.9.10-GCCcore-10.2.0 Perl/5.32.0-GCCcore-10.2.0-minimal makeinfo/6.7-GCCcore-10.2.0-minimal Perl/5.32.0-GCCcore-10.2.0 (D) ncurses/6.2-GCCcore-10.2.0 PyCUDA/2020.1-fosscuda-2020b ncurses/6.2 (D) Python/2.7.18-GCCcore-10.2.0 numactl/2.0.13-GCCcore-10.2.0 Python/3.8.6-GCCcore-10.2.0 (D) pkg-config/0.29.2-GCCcore-10.2.0 SQLite/3.33.0-GCCcore-10.2.0 pybind11/2.6.0-GCCcore-10.2.0 ScaLAPACK/2.1.0-gompic-2020b xorg-macros/1.19.2-GCCcore-10.2.0 SciPy-bundle/2020.11-fosscuda-2020b zlib/1.2.11-GCCcore-10.2.0 Tcl/8.6.10-GCCcore-10.2.0 zlib/1.2.11 (D) UCX/1.9.0-GCCcore-10.2.0-CUDA-11.1.1
There are two container flavors in OpenHPC v2.4 which can be loaded via module
singularity/3.7.1 charliecloud/0.15
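For the singularity flavor, a minimal interactive sketch might look like this (the ubuntu image is just an example; on GPU nodes add --nv to expose the NVIDIA driver):

module load singularity/3.7.1
singularity pull docker://ubuntu:20.04            # fetches ubuntu_20.04.sif into the current directory
singularity exec ubuntu_20.04.sif cat /etc/os-release
# on a GPU node, for example inside an interactive srun session:
# singularity exec --nv my_cuda_image.sif nvidia-smi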
How to launch docker containers with charliecloud (NGC catalog)
Nvidia NGC Containers: We built libnvidia-container to make it easy to run CUDA applications inside containers.
Slurm's commands salloc, srun and sbatch (version 21.08+) have the '--container' parameter.
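We have not wired this up yet, but roughly it would look like the sketch below: --container expects the path to an unpacked OCI bundle and requires Slurm's container support (oci.conf) to be configured on the nodes. The bundle path is a placeholder.

srun --partition=test --gres=gpu:quadro_rtx_5000:1 --mem=1024 \
  --container=/path/to/oci-bundle nvidia-smi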
In this section is posted a Slurm submit job template that runs Amber20's pmemd.cuda and pmemd.MPI. Depending on the program invoked, certain parameters need to be changed.
For pmemd.cuda we only need one GPU, one CPU core and thread (-n 1, -B 1:1:1), and some memory (nothing larger than GPU memory is needed); we specify the GPU model, load the module, and invoke the program, as depicted below.
For pmemd.MPI we do not specify any GPU parameters; we request 8 CPU cores (-n 8, -B 2:4:1), optionally request memory, load the module, and invoke the program with a machinefile specified.
Both modes run on one node (-N 1)
/zfshomes/hmeij/slurm/run.rocky for tinymem, mw128, amber128, test queues

#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#
# GENERAL
#SBATCH --job-name="test"
#SBATCH --output=out      # or both in default file
#SBATCH --error=err       # slurm-$SLURM_JOBID.out
#SBATCH --mail-type=END
#SBATCH --mail-user=hmeij@wesleyan.edu
#
# NODE control
#SBATCH -N 1        # default, nodes
#
# CPU control
#SBATCH -n 1        # tasks=S*C*T
#SBATCH -B 1:1:1    # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH -B 2:4:1  # S:C:T=sockets/node:cores/socket:threads/core
#
# GPU control
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168
###SBATCH --gres=gpu:geforce_gtx_1080_ti:1   # n78
#SBATCH --gres=gpu:quadro_rtx_5000:1         # n[100-101]
#
# Node control
###SBATCH --partition=mw128
###SBATCH --nodelist=n74

# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH

### AMBER20 works via slurm's imaged nodes, test and amber128 queues
#source /share/apps/CENTOS8/ohpc/software/amber/20/amber.sh
# OR
module load amber/20

# check
which nvcc gcc mpicc pmemd.cuda

# stage the data
cp -r ~/sharptail/* .

# set if needed. try stacking on same gpu, max=4
###export CUDA_VISIBLE_DEVICES=0
###export CUDA_VISIBLE_DEVICES=`gpu-free | sed 's/,/\n/g' | shuf | head -1`

# for amber gpu, select gpu model
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
 -np 1 \
 pmemd.cuda \
 -O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber cpu, select partition
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
#-np 8 \
#pmemd.MPI \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/
Output
-------------------------------------------------------
          Amber 20 PMEMD                              2020
-------------------------------------------------------

| PMEMD implementation of SANDER, Release 18
| Compiled date/time: Wed Apr  6 09:56:06 2022
| Run on 06/30/2022 at 10:09:57
| Executable path: pmemd.cuda
| Working directory: /home/localscratch/1000608
| Hostname: n101

...snip...

|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                     Version 18.0.0

...snip...

|     ns/day =      11.75    seconds/ns =    7355.31
|  Total wall time:         589    seconds     0.16 hours
In this job template I have it set up to run pmemd.MPI, but it could also invoke pmemd.cuda with the proper parameter settings. On queues mwgpu and exx96, amber[16,20] are local-disk CentOS7 software installations. Amber16 will not run on Rocky8 (tried it but forgot the error message … we can expect problems like this, hence testing!).

Note also that we're running mwgpu's K20 CUDA version 9.2 on the exx96 queue (default CUDA version 10.2). Not proper, but it works, hence this script will run on both queues. Oh, now I remember: Amber16 was compiled with CUDA 9.2 drivers, which are supported in CUDA 10.x but not in CUDA 11.x. So Amber16, if needed, would have to be compiled in the Rocky8 environment (and may then work like the amber20 module).
/zfshomes/hmeij/slurm/run.centos for mwgpu, exx96 queues

#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#
# GENERAL
#SBATCH --job-name="test"
#SBATCH --output=out      # or both in default file
#SBATCH --error=err       # slurm-$SLURM_JOBID.out
##SBATCH --mail-type=END
##SBATCH --mail-user=hmeij@wesleyan.edu
#
# NODE control
#SBATCH -N 1        # default, nodes
#
# CPU control
#SBATCH -n 8        # tasks=S*C*T
###SBATCH -B 1:1:1  # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH -B 2:4:1    # S:C:T=sockets/node:cores/socket:threads/core
#
# GPU control
###SBATCH --cpus-per-gpu=1
###SBATCH --mem-per-gpu=7168
###SBATCH --gres=gpu:tesla_k20m:1            # n[33-37]
###SBATCH --gres=gpu:geforce_rtx_2080_s:1    # n[79-90]
#
# Node control
#SBATCH --partition=exx96
#SBATCH --nodelist=n88

# may or may not be needed, centos7 login env
source $HOME/.bashrc
which ifort   # should be the parallel studio 2016 version

# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH

# amber20/cuda 9.2/openmpi good for n33-n37 and n79-n90
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/n37-cuda-9.2
export PATH=/usr/local/n37-cuda-9.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/n37-cuda-9.2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="/usr/local/n37-cuda-9.2/lib:${LD_LIBRARY_PATH}"
export PATH=/share/apps/CENTOS7/python/3.8.3/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/python/3.8.3/lib:$LD_LIBRARY_PATH
which nvcc mpirun python

###source /usr/local/amber16/amber.sh   # works via slurm's mwgpu
source /usr/local/amber20/amber.sh      # works via slurm's exx96

# stage the data
cp -r ~/sharptail/* .

# not quite proper, may cause problems, look at run.rocky
# if it needs to be set (not needed with slurm)
###export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`
###export CUDA_VISIBLE_DEVICES=0

# for amber gpu, select gpu model
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
#-np 1 \
#pmemd.cuda \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber cpu, select partition
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
 -np 8 \
 pmemd.MPI \
 -O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/
Output, takes forever to run…
[hmeij@cottontail2 slurm]$ ssh n88 top -u hmeij -b -n 1

top - 10:45:50 up 274 days,  1:40,  0 users,  load average: 2.69, 2.65, 2.67
Tasks: 516 total,   4 running, 512 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.1 us,  2.5 sy,  0.0 ni, 93.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 97336152 total,   368204 free,  3372968 used, 93594976 buff/cache
KiB Swap: 10485756 total, 10474748 free,    11008 used. 91700584 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 416795 hmeij     20   0  113196   1572   1300 S   0.0  0.0   0:00.00 slurm_scr+
 416798 hmeij     20   0  286668   6116   3452 S   0.0  0.0   0:00.06 mpirun
 416810 hmeij     20   0  149548   4008   3048 S   0.0  0.0   0:09.06 pmemd.MPI
 416811 hmeij     20   0  149548   4016   3048 S   0.0  0.0   0:08.92 pmemd.MPI
 416812 hmeij     20   0  149548   4020   3048 S   0.0  0.0   0:08.92 pmemd.MPI
 416813 hmeij     20   0  149548   4012   3048 S   0.0  0.0   0:08.94 pmemd.MPI
 416814 hmeij     20   0  149548   4016   3052 S   0.0  0.0   0:08.83 pmemd.MPI
 416815 hmeij     20   0  149548   4008   3048 S   0.0  0.0   0:08.96 pmemd.MPI
 416816 hmeij     20   0  149548   4024   3052 S   0.0  0.0   0:08.91 pmemd.MPI
 416817 hmeij     20   0  149548   4012   3048 S   0.0  0.0   0:08.91 pmemd.MPI
 417748 hmeij     20   0  166964   2412   1092 S   0.0  0.0   0:00.00 sshd
 417749 hmeij     20   0  162292   2512   1524 R   0.0  0.0   0:00.03 top
July 2022 is for testing… lots to learn!
Kudos to Abhilash and Colin for working our way through all this.