Introducing our new login node cottontail2. It is a server designed to run the Slurm scheduler and will sport the OpenHPC v2.4 software stack (External Link). We are deploying the Slurm/Warewulf recipe. You can find details at External Link: Rocky 8.5 with Architecture = (x86_64).
The original design was described at the new primary login node page, but all that was pre-pandemic. The major deviation is that we could not obtain 10G Ethernet switches, so we are going with 1G for now.
cottontail2.wesleyan.edu runs the Rocky 8.5 operating system and has two fast Intel Xeon 5222 “Cascade Lake-SP” 3.8 GHz 4-core 14nm CPUs. In addition it has 96GB DDR4 2933 MHz ECC/Registered Memory.
On cottontail2 you can submit Slurm jobs to the test queue. From this server you can SSH to cottontail much like greentail52 (do not add @wesleyan.edu). You may also continue to log in to greentail52.wesleyan.edu. Both of these servers will be around for a while.
The hope is that most of our compute nodes will be converted to Rocky 8.5 and added to the Slurm queues. Probably not hp12 nodes (too old) nor mwgpu nodes (K20 gpu model not supported anymore).
These nodes each have:
The nodes are defined at the bottom of these files
cottontail2:/etc/slurm/slurm.conf
cottontail2:/etc/slurm/gres.conf
[hmeij@cottontail2 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
test*        up 1-00:00:00      2   idle n[100-101]

[hmeij@cottontail2 ~]$ sinfo -lN
Thu Mar 24 14:18:45 2022
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
n100           1     test*        idle 48     2:12:2 192071        0    100 hasLocal none
n101           1     test*        idle 48     2:12:2 192071        0    100 hasLocal none

[hmeij@cottontail2 ~]$ scontrol show node n100
NodeName=n100 Arch=x86_64 CoresPerSocket=12
   CPUAlloc=0 CPUTot=48 CPULoad=0.00
   ActiveFeatures=hasLocalscratch1tb,hasMem192gb
   ActiveFeatures=hasLocalscratch1tb,hasMem192gb
   Gres=gpu:quadro_rtx_5000:4
   NodeAddr=n100 NodeHostName=n100 Version=20.11.8
   OS=Linux 4.18.0-348.12.2.el8_5.x86_64 #1 SMP Wed Jan 19 17:53:40 UTC 2022
   RealMemory=192071 AllocMem=0 FreeMem=190797 Sockets=2 Boards=1
   MemSpecLimit=1024
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=100 Owner=N/A MCS_label=N/A
   Partitions=test
   BootTime=2022-03-23T15:59:28 SlurmdStartTime=2022-03-23T15:59:53
   CfgTRES=cpu=48,mem=192071M,billing=48
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Comment=(null)
Note: in the output above S:C:T stands for Sockets:Cores:Threads; you can reserve resources based on these values. Features= allows you to filter the nodes that can run your job by requesting a certain feature, for example run only on nodes with 192 GB memory (hasMem192gb). Gres= (Generic RESource) defines the resources available, for example quadro_rtx_5000 gpus. You can request the detailed resource or just rtx_5000 or just quadro.
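The S:C:T values multiply out to the CPUS column shown by sinfo (2 sockets x 12 cores x 2 threads = 48 on n100). A minimal sketch of that arithmetic in shell follows; the function name sct_to_cpus is hypothetical and not part of Slurm.

```shell
# Hypothetical helper: multiply out an S:C:T triple (sockets:cores:threads).
# The body runs in a subshell so the IFS change does not leak to the caller.
sct_to_cpus() (
    IFS=:
    set -- $1                  # split "2:12:2" into $1=2 $2=12 $3=2
    echo $(( $1 * $2 * $3 ))
)

sct_to_cpus 2:12:2             # the S:C:T value reported for n100 and n101
```

The same number is what the submit script further down this page passes to -n when it asks for a whole node.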
You can find more information on the Slurm Test Env page; it is worth reading.
Read the Slurm Test Env page, it will be helpful. What is presented in this section is a brief introduction to running a job on the RTX5000 gpus of the n[100-101] compute nodes.
We will submit a job performing a gpu burn operation and cuda memory tests. These typically run overnight, so we'll terminate the job after 15 minutes. Here is the submit script.
#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#
# GENERAL
#SBATCH --job-name="test"
#SBATCH --output=out   # or both in default file
#SBATCH --error=err    # slurm-$SLURM_JOBID.out
#SBATCH --mail-type=END
#SBATCH --mail-user=hmeij@wesleyan.edu
#SBATCH --time=00:15:00
#
# NODE control
#SBATCH -N 1     # default, nodes
#
# CPU control
#SBATCH -n 48     # tasks=S*C*T
#SBATCH -B 2:12:2 # S:C:T=sockets/node:cores/socket:threads/core
#
# GPU control
#SBATCH --gres=gpu:quadro_rtx_5000:4  # n[100-101]

./detect-gpus-then-run-gpuburn-and-cuda-memtest.sh
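Because misplaced #SBATCH lines are silently dropped, it can help to lint a script before submitting it. The sketch below is a cautious check, not Slurm's own parser: the script comment above warns about blank lines, while the sbatch man page describes directives being ignored after the first non-comment, non-whitespace line, so this flags #SBATCH lines that come after either. The function name check_sbatch_header is hypothetical.

```shell
# Hypothetical lint: report #SBATCH lines that follow a blank line or a command.
# Prints one "line N: ..." message per suspect directive and returns 1 if any.
check_sbatch_header() {
    awk '
        NR == 1 && /^#!/ { next }          # shebang
        /^[ \t]*$/       { stop = 1; next } # blank line ends the header (cautious)
        /^#SBATCH/       { if (stop) { print "line " NR ": " $0; bad = 1 }; next }
        /^[ \t]*#/       { next }          # ordinary comment
                         { stop = 1 }      # first real command ends the header
        END              { exit bad }
    ' "$1"
}

# usage: check_sbatch_header run.slurm && sbatch run.slurm
```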
And the submit process.
[hmeij@cottontail2 microway]$ sbatch run.slurm
Submitted batch job 1000004

[hmeij@cottontail2 microway]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1000004      test     test    hmeij  R       0:08      1 n100

[hmeij@cottontail2 ~]$ ssh n100 gpu-info
OpenHPC
id,name,temp.gpu,mem.used,mem.free,util.gpu,util.mem
0, Quadro RTX 5000, 47, 13656 MiB, 2469 MiB, 100 %, 43 %
1, Quadro RTX 5000, 49, 13656 MiB, 2469 MiB, 100 %, 37 %
2, Quadro RTX 5000, 49, 13656 MiB, 2469 MiB, 100 %, 29 %
3, Quadro RTX 5000, 48, 13656 MiB, 2469 MiB, 100 %, 76 %

[hmeij@cottontail2 ~]$ ssh n100 gpu-process
OpenHPC
gpu_name, gpu_id, pid, process_name
Quadro RTX 5000, 0, 18714, ./gpu_burn
Quadro RTX 5000, 1, 18743, ./gpu_burn
Quadro RTX 5000, 2, 18744, ./gpu_burn
Quadro RTX 5000, 3, 18745, ./gpu_burn
And the results in standard output and error files
#out
4 GPUs detected:
GPU 0,CUDA device 0: Quadro RTX 5000 was detected as a quadro card. GPU burn will be run in single precision.
16384 MiB detected, will run gpu_burn with a matrix size of 16384.
...snip...
Running gpu_burn 50000 for all GPUs
Quadro RTX 5000
Quadro RTX 5000
Quadro RTX 5000
Quadro RTX 5000
3.3% proc: 0/0/11/0 err: 0/0/0/0 tmp: 51C/53C/53C/53C
6.3% proc: 0/11/11/0 err: 0/0/0/0 tmp: 51C/53C/53C/53C
...snip...

#err
slurmstepd: error: *** JOB 1000004 ON n100 CANCELLED AT 2022-03-28T13:20:12 DUE TO TIME LIMIT ***
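With a long burn the output file grows quickly, so scanning it for nonzero error counts is easier than reading it. The sketch below assumes the progress lines keep the "err: a/b/c/d" layout shown above (one count per GPU); the function name gpu_burn_error_lines is hypothetical.

```shell
# Hypothetical helper: print every progress line whose err: field has a
# nonzero count for at least one GPU. $1 is the job's stdout file ("out").
gpu_burn_error_lines() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "err:") {
                n = split($(i + 1), counts, "/")
                for (j = 1; j <= n; j++) {
                    if (counts[j] + 0 != 0) { print; next }
                }
            }
        }
    }' "$1"
}

# usage: gpu_burn_error_lines out
```

No output means every sampled line reported zero errors on all GPUs.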
OpenHPC provides recent versions of the GNU autotools collection, the Valgrind memory debugger, EasyBuild, and Spack (skipped). For more information on EasyBuild read the EasyBuild page.
Requests for software and toolchain installations can be made; consult the list of supported software at
https://docs.easybuild.io/en/latest/version-specific/Supported_software.html
OpenHPC presently packages the GNU compiler toolchain integrated with the underlying Lmod modules system in a hierarchical fashion. The modules system will conditionally present compiler-dependent software based on the toolchain currently loaded.
For MPI development and runtime support, OpenHPC provides pre-packaged builds for a variety of MPI families and transport layers. OpenHPC 2.x introduces the use of two related transport layers for the MPICH and OpenMPI builds that support a variety of underlying fabrics: UCX (Unified Communication X) and OFI (OpenFabrics interfaces). Both versions support Ethernet, Infiniband and Omni-Path. We do not use the latter two fabrics (although we do have some Infiniband, we do not custom compile for it).
OpenHPC provides a default development environment for compiling parallel programs that require MPI. This setup can be conveniently enabled via modules, and the OpenHPC modules environment is pre-configured to load an ohpc module on login (if present, and here it is). Our default environment enables autotools, the GNU compiler toolchain, and the OpenMPI stack.
OpenHPC provides pre-packaged builds for a number of popular open-source tools and libraries, for example FFTW and HDF5 (including serial and parallel I/O support), and the GNU Scientific Library (GSL).
# libraries/tools meta-packages built with GNU toolchain
# parallel lib meta-packages for all available MPI toolchains
OpenHPC also provides compatible builds for use with the compilers and MPI stack included in newer versions of the Intel® OneAPI HPC Toolkit (using the classic compiler variants).
# libs and tools
So what does all this look like? On the compute nodes /opt/intel and /opt/ohpc/pub are mounted from cottontail2. The user environment is managed with the package Lmod/module (http://lmod.readthedocs.org). This eliminates the need to control your environment with PATH and LD_LIBRARY_PATH exports. (But you will still have to do so when using software compiled in the cottontail/greentail52 environments, CentOS 6+7.)
For the new environment I will probably compile software using the OpenHPC software stack and stage the modules in /share/apps/CENTOS8/ohpc/ duplicating the /opt/ohpc/pub setup. We'll have to experiment a bit.
So after login, the default environment shows.
Please note that in this OpenHPC default environment /share/apps/intel/parallel_studio_xe_2016_update3 has been removed from $PATH. Probably about time.
# default environment

[hmeij@cottontail2 ~]$ module list

Currently Loaded Modules:
  1) autotools   3) gnu9/9.4.0    5) ucx/1.11.2         7) openmpi4/4.1.1
  2) prun/2.2    4) hwloc/2.5.0   6) libfabric/1.13.0   8) ohpc

[hmeij@cottontail2 ~]$ which gcc mpicc
/opt/ohpc/pub/compiler/gcc/9.4.0/bin/gcc
/opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1/bin/mpicc

# set at login

[hmeij@cottontail2 ~]$ env | grep -i modulepath
MODULEPATH=/opt/ohpc/pub/moduledeps/gnu9-openmpi4:/opt/ohpc/pub/moduledeps/gnu9:/opt/ohpc/pub/modulefiles

# more is available
# a serial gnu9 and a parallel gnu9-openmpi4 toolchains and a tool/compiler chain.

[hmeij@cottontail2 ~]$ module avail

-------------------- /opt/ohpc/pub/moduledeps/gnu9-openmpi4 --------------------
   adios/1.13.1     netcdf-cxx/4.3.1        py3-scipy/1.5.1
   boost/1.76.0     netcdf-fortran/4.5.3    scalapack/2.1.0
   dimemas/5.4.2    netcdf/4.7.4            scalasca/2.5
   example2/1.0     omb/5.8                 scorep/6.0
   extrae/3.7.0     opencoarrays/2.9.2      sionlib/1.7.4
   fftw/3.3.8       petsc/3.16.1            slepc/3.16.0
   hypre/2.18.1     phdf5/1.10.8            superlu_dist/6.4.0
   imb/2019.6       pnetcdf/1.12.2          tau/2.29
   mfem/4.3         ptscotch/6.0.6          trilinos/13.2.0
   mumps/5.2.1      py3-mpi4py/3.0.3

------------------------ /opt/ohpc/pub/moduledeps/gnu9 -------------------------
   R/4.1.2          mpich/3.4.2-ofi         plasma/2.8.0
   gsl/2.7          mpich/3.4.2-ucx (D)     py3-numpy/1.19.5
   hdf5/1.10.8      mvapich2/2.3.6          scotch/6.0.6
   impi/2021.5.1    openblas/0.3.7          superlu/5.2.1
   likwid/5.0.1     openmpi4/4.1.1  (L)
   metis/5.1.0      pdtoolkit/3.25.1

-------------------------- /opt/ohpc/pub/modulefiles ---------------------------
   EasyBuild/4.5.0          hwloc/2.5.0      (L)    prun/2.2          (L)
   autotools         (L)    intel/2022.0.2          singularity/3.7.1
   charliecloud/0.15        libfabric/1.13.0 (L)    ucx/1.11.2        (L)
   cmake/3.21.3             ohpc             (L)    valgrind/3.18.1
   example1/1.0      (L)    os
   gnu9/9.4.0        (L)    papi/5.7.0

  Where:
   D:  Default Module
   L:  Module is loaded
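The three MODULEPATH entries above correspond to the three blocks in the module avail listing. A small sketch for printing them one per line, in the order they appear in the variable; the function name list_modulepath is hypothetical.

```shell
# Hypothetical helper: print each MODULEPATH entry on its own line.
# With no argument it reads $MODULEPATH from the current environment.
list_modulepath() {
    printf '%s\n' "${1:-${MODULEPATH:-}}" | tr ':' '\n'
}

# usage on cottontail2 after login: list_modulepath
list_modulepath /opt/ohpc/pub/moduledeps/gnu9-openmpi4:/opt/ohpc/pub/moduledeps/gnu9:/opt/ohpc/pub/modulefiles
```

Loading or swapping a compiler or MPI module changes the moduledeps entries, which is how the hierarchy hides software built for a toolchain that is not loaded.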
Switching to Intel's OneAPI toolchain requires swapping the gnu9 compiler module for the intel module.
[hmeij@cottontail2 ~]$ module load intel/2022.0.2
Loading compiler version 2022.0.2
Loading tbb version 2021.5.1
Loading compiler-rt version 2022.0.2
Loading oclfpga version 2022.0.2
  Load "debugger" to debug DPC++ applications with the gdb-oneapi debugger.
  Load "dpl" for additional DPC++ APIs: https://github.com/oneapi-src/oneDPL
Loading mkl version 2022.0.2

Lmod has detected the following error:  You can only have one compiler module loaded at a time.
You already have gnu9 loaded.
To correct the situation, please execute the following command:

  $ module swap gnu9 intel/2022.0.2

# after the swap we observe

[hmeij@cottontail2 ~]$ which icc icx mpicc
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/intel64/icc
/opt/intel/oneapi/compiler/2022.0.2/linux/bin/icx
/opt/ohpc/pub/mpi/openmpi4-intel/4.1.1/bin/mpicc

# debugger module

[hmeij@cottontail2 ~]$ module load debugger
Loading debugger version 2021.5.0
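In a job script the swap can be made conditional so that the same script works whether or not gnu9 is loaded. This is a minimal sketch, assuming Lmod's LOADEDMODULES variable (a colon-separated list of loaded modules) is available; the function name use_intel_toolchain is hypothetical, and the module names are the ones shown above.

```shell
# Hypothetical helper: swap gnu9 for intel when gnu9 is loaded, otherwise
# just load intel. Relies on the module command provided by Lmod at login.
use_intel_toolchain() {
    case ":${LOADEDMODULES:-}:" in
        *:gnu9/*) module swap gnu9 intel/2022.0.2 ;;
        *)        module load intel/2022.0.2 ;;
    esac
}

# usage (on cottontail2 or a compute node): use_intel_toolchain && which icc mpicc
```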
Disabled the following; it sets up a ~/.ssh/config file that conflicts with the old HPC head node.
#/etc/profile.d/cluster-env.[sh|csh]
##if [ -x "/usr/bin/cluster-env" ]; then
##    /usr/bin/cluster-env
##fi
Figure out how to launch Docker containers with Charliecloud (NGC catalog)
NGC Containers: We built libnvidia-container to make it easy to run CUDA applications inside containers
salloc, srun and sbatch (in Slurm 21.08+) have the '--container' argument … greentail52's test Slurm version is 21.08.1, cottontail2 runs Slurm version 20.11.8, so test on greentail52 first.
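A script that wants to use the container argument can check the Slurm version first. The sketch below only compares a version string against 21.08; the function name slurm_has_container_flag is hypothetical, and feeding it the output of sbatch --version (which prints a line such as "slurm 20.11.8") is left to the caller.

```shell
# Hypothetical helper: succeed when a Slurm version string is 21.08 or newer.
# $1 is a dotted version such as 20.11.8 or 21.08.1.
slurm_has_container_flag() {
    echo "$1" | awk -F. '{
        major = $1 + 0
        minor = $2 + 0
        exit !((major > 21) || (major == 21 && minor >= 8))
    }'
}

# usage: slurm_has_container_flag "$(sbatch --version | awk '{print $2}')"
for v in 21.08.1 20.11.8; do
    if slurm_has_container_flag "$v"; then
        echo "$v: container argument available"
    else
        echo "$v: too old for the container argument"
    fi
done
```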
Figure out an upgrade process before going production (don't forget any chroot images and rebuild images).
yum upgrade "*-ohpc"
yum upgrade "ohpc-base"
Independent modules can be inserted in the OpenHPC environment. But I will try to keep them separate from the beginning so as not to accidentally customize the environment. Two independent application examples are explained below.
# an application with no compiler or MPI runtime dependencies
mkdir /opt/ohpc/pub/modulefiles/example1
cp /opt/ohpc/pub/examples/example.modulefile \
   /opt/ohpc/pub/modulefiles/example1/1.0

# an application dependent on OpenMPI and the GNU toolchain
mkdir /opt/ohpc/pub/moduledeps/gnu9-openmpi4/example2
cp /opt/ohpc/pub/examples/example-mpi-dependent.modulefile \
   /opt/ohpc/pub/moduledeps/gnu9-openmpi4/example2/1.0

# why would you put these in pub/libs ???

[hmeij@cottontail2 ~]$ module show example1/1.0
------------------------------------------------------------------------------------
   /opt/ohpc/pub/modulefiles/example1/1.0:
------------------------------------------------------------------------------------
whatis("Name: example ")
whatis("Version: 1.0 ")
whatis("Category: runtime library ")
whatis("Description: example independant module ")
whatis("URL http://www.google.com/ ")
prepend_path("PATH","/opt/ohpc/pub/libs/example/1.0/bin")
prepend_path("MANPATH","/opt/ohpc/pub/libs/example/1.0/man")
prepend_path("INCLUDE","/opt/ohpc/pub/libs/example/1.0/include")
prepend_path("LD_LIBRARY_PATH","/opt/ohpc/pub/libs/example/1.0/lib")
setenv("EXAMPLE_DIR","/opt/ohpc/pub/libs/example/1.0")
setenv("EXAMPLE_BIN","/opt/ohpc/pub/libs/example/1.0/bin")
setenv("EXAMPLE_LIB","/opt/ohpc/pub/libs/example/1.0/lib")
setenv("EXAMPLE_INC","/opt/ohpc/pub/libs/example/1.0/include")
help([[
This module loads the example program toolchain.

Version 1.0

]])
Sample job to run Amber20 on n[100-101]
The Amber cmake download fails with a READLINE error … package readline-devel needs to be installed to get past that, which pulls in
ncurses-c++-libs-6.1-9.20180224.el8.x86_64
ncurses-devel-6.1-9.20180224.el8.x86_64
readline-devel-7.0-10.el8.x86_64
Example script run.rocky for cpu or gpu run (not for queues mwgpu, exx96)
#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#
# GENERAL
#SBATCH --job-name="test"
#SBATCH --output=out   # or both in default file
#SBATCH --error=err    # slurm-$SLURM_JOBID.out
#SBATCH --mail-type=END
#SBATCH --mail-user=hmeij@wesleyan.edu
#
# NODE control
#SBATCH -N 1     # default, nodes
#
# CPU control
#SBATCH -n 8     # tasks=S*C*T
###SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
#SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH --cpus-per-gpu=1
###SBATCH --mem-per-gpu=7168
#
# GPU control
###SBATCH --gres=gpu:geforce_gtx_1080_ti:1  # n78
###SBATCH --gres=gpu:quadro_rtx_5000:1  # n[100-101]
#
# Node control
#SBATCH --partition=tinymem
#SBATCH --nodelist=n57

# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH

### AMBER20
#source /share/apps/CENTOS8/ohpc/software/amber/20/amber.sh
# OR #
module load amber/20
# check
which nvcc gcc mpicc pmemd.cuda

# stage the data
cp -r ~/sharptail/* .

export CUDA_VISIBLE_DEVICES=0

# for amber20 on n[100-101] gpus, select gpu model
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
#-np 1 \
#pmemd.cuda \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber20 on n59/n77 cpus, select partition
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
-np 8 \
pmemd.MPI \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/
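The script changes into $MYLOCALSCRATCH without creating it, so presumably something else on the cluster (a Slurm prolog, for example) creates the per-job directories; that is an assumption, not something this page states. When testing a script outside that setup, a defensive variant along these lines may help. The function name setup_job_scratch and the fallback job id are hypothetical.

```shell
# Hypothetical helper: make sure a per-job scratch directory exists and
# print its path. $1 is the scratch base (defaults to /localscratch).
setup_job_scratch() {
    base=${1:-/localscratch}
    job=${SLURM_JOB_ID:-manual$$}      # fall back to the PID outside Slurm
    dir="$base/$job"
    mkdir -p "$dir" || return 1
    echo "$dir"
}

# usage inside a job script:
#   MYLOCALSCRATCH=$(setup_job_scratch /localscratch) || exit 1
#   cd "$MYLOCALSCRATCH" || exit 1
```

Quoting the path and stopping when cd fails keeps the later cp and mpirun steps from running in the wrong directory.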
Example script run.centos for cpu or gpu run (queues mwgpu, exx96)
#!/bin/bash
# [found at XStream]
# Slurm will IGNORE all lines after the FIRST BLANK LINE,
# even the ones containing #SBATCH.
# Always put your SBATCH parameters at the top of your batch script.
# Took me days to find ... really silly behavior -Henk
#
# GENERAL
#SBATCH --job-name="test"
#SBATCH --output=out   # or both in default file
#SBATCH --error=err    # slurm-$SLURM_JOBID.out
##SBATCH --mail-type=END
##SBATCH --mail-user=hmeij@wesleyan.edu
#
# NODE control
#SBATCH -N 1     # default, nodes
#
# CPU control
#SBATCH -n 1     # tasks=S*C*T
#SBATCH -B 1:1:1 # S:C:T=sockets/node:cores/socket:threads/core
###SBATCH -B 2:4:1 # S:C:T=sockets/node:cores/socket:threads/core
#
# GPU control
###SBATCH --gres=gpu:tesla_k20m:1  # n[33-37]
#SBATCH --gres=gpu:geforce_rtx_2080_s:1  # n[79-90]
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-gpu=7168
#
# Node control
#SBATCH --partition=exx96
#SBATCH --nodelist=n88

# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$SLURM_JOB_ID
MYLOCALSCRATCH=/localscratch/$SLURM_JOB_ID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH

# amber20/cuda 9.2/openmpi good for n33-n37 and n79-n90
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/n37-cuda-9.2
export PATH=/usr/local/n37-cuda-9.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/n37-cuda-9.2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="/usr/local/n37-cuda-9.2/lib:${LD_LIBRARY_PATH}"
export PATH=/share/apps/CENTOS7/python/3.8.3/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/python/3.8.3/lib:$LD_LIBRARY_PATH
which nvcc mpirun python

source /usr/local/amber20/amber.sh

# stage the data
cp -r ~/sharptail/* .

###export CUDA_VISIBLE_DEVICES=`shuf -i 0-3 -n 1`
###export CUDA_VISIBLE_DEVICES=0

# for amber20 on n[33-37] gpus, select gpu model
mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts-one.txt \
-np 1 \
pmemd.cuda \
-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

# for amber20 on n59/n100 cpus, select partition
#mpirun -x LD_LIBRARY_PATH -machinefile ~/slurm/localhosts.txt \
#-np 8 \
#pmemd.MPI \
#-O -o mdout.$SLURM_JOB_ID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

scp mdout.$SLURM_JOB_ID ~/tmp/
The script amber.sh was converted to a module like so
# or do this and add content of foo_1.0 to this module
#$LMOD_DIR/sh_to_modulefile --to TCL --from=bash \
#--output /tmp/foo_1.0 \
#/share/apps/CENTOS8/ohpc/software/amber/20/amber.sh

# need Lmod 8.6+, ohpc has 8.5.1
#switch -- [module-info shelltype] {
#  sh {
#    source-sh bash $scriptpath/amber.sh
#  }
#  csh {
#    source-sh tcsh $scriptpath/amber.csh
#  }
#}

# which generated these lines with the Tcl header, then add these to the modulefile for amber/20
setenv AMBERHOME {/share/apps/CENTOS8/ohpc/software/amber/20}
setenv LD_LIBRARY_PATH {/share/apps/CENTOS8/ohpc/software/amber/20/lib}
prepend-path PATH {/share/apps/CENTOS8/ohpc/software/amber/20/bin}
setenv PERL5LIB {/share/apps/CENTOS8/ohpc/software/amber/20/lib/perl}
setenv PYTHONPATH {/share/apps/CENTOS8/ohpc/software/amber/20/lib/python3.9/site-packages}
First establish a successful run with the run.rocky script for Amber20 (listed above). Then change the module in your script.
module load amber/22

# if the module does not show up in the output of module avail
# treat your cache as out of date
module --ignore_cache avail
Amber22 is somehow incompatible with the CentOS/Rocky openmpi (yum install). Hence the latest version of openmpi was compiled and installed into $AMBERHOME. There is no need to modify $PATH/$LD_LIBRARY_PATH after you load the module or source amber.sh.
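A quick way to confirm that a job picks up the openmpi built for Amber22, rather than the system one, is to check where mpirun resolves. This is a minimal sketch, assuming the module or amber.sh has set AMBERHOME and put that build's mpirun first on PATH; the function name mpirun_from_amberhome is hypothetical.

```shell
# Hypothetical check: succeed only when the first mpirun on PATH lives
# somewhere under $AMBERHOME.
mpirun_from_amberhome() {
    [ -n "${AMBERHOME:-}" ] || return 1
    found=$(command -v mpirun) || return 1
    case "$found" in
        "$AMBERHOME"/*) return 0 ;;
        *)              return 1 ;;
    esac
}

# usage in a job script, after: module load amber/22
#   mpirun_from_amberhome || echo "warning: mpirun is not the Amber22 build" >&2
```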