
K20 Redo Usage

One node, n37, has been redone with the latest Nvidia CUDA drivers during summer 2018. Please test it out before we decide to redo all of them. It is running CentOS 7.5, and I'm interested to see if programs compiled under 6.x or 5.x break.

Use the #BSUB -m n37 statement to target the node.
Update: n33-n36 have been redone the same as n37 (the wrapper is called n37.openmpi.wrapper on all nodes).
Henk 2018/10/08 09:07
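For example, a minimal job header that pins a test run to one of the redone nodes could look like this (hostnames other than n37 come from the update above; adjust to taste):

#!/bin/bash
# submit via 'bsub < run.sh'
#BSUB -q mwgpu
#BSUB -J "K20 test"
# target the redone node directly ...
#BSUB -m n37
# ... or allow any of the redone nodes
##BSUB -m "n33 n34 n35 n36 n37"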

Usage is about the same as jobs going to the amber128 queue, with two minor changes:

  * I bypassed the Amber Nvidia version check, so we're running in unsupported mode.
  * Please check your new results against previous output (a quick diff example follows).
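For example, a simple way to compare a fresh run against an earlier one is to diff the saved output files (the job IDs below are placeholders for your own old and new runs):

diff ~/k20redo/mdout.OLDJOBID ~/k20redo/mdout.NEWJOBID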

Details on how the environment was set up

Here is a submit script for recompiled local versions of Amber, Gromacs and Lammps using a custom wrapper.

/home/hmeij/k20redo/run.sh

#!/bin/bash
# submit via 'bsub < run.sh'
rm -f out err 
#BSUB -e err
#BSUB -o out
#BSUB -q mwgpu
#BSUB -J "K20 test"
###BSUB -m n37  

# n33-n37 are done and all the same 11Oct2018
# the wrapper is called the same on all hosts

# cuda 9 & openmpi
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/share/apps/CENTOS6/openmpi/1.8.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS6/openmpi/1.8.4/lib:$LD_LIBRARY_PATH
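
# optional sanity check (added example, not part of the original script):
# uncomment to log which CUDA and MPI the job actually picks up
#which nvcc mpirun > env_check.$LSB_JOBID 2>&1
#nvcc --version >> env_check.$LSB_JOBID 2>&1
#mpirun --version >> env_check.$LSB_JOBID 2>&1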


## leave sufficient time between job submissions (30-60 secs)
## the number of GPUs allocated matches -n value automatically
## always reserve a GPU (gpu=1); setting this to 0 makes it a cpu-only job
## reserve 12288 MB (11 GB + 1 GB overhead) of memory per GPU
## run all processes (1<=n<=4) on the same node (hosts=1)
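
## worked example of the rules above (illustration only, not in the original):
## a 2-GPU job on one node would request
###BSUB -n 2
###BSUB -R "rusage[gpu=2:mem=24576],span[hosts=1]"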


# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH
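
# note: the job runs out of the per-job local scratch dir, so results
# must be copied back before the job ends (see the scp lines below)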


# uncomment one software block by removing ONLY one # on each line


## AMBER we need to recreate env, $AMBERHOME is already set
##BSUB -n 1
##BSUB -R "rusage[gpu=1:mem=12288],span[hosts=1]"
#export PATH=/share/apps/CENTOS6/python/2.7.9/bin:$PATH
#export LD_LIBRARY_PATH=/share/apps/CENTOS6/python/2.7.9/lib:$LD_LIBRARY_PATH
#source /usr/local/amber16/amber.sh
## stage the data
#cp -r ~/sharptail/* .
## feed the wrapper
#n37.openmpi.wrapper pmemd.cuda.MPI \
#-O -o mdout.$LSB_JOBID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd
## save results
#scp mdout.$LSB_JOBID ~/k20redo/


## GROMACS (using all GPUs example)
##BSUB -n 4
##BSUB -R "rusage[gpu=4:mem=49152],span[hosts=1]"
#export CPU_GPU_REQUEST=4:4
## signal GMXRC is a gpu run with: 1=thread_mpi 2=openmpi
#export GMXRC=2
#export PATH=/usr/local/gromacs-2018/bin:$PATH
#export LD_LIBRARY_PATH=/usr/local/gromacs-2018/lib64:$LD_LIBRARY_PATH
#. /usr/local/gromacs-2018/bin/GMXRC.bash
#cd /home/hmeij/gromacs_bench/gpu/
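## note (added explanation): -multidir runs the four inputs in dirs 01-04
## side by side, and -gpu_id 0123 makes GPUs 0-3 available to those runs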
#n37.openmpi.wrapper gmx_mpi mdrun \
#  -maxh 0.5 -nsteps 600000 -multidir 01 02 03 04 -gpu_id 0123 \
#  -ntmpi 0 -npme 0 -s topol.tpr -ntomp 0 -pin on -nb gpu



## LAMMPS
##BSUB -n 1
##BSUB -R "rusage[gpu=1:mem=12288],span[hosts=1]"
## GPUIDX=1 use allocated GPU(s), GPUIDX=0 cpu-only run (view header of input file)
#export GPUIDX=1 # use with '-var GPUIDX $GPUIDX' in the input file (see au.in), or use -suffix
#export PATH=/usr/local/lammps-22Aug18:$PATH
## stage the data
#cp -r ~/sharptail/* .
## feed the wrapper
#n37.openmpi.wrapper lmp_mpi-double-double-with-gpu \
#-suffix gpu -var GPUIDX $GPUIDX -in in.colloid -l out.colloid.$LSB_JOBID
## save results
#scp out.colloid.$LSB_JOBID ~/k20redo/
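
Once one software block is uncommented, a typical cycle with the standard LSF tools looks like this (the job ID is whatever bsub reports back):

bsub < run.sh      # submit the job
bjobs              # check its state (PEND/RUN)
bpeek JOBID        # peek at stdout/stderr while it runs
bkill JOBID        # kill it if something looks wrong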

ib0

We've lost the ability to bring up interface ib0 when going to 7.5 and the latest kernel.

Details are described here … http://www.advancedclustering.com/infinibandomni-path-issue-el-7-5-kernel-update/?sysu=bd584af325e6536411a2bc16ad41b3eb
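
A quick way to check the state on a node (generic commands, not from the article):

ip link show ib0        # is the interface present and UP?
lsmod | grep ib_ipoib   # is the IPoIB kernel module loaded?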

Reflecting on this, it is not necessarily that bad. For GPU compute nodes we do not really need ib0. It would also free up 5 InfiniBand ports on the switch, bringing the total number of available ports to 7; those could be allocated to the new servers we're thinking of buying.

