One node ''n37'' has been redone with the latest Nvidia CUDA drivers during summer 2018. Please test it out before we decide to redo all of them. It is running CentOS 7.5 and I'm interested to see if programs compiled under 6.x or 5.x break.
  
Use the ''#BSUB -m n37'' statement to target the node (see the minimal example below). \\
Nodes n33-n36 have been updated to match n37 (the wrapper is called ''n37.openmpi.wrapper'' on all nodes). \\
 --- //[[hmeij@wesleyan.edu|Henk]] 2018/10/08 09:07// \\
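
A minimal sketch of what the top of a submit script could look like when targeting the node (the script name is arbitrary; the queue and output file names match the full ''run.sh'' shown further below):

<code>
#!/bin/bash
# submit with: bsub < myjob.sh
#BSUB -q mwgpu
#BSUB -m n37     # target the redone node
#BSUB -o out
#BSUB -e err
</code>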
  
Usage is about the same as jobs going to the ''amber128'' queue with two minor changes:
**Please check your new results against previous output.**
  
Details on how the environment was set up:

  * [[cluster:170|OpenHPC 1.3.1]] provision server
  * [[cluster:171|Warewulf Golden Image]] make process
  * [[cluster:172|K20 Redo]] ''n37''

Here is a submit script for recompiled local versions of Amber, Gromacs and Lammps, using a custom wrapper.

''/home/hmeij/k20redo/run.sh''

<code>

#!/bin/bash
# submit via 'bsub < run.sh'
rm -f out err
#BSUB -e err
#BSUB -o out
#BSUB -q mwgpu
#BSUB -J "K20 test"
###BSUB -m n37

# n33-n37 are done and all the same 11Oct2018
# the wrapper is called the same on all hosts

# cuda 9 & openmpi
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/share/apps/CENTOS6/openmpi/1.8.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS6/openmpi/1.8.4/lib:$LD_LIBRARY_PATH


## leave sufficient time between job submissions (30-60 secs)
## the number of GPUs allocated matches the -n value automatically
## always reserve a GPU (gpu=1), setting this to 0 is a cpu-only job
## reserve 12288 MB (11 GB + 1 GB overhead) memory per GPU
## run all processes (1<=n<=4) on the same node (hosts=1)
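## worked example of the rule above: -n 4 with gpu=4 reserves mem=4*12288=49152 MB (see the GROMACS block)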


# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYLOCALSCRATCH


# uncomment one software block by removing ONLY one # on each line


## AMBER: we need to recreate the env, $AMBERHOME is already set
##BSUB -n 1
##BSUB -R "rusage[gpu=1:mem=12288],span[hosts=1]"
#export PATH=/share/apps/CENTOS6/python/2.7.9/bin:$PATH
#export LD_LIBRARY_PATH=/share/apps/CENTOS6/python/2.7.9/lib:$LD_LIBRARY_PATH
#source /usr/local/amber16/amber.sh
## stage the data
#cp -r ~/sharptail/* .
## feed the wrapper
#n37.openmpi.wrapper pmemd.cuda.MPI \
#-O -o mdout.$LSB_JOBID -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd
## save results
#scp mdout.$LSB_JOBID ~/k20redo/


## GROMACS (using all GPUs example)
##BSUB -n 4
##BSUB -R "rusage[gpu=4:mem=49152],span[hosts=1]"
#export CPU_GPU_REQUEST=4:
## signal GMXRC is a gpu run with: 1=thread_mpi 2=openmpi
#export GMXRC=2
#export PATH=/usr/local/gromacs-2018/bin:$PATH
#export LD_LIBRARY_PATH=/usr/local/gromacs-2018/lib64:$LD_LIBRARY_PATH
#. /usr/local/gromacs-2018/bin/GMXRC.bash
#cd /home/hmeij/gromacs_bench/gpu/
#n37.openmpi.wrapper gmx_mpi mdrun \
#  -maxh 0.5 -nsteps 600000 -multidir 01 02 03 04 -gpu_id 0123 \
#  -ntmpi 0 -npme 0 -s topol.tpr -ntomp 0 -pin on -nb gpu


## LAMMPS
##BSUB -n 1
##BSUB -R "rusage[gpu=1:mem=12288],span[hosts=1]"
## GPUIDX=1 use allocated GPU(s), GPUIDX=0 cpu run only (view header of input file)
#export GPUIDX=1 # use with -var $GPUIDX in input file, view au.in, or use -suffix
#export PATH=/usr/local/lammps-22Aug18:$PATH
## stage the data
#cp -r ~/sharptail/* .
## feed the wrapper
#n37.openmpi.wrapper lmp_mpi-double-double-with-gpu \
#-suffix gpu -var GPUIDX $GPUIDX -in in.colloid -l out.colloid.$LSB_JOBID
## save results
#scp out.colloid.$LSB_JOBID ~/k20redo/

</code>

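To run it, uncomment exactly one software block, then submit and monitor the job with the usual LSF commands (a minimal sketch; your job ID and output names will differ):

<code>
bsub < run.sh      # submit the script to the scheduler
bjobs              # is the job pending or running?
bpeek <jobid>      # peek at the job's output while it runs
</code>
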
==== ib0 ====

We've lost the ability to bring up interface ''ib0'' after going to 7.5 and the latest kernel.

Details are described here: http://www.advancedclustering.com/infinibandomni-path-issue-el-7-5-kernel-update/?sysu=bd584af325e6536411a2bc16ad41b3eb

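A quick way to check the interface on a node is sketched below, using standard ''iproute2'' and InfiniBand diagnostics tooling (assumes ''infiniband-diags'' is installed):

<code>
# link state of the InfiniBand interface, if the driver came up at all
ip link show ib0

# port state according to the InfiniBand stack
ibstat
</code>
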
Reflecting on this, it is not necessarily that bad. For GPU compute nodes we do not really need InfiniBand. Dropping it would also free up 5 InfiniBand ports on the switch, bringing the number of available ports to 7, which could be allocated to the new servers we're thinking of buying.

 \\
**[[cluster:0|Back]]**