cluster:119 [2013/08/21 11:03]
hmeij
cluster:119 [2021/06/17 15:32] (current)
hmeij07
\\
**[[cluster:0|Back]]**
  
==== Submitting GPU Jobs ====

Please leave plenty of time between multiple GPU job submissions. Minutes, not seconds.

Jobs need to be submitted to the scheduler via cottontail to the queues mwgpu, amber128, and exx96.

This page is old; the gpu resource ''gpu4'' should now be used. A more recent page can be found at [[cluster:173|K20 Redo Usage]], although this page may still contain some useful information explaining gpu jobs.
 --- //[[hmeij@wesleyan.edu|Henk]] 2021/06/17 15:29//

**Articles**

  * [[http://www.pgroup.com/lit/articles/insider/v5n2a1.htm]] Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer's Perspective
  * [[http://www.pgroup.com/lit/articles/insider/v5n2a5.htm]] Calling CUDA Fortran kernels from MATLAB

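The advice above on spacing out submissions can be sketched as a simple loop. This is a sketch only: ''run1.gpu'' through ''run3.gpu'' are hypothetical job script names, and the ''echo'' stands in for a real ''bsub'' call.

```shell
# Sketch: space out GPU job submissions so the scheduler's gpu
# resource (sampled by the elim roughly every 10 seconds) reflects
# each job's allocation before the next job is dispatched.
# run1.gpu .. run3.gpu are hypothetical job script names.
for job in run1.gpu run2.gpu run3.gpu; do
    echo "bsub < $job"   # real use: bsub < $job
    sleep 1              # real use: wait 60 seconds or more
done
```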
  
  
</code>
  
With ''gpu-info'' we can view our running job.  ''gpu-info'' and ''gpu-free'' are available at <del>[[http://ambermd.org/gpus/]]</del> [[http://ambermd.org/gpus12/#Running]] (I had to hard-code my GPU string information as they came in at 02,03,82&83; you can use deviceQuery to find them).
  
<code>
3       Tesla K20m      21 C            0 %
====================================================

[hmeij@sharptail sharptail]$ ssh n33 gpu-free
1,3,0


  
</code>
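The ''gpu-free'' output above is exactly what the wrapper parses to pick idle devices at random. A standalone sketch of that selection, with the sample string ''1,3,0'' hard-coded in place of the ''ssh n33 gpu-free'' call (the ''tr''/''paste'' pipeline here is an equivalent of the wrapper's shuf-based loop):

```shell
# Sketch: pick 2 idle GPUs from gpu-free style output.
# "1,3,0" stands in for the output of: ssh n33 gpu-free
free="1,3,0"
want=2
picked=$(echo "$free" | tr ',' '\n' | shuf | head -$want | paste -sd, -)
export CUDA_VISIBLE_DEVICES="$picked"
echo "GPU allocation instance n33:$CUDA_VISIBLE_DEVICES"
```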
#!/bin/bash
# submit via 'bsub < run.gpu'
rm -f mdout.[0-9]* auout.[0-9]* apoa1out.[0-9]*
#BSUB -e err
#BSUB -o out
#BSUB -J test
  
# from greentail we need to set up the module env
export PATH=/home/apps/bin:/cm/local/apps/cuda50/libs/304.54/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/lammps/cuda/2013-01-27/:\
/cm/shared/apps/amber/amber12/bin:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:\
/usr/sbin:/cm/shared/apps/cuda50/toolkit/5.0.35/bin:/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:\
/cm/shared/apps/cuda50/libs/current/bin:/cm/shared/apps/cuda50/toolkit/5.0.35/open64/bin:\
/cm/shared/apps/mvapich2/gcc/64/1.6/bin:/cm/shared/apps/mvapich2/gcc/64/1.6/sbin
export LD_LIBRARY_PATH=/cm/local/apps/cuda50/libs/304.54/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/amber/amber12/lib:\
/cm/shared/apps/amber/amber12/lib64:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/cuda50/libs/current/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/lib:/cm/shared/apps/cuda50/toolkit/5.0.35/extras/CUPTI/lib:\
/cm/shared/apps/mvapich2/gcc/64/1.6/lib
  
## leave sufficient time between job submissions (30-60 secs)
# save results
cp apoa1out.$LSB_JOBID ~/sharptail/


</code>


==== gromacs.sub ====

<code>

#!/bin/bash

rm -rf gromacs.out gromacs.err \#* *.log

# from greentail we need to recreate module env
export PATH=/home/apps/bin:/cm/local/apps/cuda50/libs/304.54/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/lammps/cuda/2013-01-27/:\
/cm/shared/apps/amber/amber12/bin:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:\
/usr/sbin:/cm/shared/apps/cuda50/toolkit/5.0.35/bin:/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:\
/cm/shared/apps/cuda50/libs/current/bin:/cm/shared/apps/cuda50/toolkit/5.0.35/open64/bin:\
/cm/shared/apps/mvapich2/gcc/64/1.6/bin:/cm/shared/apps/mvapich2/gcc/64/1.6/sbin
export LD_LIBRARY_PATH=/cm/local/apps/cuda50/libs/304.54/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/amber/amber12/lib:\
/cm/shared/apps/amber/amber12/lib64:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/cuda50/libs/current/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/lib:/cm/shared/apps/cuda50/toolkit/5.0.35/extras/CUPTI/lib:\
/cm/shared/apps/mvapich2/gcc/64/1.6/lib

#BSUB -o gromacs.out
#BSUB -e gromacs.err
#BSUB -N
#BSUB -J 325monolayer

# read /share/apps/gromacs/build.sh
. /share/apps/intel/composerxe/bin/iccvars.sh intel64
export VMDDIR=/share/apps/vmd/1.8.6

## CPU RUN: queue mw256, n<=28, must run on one node (thread_mpi)
##BSUB -q mw256
##BSUB -n 2
##BSUB -R "rusage[gpu=0],span[hosts=1]"
#export PATH=/share/apps/gromacs/4.6-icc-gpu/bin:$PATH
#. /share/apps/gromacs/4.6-icc-gpu/bin/GMXRC.bash
#mdrun -nt 2 -s 325topol.tpr -c 325monolayer.gro -e 325ener.edr -o 325traj.trr -x 325traj.xtc

## GPU RUN: gpu (1-4), queue mwgpu, n (1-4, matches gpu count), must run on one node
##BSUB -q mwgpu
##BSUB -n 1
##BSUB -R "rusage[gpu=1:mem=7000],span[hosts=1]"
## signal GMXRC is a gpu run with: 1=thread_mpi
#export GMXRC=1
#export PATH=/share/apps/gromacs/4.6-icc-gpu/bin:$PATH
#. /share/apps/gromacs/4.6-icc-gpu/bin/GMXRC.bash
#lava.mvapich2.wrapper mdrun \
#-testverlet -s 325topol.tpr -c 325monolayer.gro -e 325ener.edr -o 325traj.trr -x 325traj.xtc

# GPU RUN: gpu (1-4), queue mwgpu, n (1-4, matches gpu count), must run on one node
#BSUB -q mwgpu
#BSUB -n 1
#BSUB -R "rusage[gpu=1:mem=7000],span[hosts=1]"
# signal GMXRC is a gpu run with: 2=mvapich2
export GMXRC=2
export PATH=/share/apps/gromacs/4.6-mpi-gpu/bin:$PATH
. /share/apps/gromacs/4.6-mpi-gpu/bin/GMXRC.bash
lava.mvapich2.wrapper mdrun_mpi \
-testverlet -s 325topol.tpr -c 325monolayer.gro -e 325ener.edr -o 325traj.trr -x 325traj.xtc


</code>
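Note that the wrapper, not the submit script, supplies ''-gpu_id'': GROMACS wants device ids renumbered from base 0 regardless of which physical GPUs were allocated, so two allocated GPUs always become the string ''01''. A minimal sketch of that mapping:

```shell
# Sketch: map the allocated GPU count (1-4) to the base-0
# -gpu_id string GROMACS expects; e.g. 2 GPUs -> "01".
ngpus=2
gmxrc_gpus=""
i=0
while [ $i -lt $ngpus ]; do
    gmxrc_gpus="${gmxrc_gpus}${i}"
    i=$((i + 1))
done
echo "mdrun_mpi -gpu_id $gmxrc_gpus"   # -> mdrun_mpi -gpu_id 01
```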

==== matlab.sub ====

<code>

#!/bin/bash

rm -rf out err *.out

# from greentail we need to recreate module env
export PATH=/home/apps/bin:/cm/local/apps/cuda50/libs/304.54/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/lammps/cuda/2013-01-27/:\
/cm/shared/apps/amber/amber12/bin:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:\
/usr/sbin:/cm/shared/apps/cuda50/toolkit/5.0.35/bin:/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:\
/cm/shared/apps/cuda50/libs/current/bin:/cm/shared/apps/cuda50/toolkit/5.0.35/open64/bin:\
/cm/shared/apps/mvapich2/gcc/64/1.6/bin:/cm/shared/apps/mvapich2/gcc/64/1.6/sbin
export PATH=/share/apps/matlab/2013a/bin:$PATH
export LD_LIBRARY_PATH=/cm/local/apps/cuda50/libs/304.54/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/amber/amber12/lib:\
/cm/shared/apps/amber/amber12/lib64:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/cuda50/libs/current/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/lib:/cm/shared/apps/cuda50/toolkit/5.0.35/extras/CUPTI/lib:\
/cm/shared/apps/mvapich2/gcc/64/1.6/lib

#BSUB -o out
#BSUB -e err
#BSUB -N
#BSUB -J test

# GPU RUN: (1-4), queue mwgpu, n (1-4, matches gpu count), must run on one node
#BSUB -q mwgpu
#BSUB -n 1
#BSUB -R "rusage[gpu=1:mem=7000],span[hosts=1]"
# signal MATGPU is a gpu run
export MATGPU=1
lava.mvapich2.wrapper matlab -nodisplay -r test


</code>

==== lava.mvapich2.wrapper ====

<code>

#!/bin/sh

# This is a copy of lava.openmpi.wrapper which came with lava OCS kit
# Trying to make it work with mvapich2
# -hmeij 13aug2013

#
#  Copyright (c) 2007 Platform Computing
#
# This script is a wrapper for openmpi mpirun
# it generates the machine file based on the hosts
# given to it by Lava.
#

# RLIMIT_MEMLOCK problem with libibverbs -hmeij
ulimit -l unlimited


usage() {
        cat <<USEEOF
USAGE:  $0
        This command is a wrapper for mpirun (openmpi).  It can
        only be run within Lava using bsub e.g.
                bsub -n # "$0 -np # {my mpi command and args}"

        The wrapper will automatically generate the
        machinefile used by mpirun.

        NOTE:  The list of hosts cannot exceed 4KBytes.
USEEOF
}

if [ x"${LSB_JOBFILENAME}" = x -o x"${LSB_HOSTS}" = x ]; then
    usage
    exit -1
fi

MYARGS=$*
WORKDIR=`dirname ${LSB_JOBFILENAME}`
MACHFILE=${WORKDIR}/mpi_machines
ARGLIST=${WORKDIR}/mpi_args

# Check if mpirun is in the PATH -hmeij
T=`which --skip-alias mpirun_rsh`
#T=`which mpirun_rsh`
if [ $? -ne 0 ]; then
    echo "Error:  mpirun_rsh is not in your PATH."
    exit -2
fi

echo "${MYARGS}" > ${ARGLIST}
#T=`grep -- -machinefile ${ARGLIST} |wc -l`
T=`grep -- -hostfile ${ARGLIST} |wc -l`
if [ $T -gt 0 ]; then
    echo "Error:  Do not provide the machinefile for mpirun."
    echo "        It is generated automatically for you."
    exit -3
fi

# Make the open-mpi machine file
echo "${LSB_HOSTS}" > ${MACHFILE}.lst
tr '\/ ' '\r\n' < ${MACHFILE}.lst > ${MACHFILE}

MPIRUN=`which --skip-alias mpirun_rsh`
#MPIRUN=/share/apps/openmpi/1.2+intel-9/bin/mpirun
#echo "executing: ${MPIRUN} -x LD_LIBRARY_PATH -machinefile ${MACHFILE} ${MYARGS}"

# sanity checks number of processes 1-4
np=`wc -l ${MACHFILE} | awk '{print $1}'`
if [ $np -lt 1 -o $np -gt 4 ]; then
    echo "Error:  Incorrect number of processes ($np)"
    echo "        -n can be an integer in the range of 1 to 4"
    exit -4
fi

# sanity check single node
nh=`cat ${MACHFILE} | sort -u | wc -l`
if [ $nh -ne 1 ]; then
    echo "Error:  No host or more than one host specified ($nh)"
    exit -5
fi

# one host, one to four gpus
gpunp=`cat ${MACHFILE} | wc -l | awk '{print $1}'`
gpuhost=`cat ${MACHFILE} | sort -u | tr -d '\n'`
gpuid=( $(for i in `ssh $gpuhost gpu-free | sed "s/,/ /g"`; do echo $i; done | shuf | head -$gpunp) )
if [ $gpunp -eq 1 ]; then
        CUDA_VISIBLE_DEVICES=$gpuid
        echo "GPU allocation instance $gpuhost:$gpuid"
else
        gpuids=`echo ${gpuid[@]} | sed "s/ /,/g"`
        CUDA_VISIBLE_DEVICES="$gpuids"
        echo "GPU allocation instance $gpuhost:$CUDA_VISIBLE_DEVICES"
fi
# namd ignores this
export CUDA_VISIBLE_DEVICES
#debug# setid=`ssh $gpuhost echo $CUDA_VISIBLE_DEVICES | tr '\n' ' '`
#debug# echo "setid=$setid";


if [ -n "$GMXRC" ]; then
        # gromacs needs them from base 0, so gpu 2,3 is string 01
        if [ ${#gpuid[*]} -eq 1 ]; then
                gmxrc_gpus="0"
        elif [ ${#gpuid[*]} -eq 2 ]; then
                gmxrc_gpus="01"
        elif [ ${#gpuid[*]} -eq 3 ]; then
                gmxrc_gpus="012"
        elif [ ${#gpuid[*]} -eq 4 ]; then
                gmxrc_gpus="0123"
        fi

        if [ $GMXRC -eq 1 ]; then
                newargs=`echo ${MYARGS} | sed "s/mdrun/mdrun -gpu_id $gmxrc_gpus/g"`
                echo "executing: $newargs"
                $newargs
        elif [ $GMXRC -eq 2 ]; then
                newargs=`echo ${MYARGS} | sed "s/mdrun_mpi/mdrun_mpi -gpu_id $gmxrc_gpus/g"`
                echo "executing: ${MPIRUN} -ssh -hostfile ${MACHFILE} -np $gpunp $newargs"
                ${MPIRUN} -ssh -hostfile ${MACHFILE} -np $gpunp $newargs
        fi

elif [ -n "$MATGPU" ] && [ $MATGPU -eq 1 ]; then
        echo "executing: ${MYARGS}"
        ${MYARGS}
elif [ -n "$CHARMRUN" ] && [ $CHARMRUN -eq 1 ]; then
        cat ${MACHFILE}.lst | tr '\/ ' '\r\n' | sed 's/^/host /g' > ${MACHFILE}
        echo "executing: charmrun $NAMD_DIR/namd2 +p$gpunp ++nodelist ${MACHFILE} +idlepoll +devices $CUDA_VISIBLE_DEVICES ${MYARGS}"
        charmrun $NAMD_DIR/namd2 +p$gpunp ++nodelist ${MACHFILE} +idlepoll +devices $CUDA_VISIBLE_DEVICES ${MYARGS}
else
        echo "executing: ${MPIRUN} -ssh -hostfile ${MACHFILE} -np $gpunp ${MYARGS}"
        ${MPIRUN} -ssh -hostfile ${MACHFILE} -np $gpunp ${MYARGS}
fi

exit $?


</code>
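The wrapper's machinefile generation and its two sanity checks can be exercised outside of Lava. In this sketch ''LSB_HOSTS'' is faked with a 4-slot allocation on one node; in production the scheduler sets it.

```shell
# Sketch: build a machinefile (one host per line) from an
# LSB_HOSTS style list and apply the wrapper's sanity checks.
LSB_HOSTS="n33 n33 n33 n33"        # faked; normally set by Lava
MACHFILE=$(mktemp)
echo "${LSB_HOSTS}" | tr ' ' '\n' > ${MACHFILE}
np=$(wc -l < ${MACHFILE})          # process count, must be 1-4
nh=$(sort -u ${MACHFILE} | wc -l)  # distinct hosts, must be 1
echo "np=$np nh=$nh"
rm -f ${MACHFILE}
```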


===== elim code =====

<code>

#!/usr/bin/perl

while (1) {

        $gpu = 0;
        $log = '';
        if (-e "/usr/local/bin/gpu-info" ) {
                $tmp = `/usr/local/bin/gpu-info | egrep "Tesla K20"`;
                @tmp = split(/\n/,$tmp);
                foreach $i (0..$#tmp) {
                        ($a,$b,$c,$d,$e,$f,$g) = split(/\s+/,$tmp[$i]);
                        if ( $f == 0 ) { $gpu = $gpu + 1; }
                        #print "$a $f $gpu\n";
                        $log .= "$f,";
                }
        }
        # nr_of_args name1 value1
        $string = "1 gpu $gpu";

        $h = `hostname`; chop($h);
        $d = `date +%m/%d/%y_%H:%M:%S`; chop($d);
        foreach $i ('n33','n34','n35','n36','n37') {
                if ( "$h" eq "$i" ) {
                        `echo "$d,$log" >> /share/apps/logs/$h.gpu.log`;
                }
        }

        # you need the \n to flush -hmeij
        # you also need the space before the line feed -hmeij
        print "$string \n";
        # or use
        #syswrite(OUT,$string,1);

        # smaller than specified in lsf.shared
        sleep 10;

}

  
</code>
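The elim's counting logic can be reproduced with awk on sample ''gpu-info'' style lines: field 6 of each line is the utilization, and 0 % means the GPU is free. The three sample lines here are made up.

```shell
# Sketch: count idle K20s the way the elim does; field 6 of each
# line is the utilization percentage (0 = free). Sample data only.
gpuinfo="0  Tesla K20m  21 C  0 %
1  Tesla K20m  35 C  99 %
2  Tesla K20m  22 C  0 %"
free=$(echo "$gpuinfo" | awk '$6 == 0 {n++} END {print n+0}')
echo "1 gpu $free"   # -> 1 gpu 2
```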
cluster/119.1377097412.txt.gz · Last modified: 2013/08/21 11:03 by hmeij