\\
**[[cluster:0|Back]]**
  
==== Submitting GPU Jobs ====

Please allow plenty of time between multiple GPU job submissions, on the order of minutes.

Jobs need to be submitted to the scheduler via cottontail to the queues mwgpu, amber128, and exx96.
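
A minimal submission from cottontail, using the ''run.gpu'' script shown further down this page, looks like this:

<code>
[hmeij@cottontail ~]$ bsub < run.gpu
[hmeij@cottontail ~]$ bjobs
</code>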
This page is old; the GPU resource ''gpu4'' should be used, and a more recent page can be found at [[cluster:173|K20 Redo Usage]]. There might still be some useful information on this page explaining GPU jobs.
 --- //[[hmeij@wesleyan.edu|Henk]] 2021/06/17 15:29//

**Articles**

  * [[http://www.pgroup.com/lit/articles/insider/v5n2a1.htm]] Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer's Perspective
  * [[http://www.pgroup.com/lit/articles/insider/v5n2a5.htm]] Calling CUDA Fortran kernels from MATLAB
  
  
</code>
  
With ''gpu-info'' we can view our running job. ''gpu-info'' and ''gpu-free'' are available at <del>[[http://ambermd.org/gpus/]]</del> [[http://ambermd.org/gpus12/#Running]] (I had to hard-code my GPU string information as they came in at 02,03,82&83; you can use deviceQuery to find them).
  
<code>
3       Tesla K20m      21 C            0 %
====================================================

[hmeij@sharptail sharptail]$ ssh n33 gpu-free
1,3,0

  
</code>
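
For a quick interactive sanity check one can also pin a process to a free device by hand; this is a minimal sketch (the device id is whatever ''gpu-free'' happens to report at that moment):

<code>
# pick the first free GPU on n33 (gpu-free prints a comma-separated list)
FREE=`ssh n33 gpu-free | cut -d, -f1`
# make only that device visible to CUDA applications
export CUDA_VISIBLE_DEVICES=$FREE
</code>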
==== run.gpu ====

<code>
#!/bin/bash
# submit via 'bsub < run.gpu'
rm -f mdout.[0-9]* auout.[0-9]* apoa1out.[0-9]*
#BSUB -e err
#BSUB -o out
#BSUB -q mwgpu
#BSUB -J test
  
# from greentail we need to set up the module env
export PATH=/home/apps/bin:/cm/local/apps/cuda50/libs/304.54/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/lammps/cuda/2013-01-27/:\
/cm/shared/apps/amber/amber12/bin:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:\
/usr/sbin:/cm/shared/apps/cuda50/toolkit/5.0.35/bin:/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:\
/cm/shared/apps/cuda50/libs/current/bin:/cm/shared/apps/cuda50/toolkit/5.0.35/open64/bin:\
/cm/shared/apps/mvapich2/gcc/64/1.6/bin:/cm/shared/apps/mvapich2/gcc/64/1.6/sbin
export LD_LIBRARY_PATH=/cm/local/apps/cuda50/libs/304.54/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/amber/amber12/lib:\
/cm/shared/apps/amber/amber12/lib64:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/cuda50/libs/current/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/lib:/cm/shared/apps/cuda50/toolkit/5.0.35/extras/CUPTI/lib:\
/cm/shared/apps/mvapich2/gcc/64/1.6/lib
  
  
## leave sufficient time between job submissions (30-60 secs)
# save results
cp apoa1out.$LSB_JOBID ~/sharptail/


</code>
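
Once the job is dispatched, the wrapper (shown below) echoes the host and the CUDA device id(s) it selected, so the allocation can be verified from the ''out'' file named in the ''#BSUB -o'' line:

<code>
grep "GPU allocation" out
# e.g.: GPU allocation instance n33:2
# cross-check utilization on that node
ssh n33 gpu-info
</code>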


==== gromacs.sub ====

<code>

#!/bin/bash

rm -rf gromacs.out gromacs.err \#* *.log

# from greentail we need to recreate module env
export PATH=/home/apps/bin:/cm/local/apps/cuda50/libs/304.54/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/lammps/cuda/2013-01-27/:\
/cm/shared/apps/amber/amber12/bin:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:\
/usr/sbin:/cm/shared/apps/cuda50/toolkit/5.0.35/bin:/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:\
/cm/shared/apps/cuda50/libs/current/bin:/cm/shared/apps/cuda50/toolkit/5.0.35/open64/bin:\
/cm/shared/apps/mvapich2/gcc/64/1.6/bin:/cm/shared/apps/mvapich2/gcc/64/1.6/sbin
export LD_LIBRARY_PATH=/cm/local/apps/cuda50/libs/304.54/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/amber/amber12/lib:\
/cm/shared/apps/amber/amber12/lib64:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/cuda50/libs/current/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/lib:/cm/shared/apps/cuda50/toolkit/5.0.35/extras/CUPTI/lib:\
/cm/shared/apps/mvapich2/gcc/64/1.6/lib

#BSUB -o gromacs.out
#BSUB -e gromacs.err
#BSUB -N
#BSUB -J 325monolayer

# read /share/apps/gromacs/build.sh
. /share/apps/intel/composerxe/bin/iccvars.sh intel64
export VMDDIR=/share/apps/vmd/1.8.6

## CPU RUN: queue mw256, n<=28, must run on one node (thread_mpi)
##BSUB -q mw256
##BSUB -n 2
##BSUB -R "rusage[gpu=0],span[hosts=1]"
#export PATH=/share/apps/gromacs/4.6-icc-gpu/bin:$PATH
#. /share/apps/gromacs/4.6-icc-gpu/bin/GMXRC.bash
#mdrun -nt 2 -s 325topol.tpr -c 325monolayer.gro -e 325ener.edr -o 325traj.trr -x 325traj.xtc

## GPU RUN: gpu (1-4), queue mwgpu, n (1-4, matches gpu count), must run on one node
##BSUB -q mwgpu
##BSUB -n 1
##BSUB -R "rusage[gpu=1:mem=7000],span[hosts=1]"
## signal GMXRC is a gpu run with: 1=thread_mpi
#export GMXRC=1
#export PATH=/share/apps/gromacs/4.6-icc-gpu/bin:$PATH
#. /share/apps/gromacs/4.6-icc-gpu/bin/GMXRC.bash
#lava.mvapich2.wrapper mdrun \
#-testverlet -s 325topol.tpr -c 325monolayer.gro -e 325ener.edr -o 325traj.trr -x 325traj.xtc

# GPU RUN: gpu (1-4), queue mwgpu, n (1-4, matches gpu count), must run on one node
#BSUB -q mwgpu
#BSUB -n 1
#BSUB -R "rusage[gpu=1:mem=7000],span[hosts=1]"
# signal GMXRC is a gpu run with: 2=mvapich2
export GMXRC=2
export PATH=/share/apps/gromacs/4.6-mpi-gpu/bin:$PATH
. /share/apps/gromacs/4.6-mpi-gpu/bin/GMXRC.bash
lava.mvapich2.wrapper mdrun_mpi \
-testverlet -s 325topol.tpr -c 325monolayer.gro -e 325ener.edr -o 325traj.trr -x 325traj.xtc


</code>
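
The ''GMXRC'' variable tells the wrapper below which mdrun flavor to launch; a sketch of the two effective command lines (the device string ''01'' assumes two allocated GPUs, renumbered from base 0 as GROMACS expects, and ''...'' stands for the mdrun arguments above):

<code>
# GMXRC=1 (thread_mpi build): the wrapper runs mdrun directly
mdrun -gpu_id 01 ...
# GMXRC=2 (mvapich2 build): the wrapper launches via mpirun_rsh
mpirun_rsh -ssh -hostfile $MACHFILE -np 2 mdrun_mpi -gpu_id 01 ...
</code>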

==== matlab.sub ====

<code>

#!/bin/bash

rm -rf out err *.out

# from greentail we need to recreate module env
export PATH=/home/apps/bin:/cm/local/apps/cuda50/libs/304.54/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/lammps/cuda/2013-01-27/:\
/cm/shared/apps/amber/amber12/bin:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:\
/usr/sbin:/cm/shared/apps/cuda50/toolkit/5.0.35/bin:/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:\
/cm/shared/apps/cuda50/libs/current/bin:/cm/shared/apps/cuda50/toolkit/5.0.35/open64/bin:\
/cm/shared/apps/mvapich2/gcc/64/1.6/bin:/cm/shared/apps/mvapich2/gcc/64/1.6/sbin
export PATH=/share/apps/matlab/2013a/bin:$PATH
export LD_LIBRARY_PATH=/cm/local/apps/cuda50/libs/304.54/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/amber/amber12/lib:\
/cm/shared/apps/amber/amber12/lib64:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/cuda50/libs/current/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/lib:/cm/shared/apps/cuda50/toolkit/5.0.35/extras/CUPTI/lib:\
/cm/shared/apps/mvapich2/gcc/64/1.6/lib

#BSUB -o out
#BSUB -e err
#BSUB -N
#BSUB -J test

# GPU RUN: gpu (1-4), queue mwgpu, n (1-4, matches gpu count), must run on one node
#BSUB -q mwgpu
#BSUB -n 1
#BSUB -R "rusage[gpu=1:mem=7000],span[hosts=1]"
# signal MATGPU is a gpu run
export MATGPU=1
lava.mvapich2.wrapper matlab -nodisplay -r test


</code>
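
The ''-r test'' flag makes MATLAB execute ''test.m'' from the submission directory (the file itself is not shown here). Submission follows the same pattern as the other scripts:

<code>
bsub < matlab.sub
# MATLAB's console output lands in the 'out' file
tail out
</code>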

==== lava.mvapich2.wrapper ====

<code>

#!/bin/sh

# This is a copy of lava.openmpi.wrapper which came with lava OCS kit
# Trying to make it work with mvapich2
# -hmeij 13aug2013

#
#  Copyright (c) 2007 Platform Computing
#
# This script is a wrapper for openmpi mpirun
# it generates the machine file based on the hosts
# given to it by Lava.
#

# RLIMIT_MEMLOCK problem with libibverbs -hmeij
ulimit -l unlimited


usage() {
        cat <<USEEOF
USAGE:  $0
        This command is a wrapper for mpirun (openmpi).  It can
        only be run within Lava using bsub e.g.
                bsub -n # "$0 -np # {my mpi command and args}"

        The wrapper will automatically generate the
        machinefile used by mpirun.

        NOTE:  The list of hosts cannot exceed 4KBytes.
USEEOF
}

if [ x"${LSB_JOBFILENAME}" = x -o x"${LSB_HOSTS}" = x ]; then
    usage
    exit -1
fi

MYARGS=$*
WORKDIR=`dirname ${LSB_JOBFILENAME}`
MACHFILE=${WORKDIR}/mpi_machines
ARGLIST=${WORKDIR}/mpi_args

# Check if mpirun_rsh is in the PATH -hmeij
T=`which --skip-alias mpirun_rsh`
#T=`which mpirun_rsh`
if [ $? -ne 0 ]; then
    echo "Error:  mpirun_rsh is not in your PATH."
    exit -2
fi

echo "${MYARGS}" > ${ARGLIST}
#T=`grep -- -machinefile ${ARGLIST} |wc -l`
T=`grep -- -hostfile ${ARGLIST} |wc -l`
if [ $T -gt 0 ]; then
    echo "Error:  Do not provide the machinefile for mpirun."
    echo "        It is generated automatically for you."
    exit -3
fi

# Make the open-mpi machine file
echo "${LSB_HOSTS}" > ${MACHFILE}.lst
tr '\/ ' '\r\n' < ${MACHFILE}.lst > ${MACHFILE}

MPIRUN=`which --skip-alias mpirun_rsh`
#MPIRUN=/share/apps/openmpi/1.2+intel-9/bin/mpirun
#echo "executing: ${MPIRUN} -x LD_LIBRARY_PATH -machinefile ${MACHFILE} ${MYARGS}"

# sanity checks number of processes 1-4
np=`wc -l ${MACHFILE} | awk '{print $1}'`
if [ $np -lt 1 -o $np -gt 4 ]; then
    echo "Error:  Incorrect number of processes ($np)"
    echo "        -n can be an integer in the range of 1 to 4"
    exit -4
fi

# sanity check single node
nh=`cat ${MACHFILE} | sort -u | wc -l`
if [ $nh -ne 1 ]; then
    echo "Error:  No host or more than one host specified ($nh)"
    exit -5
fi

# one host, one to four gpus
gpunp=`cat ${MACHFILE} | wc -l | awk '{print $1}'`
gpuhost=`cat ${MACHFILE} | sort -u | tr -d '\n'`
gpuid=( $(for i in `ssh $gpuhost gpu-free | sed "s/,/ /g"`; do echo $i; done | shuf | head -$gpunp) )
if [ $gpunp -eq 1 ]; then
        CUDA_VISIBLE_DEVICES=$gpuid
        echo "GPU allocation instance $gpuhost:$gpuid"
else
        gpuids=`echo ${gpuid[@]} | sed "s/ /,/g"`
        CUDA_VISIBLE_DEVICES="$gpuids"
        echo "GPU allocation instance $gpuhost:$CUDA_VISIBLE_DEVICES"
fi
# namd ignores this
export CUDA_VISIBLE_DEVICES
#debug# setid=`ssh $gpuhost echo $CUDA_VISIBLE_DEVICES | tr '\n' ' '`
#debug# echo "setid=$setid";


if [ -n "$GMXRC" ]; then
        # gromacs needs them from base 0, so gpu 2,3 is string 01
        if [ ${#gpuid[*]} -eq 1 ]; then
                gmxrc_gpus="0"
        elif [ ${#gpuid[*]} -eq 2 ]; then
                gmxrc_gpus="01"
        elif [ ${#gpuid[*]} -eq 3 ]; then
                gmxrc_gpus="012"
        elif [ ${#gpuid[*]} -eq 4 ]; then
                gmxrc_gpus="0123"
        fi

        if [ $GMXRC -eq 1 ]; then
                newargs=`echo ${MYARGS} | sed "s/mdrun/mdrun -gpu_id $gmxrc_gpus/g"`
                echo "executing: $newargs"
                $newargs
        elif [ $GMXRC -eq 2 ]; then
                newargs=`echo ${MYARGS} | sed "s/mdrun_mpi/mdrun_mpi -gpu_id $gmxrc_gpus/g"`
                echo "executing: ${MPIRUN} -ssh -hostfile ${MACHFILE} -np $gpunp $newargs"
                ${MPIRUN} -ssh -hostfile ${MACHFILE} -np $gpunp $newargs
        fi

elif [ -n "$MATGPU" ] && [ $MATGPU -eq 1 ]; then
        echo "executing: ${MYARGS}"
        ${MYARGS}
elif [ -n "$CHARMRUN" ] && [ $CHARMRUN -eq 1 ]; then
        cat ${MACHFILE}.lst | tr '\/ ' '\r\n' | sed 's/^/host /g' > ${MACHFILE}
        echo "executing: charmrun $NAMD_DIR/namd2 +p$gpunp ++nodelist ${MACHFILE} +idlepoll +devices $CUDA_VISIBLE_DEVICES ${MYARGS}"
        charmrun $NAMD_DIR/namd2 +p$gpunp ++nodelist ${MACHFILE} +idlepoll +devices $CUDA_VISIBLE_DEVICES ${MYARGS}
else
        echo "executing: ${MPIRUN} -ssh -hostfile ${MACHFILE} -np $gpunp ${MYARGS}"
        ${MPIRUN} -ssh -hostfile ${MACHFILE} -np $gpunp ${MYARGS}
fi

exit $?


</code>
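
The device selection above is compact enough to warrant unpacking; this standalone sketch (host name and count assumed) shows how it randomly picks ''$gpunp'' free devices from the ''gpu-free'' list:

<code>
gpuhost=n33
gpunp=2
# gpu-free prints e.g. "1,3,0"; split on commas, shuffle, keep $gpunp ids
gpuid=( $(for i in `ssh $gpuhost gpu-free | sed "s/,/ /g"`; do echo $i; done | shuf | head -$gpunp) )
# join back into a comma-separated string for CUDA_VISIBLE_DEVICES
echo ${gpuid[@]} | sed "s/ /,/g"
</code>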


===== elim code =====

<code>

#!/usr/bin/perl

while (1) {

        $gpu = 0;
        $log = '';
        if (-e "/usr/local/bin/gpu-info" ) {
                $tmp = `/usr/local/bin/gpu-info | egrep "Tesla K20"`;
                @tmp = split(/\n/,$tmp);
                foreach $i (0..$#tmp) {
                        ($a,$b,$c,$d,$e,$f,$g) = split(/\s+/,$tmp[$i]);
                        if ( $f == 0 ) { $gpu = $gpu + 1; }
                        #print "$a $f $gpu\n";
                        $log .= "$f,";
                }
        }
        # nr_of_args name1 value1
        $string = "1 gpu $gpu";

        $h = `hostname`; chop($h);
        $d = `date +%m/%d/%y_%H:%M:%S`; chop($d);
        foreach $i ('n33','n34','n35','n36','n37') {
                if ( "$h" eq "$i" ) {
                        `echo "$d,$log" >> /share/apps/logs/$h.gpu.log`;
                }
        }

        # you need the \n to flush -hmeij
        # you also need the space before the line feed -hmeij
        print "$string \n";
        # or use
        #syswrite(OUT,$string,1);

        # smaller than specified in lsf.shared
        sleep 10;

}

</code>
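
LSF starts the elim itself, but the script can be sanity-checked by hand before deployment; run it interactively on a GPU node (the ''./elim'' file name is assumed here) and it should emit one report roughly every ten seconds:

<code>
# one resource reported, named "gpu", here with all 4 devices idle
./elim
1 gpu 4 
1 gpu 4 
</code>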