\\
**[[cluster:0|Back]]**
  
==== Submitting GPU Jobs ====

Please leave plenty of time between multiple GPU job submissions. Like minutes.

Jobs need to be submitted to the scheduler via cottontail to the queues mwgpu, amber128 and exx96 (see the minimal sketch below).

This page is old; the gpu resource ''gpu4'' should be used, and a more recent page can be found at [[cluster:173|K20 Redo Usage]]. There might still be some useful information on this page explaining gpu jobs, though.
 --- //[[hmeij@wesleyan.edu|Henk]] 2021/06/17 15:29//

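A minimal submission sketch, assuming a job script called ''run.sh'' (placeholder name; the real scripts appear in the sections below) that contains the #BSUB directives for queue, core count and gpu rusage:

<code>
# on the scheduler host (cottontail)
bsub < run.sh        # submit; the #BSUB lines inside run.sh pick the queue and resources
bjobs                # check the job's state and execution host
bhosts               # overview of host and job slot availability
</code>
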
**Articles**

  * [[http://www.pgroup.com/lit/articles/insider/v5n2a1.htm]] Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer's Perspective
  * [[http://www.pgroup.com/lit/articles/insider/v5n2a5.htm]] Calling CUDA Fortran kernels from MATLAB

  
  
</code>
  
With ''gpu-info'' we can view our running job. ''gpu-info'' and ''gpu-free'' are available at <del>[[http://ambermd.org/gpus/]]</del> [[http://ambermd.org/gpus12/#Running]] (I had to hard-code my GPU string information as they came in at 02, 03, 82 & 83; you can use deviceQuery to find them).
  
<code>
3       Tesla K20m      21 C            0 %
====================================================

[hmeij@sharptail sharptail]$ ssh n33 gpu-free
1,3,0


</code>
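If the device-to-string mapping ever needs to be re-derived on a node, a rough sketch (''deviceQuery'' ships with the CUDA SDK and its exact output wording differs per CUDA version; ''nvidia-smi -L'' is an alternative):

<code>
ssh n33
deviceQuery | grep -i "device"    # lists each CUDA device with its index and name
nvidia-smi -L                     # one line per GPU with its index and UUID
</code>
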
#BSUB -q mwgpu
#BSUB -J test

# from greentail we need to set up the module env
export PATH=/home/apps/bin:/cm/local/apps/cuda50/libs/304.54/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/lammps/cuda/2013-01-27/:\
/cm/shared/apps/amber/amber12/bin:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:\
/usr/sbin:/cm/shared/apps/cuda50/toolkit/5.0.35/bin:/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:\
/cm/shared/apps/cuda50/libs/current/bin:/cm/shared/apps/cuda50/toolkit/5.0.35/open64/bin:\
/cm/shared/apps/mvapich2/gcc/64/1.6/bin:/cm/shared/apps/mvapich2/gcc/64/1.6/sbin
export LD_LIBRARY_PATH=/cm/local/apps/cuda50/libs/304.54/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/amber/amber12/lib:\
/cm/shared/apps/amber/amber12/lib64:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/cuda50/libs/current/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/lib:/cm/shared/apps/cuda50/toolkit/5.0.35/extras/CUPTI/lib:\
/cm/shared/apps/mvapich2/gcc/64/1.6/lib


## leave sufficient time between job submissions (30-60 secs)
  
# from greentail we need to recreate module env
export PATH=/home/apps/bin:/cm/local/apps/cuda50/libs/304.54/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/lammps/cuda/2013-01-27/:\
/cm/shared/apps/amber/amber12/bin:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:\
/usr/sbin:/cm/shared/apps/cuda50/toolkit/5.0.35/bin:/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:\
/cm/shared/apps/cuda50/libs/current/bin:/cm/shared/apps/cuda50/toolkit/5.0.35/open64/bin:\
/cm/shared/apps/mvapich2/gcc/64/1.6/bin:/cm/shared/apps/mvapich2/gcc/64/1.6/sbin
export LD_LIBRARY_PATH=/cm/local/apps/cuda50/libs/304.54/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/amber/amber12/lib:\
/cm/shared/apps/amber/amber12/lib64:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/cuda50/libs/current/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/lib:/cm/shared/apps/cuda50/toolkit/5.0.35/extras/CUPTI/lib:\
/cm/shared/apps/mvapich2/gcc/64/1.6/lib
  
#BSUB -o gromacs.out
##BSUB -q mwgpu
##BSUB -n 1
##BSUB -R "rusage[gpu=1:mem=7000],span[hosts=1]"
## signal GMXRC is a gpu run with: 1=thread_mpi
#export GMXRC=1
#BSUB -q mwgpu
#BSUB -n 1
#BSUB -R "rusage[gpu=1:mem=7000],span[hosts=1]"
# signal GMXRC is a gpu run with: 2=mvapich2
export GMXRC=2
  
</code>

==== matlab.sub ====

<code>

#!/bin/bash

rm -rf out err *.out

# from greentail we need to recreate module env
export PATH=/home/apps/bin:/cm/local/apps/cuda50/libs/304.54/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/lammps/cuda/2013-01-27/:\
/cm/shared/apps/amber/amber12/bin:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:\
/usr/sbin:/cm/shared/apps/cuda50/toolkit/5.0.35/bin:/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:\
/cm/shared/apps/cuda50/libs/current/bin:/cm/shared/apps/cuda50/toolkit/5.0.35/open64/bin:\
/cm/shared/apps/mvapich2/gcc/64/1.6/bin:/cm/shared/apps/mvapich2/gcc/64/1.6/sbin
export PATH=/share/apps/matlab/2013a/bin:$PATH
export LD_LIBRARY_PATH=/cm/local/apps/cuda50/libs/304.54/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/amber/amber12/lib:\
/cm/shared/apps/amber/amber12/lib64:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/cuda50/libs/current/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/lib:/cm/shared/apps/cuda50/toolkit/5.0.35/extras/CUPTI/lib:\
/cm/shared/apps/mvapich2/gcc/64/1.6/lib

#BSUB -o out
#BSUB -e err
#BSUB -N
#BSUB -J test

# GPU RUN: (1-4), queue mwgpu, n (1-4, matches gpu count), must run on one node
#BSUB -q mwgpu
#BSUB -n 1
#BSUB -R "rusage[gpu=1:mem=7000],span[hosts=1]"
# signal MATGPU is a gpu run
export MATGPU=1
lava.mvapich2.wrapper matlab -nodisplay -r test


</code>

==== lava.mvapich2.wrapper ====
  
#debug# setid=`ssh $gpuhost echo $CUDA_VISIBLE_DEVICES | tr '\n' ' '`
#debug# echo "setid=$setid";


if [ -n "$GMXRC" ]; then
        # gromacs needs them from base 0, so gpu 2,3 is string 01
        if [ ${#gpuid[*]} -eq 1 ]; then
                gmxrc_gpus="0"
        elif [ ${#gpuid[*]} -eq 2 ]; then
                gmxrc_gpus="01"
        elif [ ${#gpuid[*]} -eq 3 ]; then
                gmxrc_gpus="012"
        elif [ ${#gpuid[*]} -eq 4 ]; then
                gmxrc_gpus="0123"
        fi

        if [ $GMXRC -eq 1 ]; then
                newargs=`echo ${MYARGS} | sed "s/mdrun/mdrun -gpu_id $gmxrc_gpus/g"`
                echo "executing: $newargs"
                $newargs
        elif [ $GMXRC -eq 2 ]; then
                newargs=`echo ${MYARGS} | sed "s/mdrun_mpi/mdrun_mpi -gpu_id $gmxrc_gpus/g"`
                echo "executing: ${MPIRUN} -ssh -hostfile ${MACHFILE} -np $gpunp $newargs"
                ${MPIRUN} -ssh -hostfile ${MACHFILE} -np $gpunp $newargs
        fi

elif [ -n "$MATGPU" ] && [ $MATGPU -eq 1 ]; then
        echo "executing: ${MYARGS}"
        ${MYARGS}
elif [ -n "$CHARMRUN" ] && [ $CHARMRUN -eq 1 ]; then
        cat ${MACHFILE}.lst | tr '\/ ' '\r\n' | sed 's/^/host /g' > ${MACHFILE}
  
exit $?


</code>
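
To make the base-0 remapping above concrete, a small stand-alone sketch (not part of the wrapper itself) of what happens when a job is handed two physical GPUs, say devices 2 and 3; gromacs still wants the string ''01'':

<code>
#!/bin/bash
# hypothetical illustration of the gmxrc_gpus logic used in the wrapper
gpuid=(2 3)                            # pretend these physical device ids were assigned
gmxrc_gpus=""
for ((i=0; i<${#gpuid[*]}; i++)); do
        gmxrc_gpus="${gmxrc_gpus}${i}" # one base-0 digit per assigned GPU
done
MYARGS="mdrun -v -deffnm md"           # hypothetical gromacs command line
echo ${MYARGS} | sed "s/mdrun/mdrun -gpu_id $gmxrc_gpus/g"
# prints: mdrun -gpu_id 01 -v -deffnm md
</code>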


===== elim code =====

<code>

#!/usr/bin/perl

while (1) {

        $gpu = 0;
        $log = '';
        if (-e "/usr/local/bin/gpu-info" ) {
                $tmp = `/usr/local/bin/gpu-info | egrep "Tesla K20"`;
                @tmp = split(/\n/,$tmp);
                foreach $i (0..$#tmp) {
                        ($a,$b,$c,$d,$e,$f,$g) = split(/\s+/,$tmp[$i]);
                        if ( $f == 0 ) { $gpu = $gpu + 1; }
                        #print "$a $f $gpu\n";
                        $log .= "$f,";
                }
        }
        # nr_of_args name1 value1
        $string = "1 gpu $gpu";

        $h = `hostname`; chop($h);
        $d = `date +%m/%d/%y_%H:%M:%S`; chop($d);
        foreach $i ('n33','n34','n35','n36','n37') {
                if ( "$h" eq "$i" ) {
                        `echo "$d,$log" >> /share/apps/logs/$h.gpu.log`;
                }
        }

        # you need the \n to flush -hmeij
        # you also need the space before the line feed -hmeij
        print "$string \n";
        # or use
        #syswrite(OUT,$string,1);

        # smaller than specified in lsf.shared
        sleep 10;

}

</code>
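
For context: this is the external LIM (elim) script that reports the number of idle K20 GPUs as the custom ''gpu'' load index which the ''rusage[gpu=...]'' requests above consume. A rough manual check, assuming the script is installed on a GPU node at a hypothetical path ''/path/to/elim'' (the real install location is not shown on this page):

<code>
# run the elim by hand on a GPU node; it prints one report roughly every 10 seconds
# in the "nr_of_args name1 value1" format, e.g. with three idle K20s:
ssh n33 /path/to/elim
1 gpu 3
1 gpu 3
</code>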
  
  