\\
**[[cluster:28|Back]]**

=> Lava, the scheduler, does not natively support parallel job submission, so a wrapper script is necessary.  The wrapper obtains the allocated hosts from the LSB_HOSTS variable and builds the "machines" file.  Follow the **TEST** link below for detailed information.
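
A minimal sketch of what such a wrapper does is shown below.  This is illustrative only, not the actual ''/share/apps/bin/mpich-mpirun'' script; the mpirun path and the temporary file location are assumptions.

<code>
#!/bin/bash
# illustrative wrapper sketch: turn the space-separated LSB_HOSTS list
# into a machines file, then hand everything else to mpirun
MACHINES=/tmp/machines.$LSB_JOBID
for host in $LSB_HOSTS; do
    echo $host
done > $MACHINES

# pass the caller's arguments (e.g. -np 16 and the program) straight through
exec /share/apps/openmpi-1.2/bin/mpirun -machinefile $MACHINES "$@"
</code>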

=> There is a splendid course about MPI offered by NCSA at UIUC.  If you're serious about MPI, take it; you can find a link to the course **[[cluster:28|here]]**.

=> In all the examples below, ''man //command//'' provides detailed information, for example ''man bsub''.




===== Jobs =====

Infiniband!  For non-Infiniband jobs go to [[cluster:30|Internal Link]].

PLEASE READ THE 'ENV TEST' SECTION; IT EXPLAINS WHY THIS IS COMPLICATED.  \\
You also need to test that your environment is set up correctly => **[[cluster:31|ENV TEST]]** <=

This write-up focuses only on submitting jobs with scripts, that is, in batch mode.  A single bash shell script (it must be a bash shell!) is submitted to the scheduler; below it is called ''imyscript''.

**imyscript**
<code>
#!/bin/bash

# queue
#BSUB -q idebug -n 16

# email me the output (##BSUB) or save it in $HOME (#BSUB)
##BSUB -o outfile.email # standard output
#BSUB  -e outfile.err   # standard error

# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH

# run my job
/share/apps/bin/mpich-mpirun -np 16 /share/apps/openmpi-1.2/bin/cpi

echo DONE ... these dirs will be removed via post_exec
echo $MYSANSCRATCH $MYLOCALSCRATCH

# label my job
#BSUB -J myLittleiJob
</code>

This looks much like the non-Infiniband job submissions, but there are some key changes.  First we specify a queue whose nodes are connected to the Infiniband switch (idebug), and we request 16 processors.  Queue idebug is comprised of the nodes ilogin/ilogin2, each with dual quad-core CPUs, so 2x2x4=16 cores, and we will be using all of them.
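
If you want to confirm those numbers yourself, the standard scheduler query commands show the per-queue and per-host job slot limits.  A quick sketch (output omitted; the host names are taken from the run shown further down):

<code>
# show the idebug queue and its job slot limits
bqueues idebug

# show the maximum job slots on the execution hosts
bhosts compute-1-15 compute-1-16
</code>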

The most significant change is that we call a 'wrapper' script.  This script, ''mpich-mpirun'', wraps the program, surprise, ''mpirun''.  The reason is that the wrapper builds the ''machines'' file on the fly.

If you want to use the [[http://www.lam-mpi.org/|Local Area Multicomputer (LAM)]] MPI libraries, use the following wrapper script instead: ''/share/apps/bin/mpich-mpirun.gnulam''
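
For example, the run line in ''imyscript'' would change to something like the sketch below.  This assumes your program was built against the LAM libraries; the binary path here is just a placeholder.

<code>
# run my job with the LAM MPI wrapper instead
/share/apps/bin/mpich-mpirun.gnulam -np 16 /path/to/my_lam_program
</code>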

"make today an [[http://www.open-mpi.org/|OpenMPI]] day"

===== bsub and bjobs =====

Straightforward.

<code>
[hmeij@swallowtail ~]$ bsub < imyscript
Job <1011> is submitted to queue <idebug>.
</code>

<code>
[hmeij@swallowtail ~]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1011    hmeij   PEND  idebug     swallowtail    -        myLittleiJob Apr 19 14:54
</code>

<code>
[hmeij@swallowtail ~]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1011    hmeij   RUN   idebug     swallowtail\
  compute-1-16:compute-1-16:compute-1-16:compute-1-16:\
  compute-1-16:compute-1-16:compute-1-16:compute-1-16:\
  compute-1-15:compute-1-15:compute-1-15:compute-1-15:\
  compute-1-15:compute-1-15:compute-1-15:compute-1-15 myLittleiJob Apr 19 14:54
</code>

<code>
[hmeij@swallowtail ~]$ bjobs
No unfinished job found
</code>

Note: as expected, 8 cores (EXEC_HOST) were invoked on each node.


===== bhist =====

You can query the scheduler regarding the status of your job.

<code>
[hmeij@swallowtail ~]$ bhist -l 1011

Job <1011>, Job Name <myLittleiJob>, User <hmeij>, Project <default>, Command <
                     #!/bin/bash; # queue;#BSUB -q idebug -n 16; # email me (##
                     SUB) or save in $HOME (#SUB);##BSUB -o outfile.email # sta
                     ndard ouput;#BSUB  -e outfile.err   # standard error; # un
                     ique job scratch dirs;MYSANSCRATCH=/sanscratch/$LSB_JOBID>

Thu Apr 19 14:54:19: Submitted from host <swallowtail>, to Queue <idebug>, CWD
                     <$HOME>, Error File <outfile.err>, 16 Processors Requested
                     ;
Thu Apr 19 14:54:24: Dispatched to 16 Hosts/Processors <compute-1-16> <compute-
                     1-16> <compute-1-16> <compute-1-16> <compute-1-16> <comput
                     e-1-16> <compute-1-16> <compute-1-16> <compute-1-15> <comp
                     ute-1-15> <compute-1-15> <compute-1-15> <compute-1-15> <co
                     mpute-1-15> <compute-1-15> <compute-1-15>;
Thu Apr 19 14:54:24: Starting (Pid 6266);
Thu Apr 19 14:54:31: Running with execution home </home/hmeij>, Execution CWD <
                     /home/hmeij>, Execution Pid <6266>;
Thu Apr 19 14:55:47: Done successfully. The CPU time used is 0.0 seconds;
Thu Apr 19 14:55:57: Post job process done successfully;

Summary of time in seconds spent in various states by  Thu Apr 19 14:55:57
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  5        0        83       0        0        0        88
</code>



===== Job Output =====

The above job submission yields ...

<code>
[hmeij@swallowtail ~]$ cat outfile.err
Process 11 on compute-1-16.local
Process 6 on compute-1-15.local
Process 14 on compute-1-16.local
Process 0 on compute-1-15.local
Process 1 on compute-1-15.local
Process 2 on compute-1-15.local
Process 3 on compute-1-15.local
Process 8 on compute-1-16.local
Process 4 on compute-1-15.local
Process 9 on compute-1-16.local
Process 5 on compute-1-15.local
Process 10 on compute-1-16.local
Process 7 on compute-1-15.local
Process 12 on compute-1-16.local
Process 13 on compute-1-16.local
Process 15 on compute-1-16.local
</code>

and the following email:

<code>
Job <myLittleiJob> was submitted from host <swallowtail> by user <hmeij>.
Job was executed on host(s) <8*compute-1-16>, in queue <idebug>, as user <hmeij>.
                            <8*compute-1-15>
</home/hmeij> was used as the home directory.
</home/hmeij> was used as the working directory.
Started at Thu Apr 19 14:54:24 2007
Results reported at Thu Apr 19 14:55:47 2007

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/bash

# queue
#BSUB -q idebug -n 16

# email me (##SUB) or save in $HOME (#SUB)
##BSUB -o outfile.email # standard ouput
#BSUB  -e outfile.err   # standard error

# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH

# run my job
/share/apps/bin/mpich-mpirun -np 16 /share/apps/openmpi-1.2/bin/cpi

# label my job
#BSUB -J myLittleiJob


------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :      0.05 sec.
    Max Memory :         7 MB
    Max Swap   :       205 MB

    Max Processes  :         5
    Max Threads    :         5

The output (if any) follows:

pi is approximately 3.1416009869231245, Error is 0.0000083333333314
wall clock time = 0.312946
DONE ... these dirs will be removed via post_exec
/sanscratch/1011 /localscratch/1011

PS:

Read file <outfile.err> for stderr output of this job.
</code>




===== Bingo =====

When I ran these OpenMPI invocations I was also running an HPLinpack benchmark on the Infiniband nodes (to assess whether the nodes would still respond).  **[[cluster:26|Follow this link to read about the HPLinpack runs.]]**

The idebug queue overrides the job slots set for each node (Max Job Slots = # of cores => 8).  It allows QJOB_LIMIT=16 and UJOB_LIMIT=16.  The benchmark was already running 8 jobs per node, and our job asked for 8 more per host.  So basically, the hosts' job slots were exhausted, as well as our user limit.
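
For reference, the queue definition that allows this oversubscription would look roughly like the ''lsb.queues'' stanza sketched below.  This is not a copy of the actual configuration, just an illustration of where QJOB_LIMIT and UJOB_LIMIT live.

<code>
Begin Queue
QUEUE_NAME   = idebug
QJOB_LIMIT   = 16    # max job slots for the whole queue
UJOB_LIMIT   = 16    # max job slots per user in this queue
End Queue
</code>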

{{:cluster:cpi.gif|Cute}}

And so it was.\\




===== The Problem =====

(important: I repeat this from another page --- //[[hmeij@wesleyan.edu|Henk Meij]] 2007/04/19 15:52//)

Once you have compiled your binary, you can execute it on the head node or any other node by specifying a hardcoded ''machines'' file, like so (look at the code of this script):

<code>
[hmeij@swallowtail ~]$ /share/apps/bin/cpi.run
</code>
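
For reference, a hardcoded run of that kind typically looks something like the sketch below.  The hostnames and the machines file location are made up, and ''cpi.run'' already wraps the equivalent commands.

<code>
# hypothetical machines file listing the hosts to use
cat > ~/machines <<EOF
compute-1-15
compute-1-16
EOF

# launch the binary across those hosts with the hardcoded machines file
/share/apps/openmpi-1.2/bin/mpirun -np 16 -machinefile ~/machines \
    /share/apps/openmpi-1.2/bin/cpi
</code>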

This will not work when submitting your program to ''bsub''.  Platform reports:

<hi yellow>
Lava (your scheduler) is not natively capable of running parallel jobs, so you will have to write your own integration script to parse the hosts allocated by LSF (via the LSB_HOSTS variable) and pass them to your MPI distribution.
</hi>

<hi orange>
Also, because of the lack of LSF's parallel support daemons, these scripts can only provide a loose integration with Lava. Specifically, Lava only knows about the mpirun process on the first host; it has no knowledge of the other parallel processes on the other hosts involved in a parallel job. So if, in some circumstances, a parallel job fails, Lava cannot clean up the leftover processes, for example, MPICH 1's shared-memory leftovers. You may want to regularly check your cluster for this issue.
</hi>


And this makes the job submission process for parallel jobs tedious.
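
Given that caveat, it is worth sweeping the compute nodes for orphaned MPI processes after a failed parallel job.  A minimal, hypothetical check is sketched below; the node names and the process name pattern are only examples.

<code>
#!/bin/bash
# look for leftover MPI-related processes on a couple of nodes
for node in compute-1-15 compute-1-16; do
    echo "== $node =="
    ssh $node "ps -ef | grep -E 'mpirun|cpi' | grep -v grep"
done
</code>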


\\
**[[cluster:28|Back]]**