cluster:30 [DokuWiki]

Differences

This shows you the differences between two versions of the page.


cluster:30 [2007/08/31 10:24] (current)
Line 1: Line 1:
 +\\
 +**[[cluster:28|Back]]**
  
 +=> Platform/OCS's **very good** {{:cluster:lava_using_6.1.pdf|Running Jobs with Platform Lava}} (read it). 
 +
 +=> For every command used in the examples below, ''man //command//'' provides detailed information, for example ''man bsub''.
 +
 +
 +
 +
 +
 +
 +===== Jobs =====
 +
 +This page covers non-Infiniband jobs only. For Infiniband submissions go to [[cluster:32|Internal Link]]
 +
 +This write-up focuses only on submitting jobs with scripts, that is, in batch mode.  There is an interactive mode too, but a script gives you a record of how you submitted your job.
 +
 +So I'm creating two bash shell scripts (they must be bash scripts!).  The first, **myscript**, sets up the environment and resources needed; the second, **myjob**, contains the actual program I want to run and any shell actions needed.
 +
 +**myscript**
 +<code>
 +#!/bin/bash
 +
 +# queue
 +#BSUB -q idle
 +
 +# email me (##SUB) or save in $HOME (#SUB)
 +##BSUB -o outfile.email   # standard out
 +#BSUB  -e outfile.err     # standard error
 +
 +# unique job scratch dirs
 +MYSANSCRATCH=/sanscratch/$LSB_JOBID
 +MYLOCALSCRATCH=/localscratch/$LSB_JOBID
 +export MYSANSCRATCH MYLOCALSCRATCH
 +
 +# run my job
 +./myjob one-arg two-arg
 +
 +# label my job
 +#BSUB -J myLittleJob
 +</code>
 +
 +Lines of the form ''#BSUB -option value'' pass command line options to ''bsub'' (see ''man bsub'' for more information).  To disable such a line, add another pound sign, as in ''##BSUB ...'', and it will be treated as a plain comment. So in the example above standard output is sent to me via email (the default behavior), while standard error (which could be rather large) is written to the file outfile.err in my home directory when the job finishes.
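
To check which directives in a script are live and which are commented out, something like the following could be used. It is only a rough sketch: the helper name ''active_bsub_options'' is made up for this page, and the pattern match merely approximates how ''bsub'' reads the script.

```shell
#!/bin/bash
# Hypothetical helper (not part of Lava): print the options of every
# active '#BSUB' line in a script. Lines starting with '##BSUB' do not
# match the pattern and are therefore skipped, like any other comment.
active_bsub_options() {
    grep -E '^#BSUB[[:space:]]' "$1" | sed -E 's/^#BSUB[[:space:]]+//'
}

# Only report when the example script from this page is present.
if [ -f myscript ]; then
    active_bsub_options myscript
fi
```

Trailing comments on a directive line are printed as they appear in the script.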
 +
 +Other than that, the two scratch directory variables are exported so that they are available to **myjob**, a queue is defined (unnecessary here, as idle is the default queue) and two command line arguments are passed to **myjob**.  Finally, a cute label is assigned with ''-J''.
 +
 +**myjob**
 +<code>
 +#!/bin/bash
 +
 +# pre_exec routine will create scratch dirs
 +# $MYSANSCRATCH   in /sanscratch
 +# $MYLOCALSCRATCH in /localscratch
 +
 +# runs in the home directory; writes 25 lines to local scratch
 +for i in $(seq 1 25)
 +do
 +    d=$(date)
 +    echo "$i $HOSTNAME $2 $1 $d" >> "$MYLOCALSCRATCH/outfile"
 +done
 +
 +# retrieve some results
 +tail $MYLOCALSCRATCH/outfile > $MYSANSCRATCH/outfile2
 +cp $MYSANSCRATCH/outfile2 ./outfile3.$LSB_JOBID
 +
 +echo DONE ... these dirs will be removed via post_exec
 +echo $MYSANSCRATCH $MYLOCALSCRATCH
 +</code>
 +
 +OK, so my program grabs the date and appends it, together with the loop counter, the host name and the command line arguments, to a file in the MYLOCALSCRATCH directory.  It does this 25 times.  Then it takes the last 10 lines of that file (the ''tail'' default) and writes them to a file in the MYSANSCRATCH directory. Just for fun.  Finally we copy that file to our home directory for keeps.  Then we echo 'DONE' to standard out. Marvelous.
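
The logic of **myjob** can be tried outside the scheduler before submitting. The sketch below is an assumption about how one might do that: it substitutes a temporary directory for both scratch areas (outside the scheduler no pre_exec routine creates them) and checks how many of the 25 lines ''tail'' keeps.

```shell
#!/bin/bash
# Dry run of the myjob logic without the scheduler.
# Assumption: a throwaway temp dir stands in for both scratch areas.
scratch=$(mktemp -d)
MYLOCALSCRATCH=$scratch
MYSANSCRATCH=$scratch

for i in $(seq 1 25); do
    echo "$i ${HOSTNAME:-unknown} $(date)" >> "$MYLOCALSCRATCH/outfile"
done

# tail prints the last 10 lines by default
tail "$MYLOCALSCRATCH/outfile" > "$MYSANSCRATCH/outfile2"

kept=$(wc -l < "$MYSANSCRATCH/outfile2")
echo "lines written: 25, lines kept: $((kept))"
```

The temporary directory is left in place so the files can be inspected; remove it by hand afterwards.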
 +
 +
 +
 +
 +===== bsub and bjobs =====
 +
 +Straightforward.
 +
 +<code>
 +[hmeij@swallowtail ~]$ bsub < myscript
 +Job <1001> is submitted to queue <idle>.
 +</code>
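
If a wrapper script needs the job ID for later ''bjobs'' or ''bhist'' calls, it can be picked out of the confirmation line shown above. This is a minimal sketch; ''parse_job_id'' is a made-up helper name, and it assumes the message keeps the ''Job <N> is submitted ...'' form.

```shell
#!/bin/bash
# Hypothetical helper: read bsub's confirmation line on stdin and print
# the number between the first pair of angle brackets.
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Submit only where bsub and the example script are actually available.
if command -v bsub >/dev/null 2>&1 && [ -f myscript ]; then
    jobid=$(bsub < myscript | parse_job_id)
    echo "submitted job $jobid"
fi
```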
 +
 +<code>
 +[hmeij@swallowtail ~]$ bjobs
 +JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
 +1001    hmeij   PEND  idle       swallowtail    -        myLittleJob Apr 18 11:28
 +</code>
 +
 +<code>
 +[hmeij@swallowtail ~]$ bjobs
 +JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
 +1001    hmeij   RUN   idle       swallowtail compute-1-14 myLittleJob Apr 18 11:28
 +</code>
 +
 +<code>
 +[hmeij@swallowtail ~]$ bjobs
 +No unfinished job found
 +</code>
 +
 +<hi #ffff00>''bjobs'' can also explain why your job is in PEND status ...</hi>
 +
 +<code>
 +[hmeij@swallowtail gaussian]$ bjobs -p 13892
 +JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
 +13892   hmeij   PEND  gaussian   swallowtail    -        run101     Aug 29 16:23
 + Queue's per-host job slot limit reached: 1 host;
 +</code>
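
Instead of rerunning ''bjobs'' by hand, the check can be wrapped in a polling loop. The sketch below makes some assumptions: the helper names are invented, and the job line of ''bjobs JOBID'' is taken to have the job ID in the first column and the status in the third, as in the listings above. How long a finished job remains visible to ''bjobs'' is not covered here, so the loop simply stops as soon as the status is no longer pending, running or suspended.

```shell
#!/bin/bash
# Hypothetical helpers built on the bjobs listing format shown above.

# Print the STAT column for one job; prints nothing if bjobs no longer
# lists the job.
job_stat() {
    bjobs "$1" 2>/dev/null | awk -v id="$1" '$1 == id { print $3 }'
}

# Poll until the job is neither pending, running nor suspended.
# $1 = job ID, $2 = seconds between polls (default 30)
wait_for_job() {
    wfj_id=$1
    wfj_interval=${2:-30}
    while :; do
        wfj_stat=$(job_stat "$wfj_id")
        case "$wfj_stat" in
            PEND|RUN|PSUSP|USUSP|SSUSP) sleep "$wfj_interval" ;;
            *) break ;;
        esac
    done
    echo "job $wfj_id left the queue (last status: ${wfj_stat:-none})"
}
```

For example, ''wait_for_job 1001 10'' would poll job 1001 every 10 seconds.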
 +
 +===== bhist =====
 +
 +You can query the scheduler for the history of your job: when it was submitted, dispatched and finished, and how many seconds it spent in each state.
 +
 +<code>
 +[hmeij@swallowtail ~]$ bhist -l 1001
 +
 +Job <1001>, Job Name <myLittleJob>, User <hmeij>, Project <default>, Command <#
 +                     !/bin/bash; # queue;#BSUB -q idle; # email me (##SUB) or s
 +                     ave in $HOME (#SUB);##BSUB -o outfile.email # standard oup
 +                     ut;#BSUB  -e outfile.err   # standard error; # unique job
 +                     scratch dirs;MYSANSCRATCH=/sanscratch/$LSB_JOBID;MYLOCALS>
 +
 +Wed Apr 18 11:28:14: Submitted from host <swallowtail>, to Queue <idle>, CWD <$
 +                     HOME>, Error File <outfile.err>;
 +Wed Apr 18 11:28:20: Dispatched to <compute-1-14>;
 +Wed Apr 18 11:28:20: Starting (Pid 21569);
 +Wed Apr 18 11:28:25: Running with execution home </home/hmeij>, Execution CWD <
 +                     /home/hmeij>, Execution Pid <21569>;
 +Wed Apr 18 11:28:25: Done successfully. The CPU time used is 0.0 seconds;
 +Wed Apr 18 11:28:35: Post job process done successfully;
 +
 +Summary of time in seconds spent in various states by  Wed Apr 18 11:28:35
 +  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
 +  6        0        5        0        0        0        11
 +</code>
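
The summary table at the end of the ''bhist -l'' output is easy to pick apart in a script. The following is a sketch only: ''bhist_times'' is a made-up name, and it assumes the column order PEND, PSUSP, RUN shown in the listing above.

```shell
#!/bin/bash
# Hypothetical helper: read 'bhist -l' output on stdin and report the
# seconds spent pending and running, taken from the summary table.
# Assumes the header row starts with PEND and has RUN as third column.
bhist_times() {
    awk '$1 == "PEND" && $3 == "RUN" {
             # the values are on the line right below the header row
             if ((getline) > 0) print "pend=" $1 " run=" $3
         }'
}
```

It would be used as ''bhist -l 1001 | bhist_times''; for the summary table shown above it prints ''pend=6 run=5''.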
 +
 +
 +===== Job Output =====
 +
 +The above job submission yields ...
 +
 +<code>
 +[hmeij@swallowtail hmeij]# ls -l
 +...
 +-rw-r--r--  1 hmeij its  670 Apr 18 11:28 outfile3.1001
 +-rw-r--r--  1 hmeij its    0 Apr 18 11:18 outfile.err
 +...
 +</code>
 +
 +and the following email
 +
 +<code>
 +Job <myLittleJob> was submitted from host <swallowtail> by user <hmeij>.
 +Job was executed on host(s) <compute-1-14.local>, in queue <idle>, as user <hmeij>.
 +</home/hmeij> was used as the home directory.
 +</home/hmeij> was used as the working directory.
 +Started at Wed Apr 18 11:28:20 2007
 +Results reported at Wed Apr 18 11:28:25 2007
 +
 +Your job looked like:
 +
 +------------------------------------------------------------
 +# LSBATCH: User input
 +#!/bin/bash
 +
 +# queue
 +#BSUB -q idle
 +
 +# email me (##SUB) or save in $HOME (#SUB)
 +##BSUB -o outfile.email # standard ouput
 +#BSUB  -e outfile.err   # standard error
 +
 +# unique job scratch dirs
 +MYSANSCRATCH=/sanscratch/$LSB_JOBID
 +MYLOCALSCRATCH=/localscratch/$LSB_JOBID
 +export MYSANSCRATCH MYLOCALSCRATCH
 +
 +# run my job
 +./myjob one-arg two-arg
 +
 +# label my job
 +#BSUB -J myLittleJob
 +
 +
 +------------------------------------------------------------
 +
 +Successfully completed.
 +
 +Resource usage summary:
 +
 +    CPU time   :      0.05 sec.
 +    Max Memory :         4 MB
 +    Max Swap   :       117 MB
 +
 +    Max Processes  :         3
 +    Max Threads    :         3
 +
 +The output (if any) follows:
 +
 +DONE ... these dirs will be removed via post_exec
 +/sanscratch/1001 /localscratch/1001
 +
 +
 +PS:
 +
 +Read file <outfile.err> for stderr output of this job.
 +</code>
 +
 +
 +
 +
 +
 +
 +===== b'ees =====
 +
 +Other ''b//__name__//'' utilities for managing your jobs ...
 +
 +**''bkill''** JOBID ... stops your job
 +
 +**''bstop''** JOBID ... suspends your job
 +
 +**''bresume''** JOBID ... resumes your job
 +
 +**''brequeue''** JOBID ... stops your job and requeues it
 +
 +**''brun''** -m HOSTNAME JOBID ... forces your job to run (administrators only)
 +
 +**''bswitch''** ALTERNATE_QUEUE JOBID ... moves a pending or running job to another queue
 +
 +**''bpeek''** JOBID ... peek at your job output while it is running
 +
 +=> ''bpeek'' shows you the ''tail'' of your job's standard output and standard error.  Alternatively, you can follow the progress of your jobs in the directory ~/.lsbatch.  For each job there will be a timestamp.jobpid.err, a timestamp.jobpid.out and a timestamp.jobpid.shell file.  Do not remove or edit these files while your job is running.
 +
 +
 +\\
 +**[[cluster:28|Back]]**
cluster/30.txt · Last modified: 2007/08/31 10:24 (external edit)