\\
**[[cluster:28|Back]]**
=> Platform/OCS's **very good** {{:cluster:lava_using_6.1.pdf|Running Jobs with Platform Lava}} (read it).
=> In all the examples below, ''man //command//'' will provide you with detailed information, for example ''man bsub''.
===== Jobs =====
Non-Infiniband! For Infiniband submissions go to [[cluster:32|Internal Link]]
This write-up focuses only on how to submit jobs using scripts, that is, in batch mode. There is also an interactive mode, but if you create a script you have a record of how you submitted your job.
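(For completeness, an interactive submission would look something like the sketch below; this assumes the standard LSF ''-I'' flag for interactive jobs is supported by our Lava installation, and uses the example job and arguments introduced further down.)
# sketch of an interactive submission (assumes bsub's -I flag is available);
# output comes straight back to your terminal, but nothing is kept on record
bsub -I -q idle ./myjob one-arg two-arg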
So I'm creating two bash shell scripts (they must be bash shells!). The first, **myscript**, sets up the environment and resources needed; the second, **myjob**, contains the actual program I want to run and any shell actions needed.
**myscript**
#!/bin/bash
# queue
#BSUB -q idle
# email me (##BSUB) or save in $HOME (#BSUB)
##BSUB -o outfile.email # standard output
#BSUB -e outfile.err # standard error
# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH
# run my job
./myjob one-arg two-arg
# label my job
#BSUB -J myLittleJob
The convention '#BSUB -parameter value' passes command line options to ''bsub'' ... see ''man bsub'' for more information. If you wish to disable such a line, add another pound sign, as in '##BSUB ...', and it will be treated as a plain comment. So in the example above the standard output will be sent to me via email (the default behavior), but the standard error output (which could be rather large) is written to a file in my home directory when the job finishes.
Other than that, environment variables are made available to **myjob**, a queue is defined (actually unnecessary since idle is the default queue), and two command line arguments are passed to **myjob**. Finally, a cute label is assigned.
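The embedded directives are simply a convenient way of keeping command line options inside the script. A roughly equivalent submission, with the same options given on the command line instead, would be:
# same queue, error file and job name as the #BSUB lines in myscript
bsub -q idle -e outfile.err -J myLittleJob < myscript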
**myjob**
#!/bin/bash
# pre_exec routine will create scratch dirs
# $MYSANSCRATCH in /sanscratch
# $MYLOCALSCRATCH in /localscratch
# the job itself starts in the home directory
for i in `seq 1 25`;
do
d=`date`;
echo "$i $HOSTNAME $2 $1 $d" >> $MYLOCALSCRATCH/outfile;
done
# retrieve some results
tail $MYLOCALSCRATCH/outfile > $MYSANSCRATCH/outfile2
cp $MYSANSCRATCH/outfile2 ./outfile3.$LSB_JOBID
echo DONE ... these dirs will be removed via post_exec
echo $MYSANSCRATCH $MYLOCALSCRATCH
OK, so my program grabs the date and appends it, together with the command line arguments, to a file in the MYLOCALSCRATCH directory. Then it grabs the last 10 lines of that file and copies them to the MYSANSCRATCH directory. Just for fun. Finally we copy that result to our home directory for keepers. Then we echo 'DONE' to standard out. Marvelous.
===== bsub and bjobs =====
Straightforward.
[hmeij@swallowtail ~]$ bsub < myscript
Job <1001> is submitted to queue <idle>.
[hmeij@swallowtail ~]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1001 hmeij PEND idle swallowtail - myLittleJob Apr 18 11:28
[hmeij@swallowtail ~]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1001 hmeij RUN idle swallowtail compute-1-14 myLittleJob Apr 18 11:28
[hmeij@swallowtail ~]$ bjobs
No unfinished job found
''bjobs'' can also explain why your job is in PEND status ...
[hmeij@swallowtail gaussian]$ bjobs -p 13892
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
13892 hmeij PEND gaussian swallowtail - run101 Aug 29 16:23
Queue's per-host job slot limit reached: 1 host;
===== bhist =====
You can query the scheduler regarding the status of your job.
[hmeij@swallowtail ~]$ bhist -l 1001
Job <1001>, Job Name <myLittleJob>, User <hmeij>, Project , Command <#
!/bin/bash; # queue;#BSUB -q idle; # email me (##BSUB) or
save in $HOME (#BSUB);##BSUB -o outfile.email # standard ou
tput;#BSUB -e outfile.err # standard error; # unique job
scratch dirs;MYSANSCRATCH=/sanscratch/$LSB_JOBID;MYLOCALS>
Wed Apr 18 11:28:14: Submitted from host <swallowtail>, to Queue <idle>, CWD <$
HOME>, Error File <outfile.err>;
Wed Apr 18 11:28:20: Dispatched to <compute-1-14>;
Wed Apr 18 11:28:20: Starting (Pid 21569);
Wed Apr 18 11:28:25: Running with execution home </home/hmeij>, Execution CWD <
/home/hmeij>, Execution Pid <21569>;
Wed Apr 18 11:28:25: Done successfully. The CPU time used is 0.0 seconds;
Wed Apr 18 11:28:35: Post job process done successfully;
Summary of time in seconds spent in various states by Wed Apr 18 11:28:35
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
6 0 5 0 0 0 11
===== Job Output =====
The above job submission yields ...
[hmeij@swallowtail hmeij]# ls -l
...
-rw-r--r-- 1 hmeij its 670 Apr 18 11:28 outfile3.1001
-rw-r--r-- 1 hmeij its 0 Apr 18 11:18 outfile.err
...
and the following email
Job was submitted from host <swallowtail> by user <hmeij>.
Job was executed on host(s) <compute-1-14>, in queue <idle>, as user <hmeij>.
</home/hmeij> was used as the home directory.
</home/hmeij> was used as the working directory.
Started at Wed Apr 18 11:28:20 2007
Results reported at Wed Apr 18 11:28:25 2007
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
#!/bin/bash
# queue
#BSUB -q idle
# email me (##BSUB) or save in $HOME (#BSUB)
##BSUB -o outfile.email # standard output
#BSUB -e outfile.err # standard error
# unique job scratch dirs
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH
# run my job
./myjob one-arg two-arg
# label my job
#BSUB -J myLittleJob
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 0.05 sec.
Max Memory : 4 MB
Max Swap : 117 MB
Max Processes : 3
Max Threads : 3
The output (if any) follows:
DONE ... these dirs will be removed via post_exec
/sanscratch/1001 /localscratch/1001
PS:
Read file <outfile.err> for stderr output of this job.
===== b'ees =====
Other ''b//__name__//'' utilities for managing your jobs (a few usage examples follow below) ...
**''bkill''** JOBID ... stops your job
**''bstop''** JOBID ... suspends your job
**''bresume''** JOBID ... resumes your job
**''brequeue''** JOBID ... stops your job and requeues it
**''brun''** -m HOSTNAME JOBID ... force your job to run (administrators only)
**''bswitch''** ALTERNATE_QUEUE JOBID ... switches a pending or running job to another queue
**''bpeek''** JOBID ... peek at your job output while it is running
=> ''bpeek'' shows you the ''tail'' of your job's standard output and standard error. Alternatively, you can follow the progress of your jobs in the directory ~/.lsbatch. For each job there will be a timestamp.jobpid.err, timestamp.jobpid.out and timestamp.jobpid.shell file. Do not remove or edit these files while your job is running.
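For example, here is what a few of these look like in action against the job from above, assuming it is still pending or running (job ID 1001 is just the example used on this page):
# peek at the output of job 1001 while it runs
bpeek 1001
# suspend it, then let it continue
bstop 1001
bresume 1001
# move the (pending or running) job to another queue, for example gaussian
bswitch gaussian 1001
# give up on it altogether
bkill 1001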
\\
**[[cluster:28|Back]]**