cluster:62

cluster:62 [2007/12/21 14:08] (current)
Line 1: Line 1:
 +\\
 +**[[cluster:​0|Back]]**
  
 +
 +
 +====== Automated Submissions ======
 +
 +Quanli walked into the office with a request: how can one automate the submission of tons of jobs?  In his case, Gaussian jobs.  "Job Arrays", I answered confidently, but that turned out to be a bit of a problem.  Still working on that.
 +
 +However, we took the approach of writing a script that generates the job files for you.  The basic idea is to build a template, use that template to fill in the dynamic data, and then submit the resulting job files in one swoop.
 +
 +I'm including the simple scripts and steps below (files are in /​home/​hmeij/​batch). ​ You can build the idea out based on your needs.
 +
 +
 +
 +====== Script Approach ======
 +
 +First you need to create the input data files. ​ Since each file will be different (but not in my examples) you need to do this manually (or, heck, write another script for that step). ​ So in our case we have 5 input files with the naming convention of ''​in.1 - in.5''​.
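If your inputs differ only in a few values, the "write another script" step could look roughly like the sketch below.  This is a minimal bash sketch and not one of the files in the batch directory: the function name ''make_inputs'', the base file ''base.in'' and the ''TTT'' placeholder are all made up for illustration.  The base file is deliberately not named ''in.something'', so that it is not counted as an input data file.

```shell
#!/bin/bash
# Hypothetical helper: write in.1 .. in.N from one base input file.
# Assumption: the base file contains the placeholder TTT (for example on
# its title line); each copy gets TTT replaced by "run <index>".
make_inputs() {
    local base=$1 count=$2 i
    [ -r "$base" ] || { echo "cannot read $base" >&2; return 1; }
    for i in $(seq 1 "$count"); do
        sed "s/TTT/run $i/g" "$base" > "in.$i"
    done
}

# Example call: make_inputs base.in 5
```

Anything that really varies per job (geometry, method, memory) would need its own placeholder and substitution.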
 +
 +Next we build a template for the job files we want to generate.  Below is our sample.  The triple-uppercase fields (''QQQ'', ''NNN'', ''JJJ'', ''OOO'', ''EEE'', ''III'' and ''PPP'') are placeholders that our script fills in dynamically.
 +
 +  * file ''​build.template''​
 +
 +<​code>​
 +
 +#!/bin/bash
 +# TEMPLATE
 +
 +#BSUB -q QQQ
 +#BSUB -n NNN
 +#BSUB -J JJJ
 +#BSUB -o OOO
 +#BSUB -e EEE
 + 
 +MYSANSCRATCH=/​sanscratch/​$LSB_JOBID
 +MYLOCALSCRATCH=/​localscratch/​$LSB_JOBID
 +export MYSANSCRATCH MYLOCALSCRATCH
 +cd $MYSANSCRATCH
 +
 +export GAUSS_SCRDIR="​$MYLOCALSCRATCH"​
 +export g03root="/​share/​apps/​gaussian/​g03root"​
 +. $g03root/​g03/​bsd/​g03.profile
 +
 +cp PPP/in.III ./in
 +g03 < ./in > ./out
 +cp ./out PPP/​out.III."​$LSB_JOBID"​
 +
 +</​code>​
 +
 +Here is the simple script that generates our job files.  It has default values for the queue name and the number of processors, which you may override on the command line.  The script counts the input data files and generates a job file for each.  Finally, it builds a script you can use to submit those job files.
 +
 +  * file ''​build.jobfiles''​
 +
 +<​code>​
 +#!/usr/bin/perl
 +use strict;
 +use warnings;
 +
 +# queue name and processor count: from the command line, else defaults
 +my ($q, $n);
 +if (@ARGV == 2) {
 +        ($q, $n) = @ARGV;
 +        print "Usage: make sure the -n value below matches \%nprocs\n";
 +        print "Using user defined values of q=$q, -n=$n\n";
 +} else {
 +        print "Usage: ./build.jobfiles queue_name nr_of_procs\n";
 +        print "       Using default of q=elw, -n=1\n";
 +        $q = "elw";
 +        $n = 1;
 +}
 +
 +# load template into memory
 +open(my $tfh, "<", "build.template") or die "Cannot open build.template: $!\n";
 +my @T = <$tfh>;
 +close $tfh;
 +
 +# how many input files
 +my $t = `ls in.* | wc -l`;
 +chomp($t);
 +$t += 0;    # force a plain number (some wc versions pad with spaces)
 +print "       Found $t input data files.\n";
 +
 +# where are we
 +my $p = `pwd`;
 +chomp($p);
 +
 +# build job files for bsub
 +foreach my $i (1..$t) {
 +        my $ss = "";
 +        # start at index 2: skip the template's shebang and "# TEMPLATE" lines
 +        foreach my $j (2..$#T) {
 +                my $s = $T[$j];
 +                $s =~ s/QQQ/$q/g;
 +                $s =~ s/NNN/$n/g;
 +                $s =~ s/JJJ/job.$i/g;
 +                $s =~ s/OOO/out.$i/g;
 +                $s =~ s/EEE/err.$i/g;
 +                $s =~ s/III/$i/g;
 +                $s =~ s/PPP/$p/g;
 +                $ss .= $s;
 +        }
 +        open(my $ofh, ">", "job.$i") or die "Cannot write job.$i: $!\n";
 +        print $ofh "#!/bin/bash\n$ss";
 +        close $ofh;
 +}
 +print "       Build $t job files.\n";
 +
 +# lazy, build a submit script (named jobs.submit, as used in the session below)
 +open(my $sfh, ">", "jobs.submit") or die "Cannot write jobs.submit: $!\n";
 +print $sfh "#!/bin/bash\nfor i in \`seq 1 $t\`\ndo\nbsub < job.\$i\ndone\n";
 +close $sfh;
 +system("chmod", "u+x", "jobs.submit");
 +
 +print "Done. Verify a job file, then submit like so './jobs.submit'\n";
 +</​code>​
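The script asks you to verify a job file before submitting.  One way to automate part of that check is sketched below: it looks for placeholders from ''build.template'' that were left unfilled.  The function name ''check_jobfiles'' is hypothetical, and the check only catches leftover placeholders, not wrong values.

```shell
#!/bin/bash
# Hypothetical check: report job files that still contain an unfilled
# placeholder (QQQ, NNN, JJJ, OOO, EEE, III or PPP) from build.template.
# Returns 0 when all files are clean, 1 otherwise.
check_jobfiles() {
    local f bad=0
    for f in "$@"; do
        if grep -Eq 'QQQ|NNN|JJJ|OOO|EEE|III|PPP' "$f"; then
            echo "unfilled placeholder in $f" >&2
            bad=1
        fi
    done
    return "$bad"
}

# Example call: check_jobfiles job.* && ./jobs.submit
```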
 +
 +Here is how it works. Step by step.
 +
 +<​code>​
 +
 +[hmeij@swallowtail batch]$ newgrp gaussian
 +
 +[hmeij@swallowtail batch]$ ll
 +total 28
 +-rwxr--r-- ​ 1 hmeij its 1165 Dec 21 11:01 build.jobfiles
 +-rw-r--r-- ​ 1 hmeij its  409 Dec 20 16:36 build.template
 +-rw-r--r-- ​ 1 hmeij its  460 Dec 20 15:58 in.1
 +-rw-r--r-- ​ 1 hmeij its  460 Dec 20 15:59 in.2
 +-rw-r--r-- ​ 1 hmeij its  460 Dec 20 15:59 in.3
 +-rw-r--r-- ​ 1 hmeij its  460 Dec 20 15:59 in.4
 +-rw-r--r-- ​ 1 hmeij its  460 Dec 20 15:59 in.5
 +
 +[hmeij@swallowtail batch]$ ./​build.jobfiles ​  
 +Usage: ./​build.jobfiles queue_name nr_of_procs
 +       Using default of q=elw, -n=1
 +       Found 5 input data files.
 +       Build 5 job files.
 +Done. Verify a job file, then submit like so '​./​jobs.submit'​
 +
 +[hmeij@swallowtail batch]$ cat job.3     
 +#!/bin/bash
 +
 +#BSUB -q elw
 +#BSUB -n 1
 +#BSUB -J job.3
 +#BSUB -o out.3
 +#BSUB -e err.3
 + 
 +MYSANSCRATCH=/​sanscratch/​$LSB_JOBID
 +MYLOCALSCRATCH=/​localscratch/​$LSB_JOBID
 +export MYSANSCRATCH MYLOCALSCRATCH
 +cd $MYSANSCRATCH
 +
 +export GAUSS_SCRDIR="​$MYLOCALSCRATCH"​
 +export g03root="/​share/​apps/​gaussian/​g03root"​
 +. $g03root/​g03/​bsd/​g03.profile
 +
 +cp /​home/​hmeij/​batch/​in.3 ./in
 +g03 < ./in > ./out
 +cp ./out /​home/​hmeij/​batch/​out.3."​$LSB_JOBID"​
 +
 +[hmeij@swallowtail batch]$ ./​jobs.submit ​
 +Job <​34529>​ is submitted to queue <​elw>​.
 +Job <​34530>​ is submitted to queue <​elw>​.
 +Job <​34531>​ is submitted to queue <​elw>​.
 +Job <​34532>​ is submitted to queue <​elw>​.
 +Job <​34533>​ is submitted to queue <​elw>​.
 +
 +[hmeij@swallowtail batch]$ bjobs
 +JOBID   ​USER ​   STAT  QUEUE      FROM_HOST ​  ​EXEC_HOST ​  ​JOB_NAME ​  ​SUBMIT_TIME
 +34529   ​hmeij ​  ​PEND ​ elw        swallowtail ​   -        job.1      Dec 21 11:01
 +34530   ​hmeij ​  ​PEND ​ elw        swallowtail ​   -        job.2      Dec 21 11:01
 +34531   ​hmeij ​  ​PEND ​ elw        swallowtail ​   -        job.3      Dec 21 11:01
 +34532   ​hmeij ​  ​PEND ​ elw        swallowtail ​   -        job.4      Dec 21 11:01
 +34533   ​hmeij ​  ​PEND ​ elw        swallowtail ​   -        job.5      Dec 21 11:01
 +
 +[hmeij@swallowtail batch]$ ll
 +total 132
 +-rwxr--r-- ​ 1 hmeij its       1165 Dec 21 11:01 build.jobfiles
 +-rw-r--r-- ​ 1 hmeij its        409 Dec 20 16:36 build.template
 +-rw-r--r-- ​ 1 hmeij gaussian ​    0 Dec 21 11:02 err.1
 +-rw-r--r-- ​ 1 hmeij gaussian ​    0 Dec 21 11:02 err.2
 +-rw-r--r-- ​ 1 hmeij gaussian ​    0 Dec 21 11:02 err.3
 +-rw-r--r-- ​ 1 hmeij gaussian ​    0 Dec 21 11:02 err.4
 +-rw-r--r-- ​ 1 hmeij gaussian ​    0 Dec 21 11:02 err.5
 +-rw-r--r-- ​ 1 hmeij its        460 Dec 20 15:58 in.1
 +-rw-r--r-- ​ 1 hmeij its        460 Dec 20 15:59 in.2
 +-rw-r--r-- ​ 1 hmeij its        460 Dec 20 15:59 in.3
 +-rw-r--r-- ​ 1 hmeij its        460 Dec 20 15:59 in.4
 +-rw-r--r-- ​ 1 hmeij its        460 Dec 20 15:59 in.5
 +-rw-r--r-- ​ 1 hmeij gaussian ​  426 Dec 21 11:01 job.1
 +-rw-r--r-- ​ 1 hmeij gaussian ​  426 Dec 21 11:01 job.2
 +-rw-r--r-- ​ 1 hmeij gaussian ​  426 Dec 21 11:01 job.3
 +-rw-r--r-- ​ 1 hmeij gaussian ​  426 Dec 21 11:01 job.4
 +-rw-r--r-- ​ 1 hmeij gaussian ​  426 Dec 21 11:01 job.5
 +-rwxr--r-- ​ 1 hmeij gaussian ​   53 Dec 21 11:01 jobs.submit
 +-rw-r--r-- ​ 1 hmeij gaussian ​ 1304 Dec 21 11:02 out.1
 +-rw-r--r-- ​ 1 hmeij gaussian 11389 Dec 21 11:02 out.1.34529
 +-rw-r--r-- ​ 1 hmeij gaussian ​ 1304 Dec 21 11:02 out.2
 +-rw-r--r-- ​ 1 hmeij gaussian 11390 Dec 21 11:02 out.2.34530
 +-rw-r--r-- ​ 1 hmeij gaussian ​ 1304 Dec 21 11:02 out.3
 +-rw-r--r-- ​ 1 hmeij gaussian 11478 Dec 21 11:02 out.3.34531
 +-rw-r--r-- ​ 1 hmeij gaussian ​ 1304 Dec 21 11:02 out.4
 +-rw-r--r-- ​ 1 hmeij gaussian 11477 Dec 21 11:02 out.4.34532
 +-rw-r--r-- ​ 1 hmeij gaussian ​ 1304 Dec 21 11:02 out.5
 +-rw-r--r-- ​ 1 hmeij gaussian 11449 Dec 21 11:02 out.5.34533
 +
 +</​code>​
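With many jobs it helps to see which ones have not copied their results back yet.  The sketch below relies only on the naming convention used above (''in.N'' as input, ''out.N.<jobid>'' as the copied Gaussian output); the function name ''missing_outputs'' is made up for this example, and it does not inspect the content of the output files.

```shell
#!/bin/bash
# Hypothetical check: for every in.N in the current directory, print N
# when no matching out.N.<jobid> file exists yet.
missing_outputs() {
    local f i
    for f in in.*; do
        [ -e "$f" ] || continue    # glob did not match: no input files
        i=${f#in.}
        # compgen -G succeeds if the glob matches at least one file
        compgen -G "out.$i.*" > /dev/null || echo "$i"
    done
}

# Example call (from the batch directory): missing_outputs
```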
 +
 +
 +
 +
 +====== Job Arrays ======
 +
 +Well, this turned out to be easier than expected.  The submission process is slightly different, though: instead of using a job file, we submit the job on the command line with all the necessary arguments.
 +
 +First you may wish to read 
 +
 +  * **[[cluster:​50|Simple Job Arrays]]**
 +
 +or
 +
 +  * **[[http://​lsfdocs/​lsf6.2_admin/​G_jobarrays.html#​27813|Manual Pages]]**
 +
 +When using job arrays, you submit a single job which contains many tasks. ​ Each task is a copy of the original job submission but the input and output structures vary.  Also in this case, we will **not** be using a job file with **''#​BSUB''​** commands anymore.
 +
 +Here is one way it could work using the Gaussian example mentioned above.  First we use the same input data files ''in.1 - in.5''.  In addition we create array files.  The only content of each array file is its iteration value, so for example ''array.1'' contains ''1'', ''array.2'' contains ''2'', etc.  This content is passed from the array file (via the ''-i'' option of ''bsub'') as standard input to the program you specify on the command line.
 +
 +That program file, named ''​my_run.sh''​ in this example, then reads that information and uses it to set up the current job.  We then use that info to build the Gaussian invocation. ​ Seems convoluted? ​ Sure, but think about the case in which you have thousands of jobs to process. ​ This can now be done with a single job submission.
 +
 +Not clear? Here is how it works. ​ First the contents of our files:
 +
 +  * file ''​in.1''​
 +
 +<​code>​
 +%mem=1GB
 +%nproc=1
 +# hf/3-21g geom=connectivity
 +
 +Title Card Required
 +
 +0 1
 + N
 + ​H ​                 1              B1
 + ​H ​                 1              B2    2              A1
 + ​H ​                 1              B3    3              A2    2              D1
 +
 +   ​B1 ​            ​1.00000000
 +   ​B2 ​            ​1.00000000
 +   ​B3 ​            ​1.00000000
 +   ​A1 ​          ​109.47120255
 +   ​A2 ​          ​109.47125080
 +   ​D1 ​         -119.99998525
 +
 + 1 2 1.0 3 1.0 4 1.0
 + 2
 + 3
 + 4
 +</​code>​
 +
 +  * file ''​array.1''​
 + 
 +<​code>​
 +1
 +</​code>​
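Creating the array files by hand gets tedious quickly.  A small loop can write them; the sketch below is one way to do it, with the function name ''make_array_files'' chosen for this example only.

```shell
#!/bin/bash
# Write array.1 .. array.N, each holding only its own index followed by
# a newline (2 bytes for a single-digit index).
make_array_files() {
    local count=$1 i
    for i in $(seq 1 "$count"); do
        echo "$i" > "array.$i"
    done
}

# Example call: make_array_files 5
```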
 +
 +  * program file ''​my_run.sh''​
 +
 +<​code>​
 +#!/bin/bash
 +
 +read i
 +echo i:$i
 +echo '​---------------------'​
 +
 +MYSANSCRATCH=/​sanscratch/​$LSB_JOBID
 +MYLOCALSCRATCH=/​localscratch/​$LSB_JOBID
 +export MYSANSCRATCH MYLOCALSCRATCH
 +
 +export GAUSS_SCRDIR="​$MYLOCALSCRATCH"​
 +export g03root="/​share/​apps/​gaussian/​g03root"​
 +. $g03root/​g03/​bsd/​g03.profile
 +
 +cp ./in.$i $MYSANSCRATCH/​in
 +cd $MYSANSCRATCH
 +# note that we capture gaussian output as standard out
 +g03 < ./in 
 +
 +</​code>​
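As an alternative to reading the index from an array file, LSF exports the environment variable ''LSB_JOBINDEX'' to each task of a job array.  The sketch below is not what ''my_run.sh'' does; it shows a hypothetical helper, ''task_index'', that prefers that variable and falls back to standard input.  Treating an empty or ''0'' value as "not an array task" is an assumption you should verify on your LSF version.

```shell
#!/bin/bash
# Hypothetical helper: print the index of the current array task.
# Prefer LSB_JOBINDEX when it is set to a non-zero value; otherwise read
# the index from standard input, the way my_run.sh does.
task_index() {
    local i
    if [ -n "${LSB_JOBINDEX:-}" ] && [ "$LSB_JOBINDEX" != "0" ]; then
        echo "$LSB_JOBINDEX"
    else
        read -r i
        echo "$i"
    fi
}

# Possible use inside a program file:  i=$(task_index)
```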
 +
 +Here is the submission, step by step. **NOTE THE JOBID THAT GETS ASSIGNED** (34554): it is the same for all tasks within this job.  That makes it easy to manage hundreds or thousands of tasks, for example when you need to stop them all with a single ''bkill''.
 +
 +<​code>​
 +
 +[hmeij@swallowtail arrays]$ newgrp gaussian
 +
 +[hmeij@swallowtail arrays]$ ll
 +total 44
 +-rw-r--r-- ​ 1 hmeij gaussian ​  2 Dec 21 11:20 array.1
 +-rw-r--r-- ​ 1 hmeij gaussian ​  2 Dec 21 11:20 array.2
 +-rw-r--r-- ​ 1 hmeij gaussian ​  2 Dec 21 11:20 array.3
 +-rw-r--r-- ​ 1 hmeij gaussian ​  2 Dec 21 11:20 array.4
 +-rw-r--r-- ​ 1 hmeij gaussian ​  2 Dec 21 11:20 array.5
 +-rw-r--r-- ​ 1 hmeij gaussian 460 Dec 21 11:08 in.1
 +-rw-r--r-- ​ 1 hmeij gaussian 460 Dec 21 11:08 in.2
 +-rw-r--r-- ​ 1 hmeij gaussian 460 Dec 21 11:08 in.3
 +-rw-r--r-- ​ 1 hmeij gaussian 460 Dec 21 11:08 in.4
 +-rw-r--r-- ​ 1 hmeij gaussian 460 Dec 21 11:08 in.5
 +-rwxr--r-- ​ 1 hmeij gaussian 346 Dec 21 11:35 my_run.sh
 +
 +[hmeij@swallowtail arrays]$ bsub -q elw -n 1 -J "​job[1-5]"​ -i "​array.%I"​ -o "​out.%J.%I"​ ./​my_run.sh ​
 +Job <​34554>​ is submitted to queue <​elw>​.
 +
 +[hmeij@swallowtail arrays]$ bjobs
 +JOBID   ​USER ​   STAT  QUEUE      FROM_HOST ​  ​EXEC_HOST ​  ​JOB_NAME ​  ​SUBMIT_TIME
 +34554   ​hmeij ​  ​PEND ​ elw        swallowtail ​   -        job[1] ​    Dec 21 14:01
 +34554   ​hmeij ​  ​PEND ​ elw        swallowtail ​   -        job[2] ​    Dec 21 14:01
 +34554   ​hmeij ​  ​PEND ​ elw        swallowtail ​   -        job[3] ​    Dec 21 14:01
 +34554   ​hmeij ​  ​PEND ​ elw        swallowtail ​   -        job[4] ​    Dec 21 14:01
 +34554   ​hmeij ​  ​PEND ​ elw        swallowtail ​   -        job[5] ​    Dec 21 14:01
 +
 +[hmeij@swallowtail arrays]$ ll
 +total 124
 +-rw-r--r-- ​ 1 hmeij gaussian ​    2 Dec 21 11:20 array.1
 +-rw-r--r-- ​ 1 hmeij gaussian ​    2 Dec 21 11:20 array.2
 +-rw-r--r-- ​ 1 hmeij gaussian ​    2 Dec 21 11:20 array.3
 +-rw-r--r-- ​ 1 hmeij gaussian ​    2 Dec 21 11:20 array.4
 +-rw-r--r-- ​ 1 hmeij gaussian ​    2 Dec 21 11:20 array.5
 +-rw-r--r-- ​ 1 hmeij gaussian ​  460 Dec 21 11:08 in.1
 +-rw-r--r-- ​ 1 hmeij gaussian ​  460 Dec 21 11:08 in.2
 +-rw-r--r-- ​ 1 hmeij gaussian ​  460 Dec 21 11:08 in.3
 +-rw-r--r-- ​ 1 hmeij gaussian ​  460 Dec 21 11:08 in.4
 +-rw-r--r-- ​ 1 hmeij gaussian ​  460 Dec 21 11:08 in.5
 +-rwxr--r-- ​ 1 hmeij gaussian ​  346 Dec 21 11:35 my_run.sh
 +-rw-r--r-- ​ 1 hmeij gaussian 12454 Dec 21 14:02 out.34554.1
 +-rw-r--r-- ​ 1 hmeij gaussian 12455 Dec 21 14:02 out.34554.2
 +-rw-r--r-- ​ 1 hmeij gaussian 12378 Dec 21 14:02 out.34554.3
 +-rw-r--r-- ​ 1 hmeij gaussian 12339 Dec 21 14:02 out.34554.4
 +-rw-r--r-- ​ 1 hmeij gaussian 12340 Dec 21 14:02 out.34554.5
 +
 +</​code>​
 +
 +Of course, you could pass more information in your array job.  For example, you could pass a tilde-delimited string of the many variables you need to set up your individual tasks.  Your program file would then read this long string and parse it apart.
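Parsing such a tilde-delimited line in bash could look like the sketch below.  The three fields (index, memory, method) and the function name ''parse_task'' are illustrative assumptions, not something the array files above contain.

```shell
#!/bin/bash
# Hypothetical example: an array file holds one line such as
#   3~1GB~hf/3-21g
# meaning index, memory and method. Split it on "~" into variables.
parse_task() {
    local line=$1 idx mem method
    IFS='~' read -r idx mem method <<< "$line"
    echo "index=$idx mem=$mem method=$method"
}

# Inside a program file one might write:
#   read -r line; parse_task "$line"
```

Setting ''IFS'' only for the ''read'' command keeps the change local, so the rest of the program file still splits words as usual.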
 +
 +\\
 +**[[cluster:​0|Back]]**
cluster/62.txt · Last modified: 2007/12/21 14:08 (external edit)