Quanli walked into the office with a request: how can one automate the submission of tons of jobs? In his case, Gaussian jobs. “Job arrays,” I answered confidently, but that turned out to be a bit of a problem. Still working on that.
In the meantime, we took the approach of writing a script that generates the job files for you. The basic idea is to build a template, fill in the dynamic data, and then submit the resulting job files in one swoop.
I'm including the simple scripts and steps below (files are in /home/hmeij/batch). You can build the idea out based on your needs.
First you need to create the input data files. Since each file will be different (though not in my examples), you need to do this manually (or, heck, write another script for that step). In our case we have 5 input files, named in.1 through in.5.
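If the input files differ only in a few values, the "write another script" option could look something like the sketch below. It is only an illustration: the base file in.base, the BBB placeholder and the values on the command line are all made up for this example, and values containing a slash would need a different sed delimiter.

```shell
#!/bin/bash
# Sketch only: in.base, the BBB placeholder and the values passed in are
# hypothetical; they are not part of the original setup.

# make_inputs BASEFILE VALUE...
# Writes in.1, in.2, ... into the current directory, one per VALUE,
# replacing every BBB in BASEFILE with that value.
make_inputs() {
    local base=$1
    shift
    local i=1
    local v
    for v in "$@"; do
        sed "s/BBB/$v/g" "$base" > "in.$i"
        i=$((i + 1))
    done
}

# Demo in a throwaway directory so nothing real is overwritten.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'B1 BBB\n' > in.base      # stand-in for a real Gaussian input
make_inputs in.base 0.90 0.95 1.00 1.05 1.10
ls in.[1-9]*
```

The position of a value on the command line becomes the file number, so the third value ends up in in.3.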
Next we build a template for the job files we want to generate. Below is our sample. The triple uppercase fields are the values we wish to dynamically fill in using our script.
build.template
#!/bin/bash
# TEMPLATE
#BSUB -q QQQ
#BSUB -n NNN
#BSUB -J JJJ
#BSUB -o OOO
#BSUB -e EEE

MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH

cd $MYSANSCRATCH

export GAUSS_SCRDIR="$MYLOCALSCRATCH"
export g03root="/share/apps/gaussian/g03root"
. $g03root/g03/bsd/g03.profile

cp PPP/in.III ./in
g03 < ./in > ./out
cp ./out PPP/out.III."$LSB_JOBID"

Here is the simple script that will generate our job files. It has default values for the queue name and the number of processors, which you may override on the command line. The script counts the input data files and generates a job file for each. Finally, it builds a file you can use to submit those job files.
build.jobfiles
#!/usr/bin/perl

if ($#ARGV == 1) {
    $q = $ARGV[0];
    $n = $ARGV[1];
    print "Usage: make sure the -n value below matches \%nprocs\n";
    print "Using user defined values of q=$q, -n=$n\n";
} else {
    print "Usage: ./build.jobfiles queue_name nr_of_procs\n";
    print " Using default of q=elw, -n=1\n";
    $q = "elw";
    $n = 1;
}

# load template into memory
open(F, "build.template") or die "Cannot open build.template: $!\n";
while (<F>) { push @T, $_; }
close F;

# how many input files
$t = `ls in.* | wc -l`;
chomp($t);
print " Found $t input data files.\n";

# where are we
$p = `pwd`;
chomp($p);

# build job files for bsub
foreach $i (1..$t) {
    undef $ss;
    # skip the first two template lines (the shebang and "# TEMPLATE")
    foreach $j (2..$#T) {
        $s = $T[$j];
        $s =~ s/QQQ/$q/g;
        $s =~ s/NNN/$n/g;
        $s =~ s/JJJ/job.$i/g;
        $s =~ s/OOO/out.$i/g;
        $s =~ s/EEE/err.$i/g;
        $s =~ s/III/$i/g;
        $s =~ s/PPP/$p/g;
        $ss .= $s;
    }
    open(O, ">job.$i");
    print O "#!/bin/bash\n$ss";
    close O;
}
print " Build $t job files.\n";

# lazy, build a submit script
open(O, ">jobs.submit");
print O "#\!/bin/bash\nfor i in \`seq 1 $t\`\ndo\nbsub \< job.\$i\ndone\n";
close O;
`chmod u+x jobs.submit`;
print "Done. Verify a job file, then submit like so './jobs.submit'\n";

Here is how it works, step by step.
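For those who would rather stay in the shell, the same template substitution can be sketched with sed. This is not the script used on this page, just a rough equivalent: fill_template and the tiny demo template are invented for the example, and it assumes none of the values contain a "|" or "&" character (they would confuse sed).

```shell
#!/bin/bash
# Sketch only: fill_template is a hypothetical helper, not the Perl script above.

# fill_template TEMPLATE QUEUE NPROCS INDEX DIR
# Prints the template with the triple uppercase fields filled in and the
# "# TEMPLATE" marker line dropped.
fill_template() {
    local template=$1 queue=$2 nprocs=$3 index=$4 dir=$5
    sed -e '/^# TEMPLATE$/d' \
        -e "s|QQQ|$queue|g" \
        -e "s|NNN|$nprocs|g" \
        -e "s|JJJ|job.$index|g" \
        -e "s|OOO|out.$index|g" \
        -e "s|EEE|err.$index|g" \
        -e "s|III|$index|g" \
        -e "s|PPP|$dir|g" \
        "$template"
}

# Demo in a throwaway directory with a shortened template.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
cat > demo.template <<'EOF'
#!/bin/bash
# TEMPLATE
#BSUB -q QQQ
#BSUB -n NNN
#BSUB -J JJJ
cp PPP/in.III ./in
EOF

for i in 1 2 3; do
    fill_template demo.template elw 1 "$i" "$demo_dir" > "job.$i"
done
cat job.2
```

Using "|" as the sed delimiter keeps the slashes in the directory path from ending the substitution early.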
[hmeij@swallowtail batch]$ newgrp gaussian
[hmeij@swallowtail batch]$ ll
total 28
-rwxr--r-- 1 hmeij its 1165 Dec 21 11:01 build.jobfiles
-rw-r--r-- 1 hmeij its 409 Dec 20 16:36 build.template
-rw-r--r-- 1 hmeij its 460 Dec 20 15:58 in.1
-rw-r--r-- 1 hmeij its 460 Dec 20 15:59 in.2
-rw-r--r-- 1 hmeij its 460 Dec 20 15:59 in.3
-rw-r--r-- 1 hmeij its 460 Dec 20 15:59 in.4
-rw-r--r-- 1 hmeij its 460 Dec 20 15:59 in.5

[hmeij@swallowtail batch]$ ./build.jobfiles
Usage: ./build.jobfiles queue_name nr_of_procs
 Using default of q=elw, -n=1
 Found 5 input data files.
 Build 5 job files.
Done. Verify a job file, then submit like so './jobs.submit'

[hmeij@swallowtail batch]$ cat job.3
#!/bin/bash
#BSUB -q elw
#BSUB -n 1
#BSUB -J job.3
#BSUB -o out.3
#BSUB -e err.3

MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH

cd $MYSANSCRATCH

export GAUSS_SCRDIR="$MYLOCALSCRATCH"
export g03root="/share/apps/gaussian/g03root"
. $g03root/g03/bsd/g03.profile

cp /home/hmeij/batch/in.3 ./in
g03 < ./in > ./out
cp ./out /home/hmeij/batch/out.3."$LSB_JOBID"

[hmeij@swallowtail batch]$ ./jobs.submit
Job <34529> is submitted to queue <elw>.
Job <34530> is submitted to queue <elw>.
Job <34531> is submitted to queue <elw>.
Job <34532> is submitted to queue <elw>.
Job <34533> is submitted to queue <elw>.

[hmeij@swallowtail batch]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
34529 hmeij PEND elw swallowtail - job.1 Dec 21 11:01
34530 hmeij PEND elw swallowtail - job.2 Dec 21 11:01
34531 hmeij PEND elw swallowtail - job.3 Dec 21 11:01
34532 hmeij PEND elw swallowtail - job.4 Dec 21 11:01
34533 hmeij PEND elw swallowtail - job.5 Dec 21 11:01

[hmeij@swallowtail batch]$ ll
total 132
-rwxr--r-- 1 hmeij its 1165 Dec 21 11:01 build.jobfiles
-rw-r--r-- 1 hmeij its 409 Dec 20 16:36 build.template
-rw-r--r-- 1 hmeij gaussian 0 Dec 21 11:02 err.1
-rw-r--r-- 1 hmeij gaussian 0 Dec 21 11:02 err.2
-rw-r--r-- 1 hmeij gaussian 0 Dec 21 11:02 err.3
-rw-r--r-- 1 hmeij gaussian 0 Dec 21 11:02 err.4
-rw-r--r-- 1 hmeij gaussian 0 Dec 21 11:02 err.5
-rw-r--r-- 1 hmeij its 460 Dec 20 15:58 in.1
-rw-r--r-- 1 hmeij its 460 Dec 20 15:59 in.2
-rw-r--r-- 1 hmeij its 460 Dec 20 15:59 in.3
-rw-r--r-- 1 hmeij its 460 Dec 20 15:59 in.4
-rw-r--r-- 1 hmeij its 460 Dec 20 15:59 in.5
-rw-r--r-- 1 hmeij gaussian 426 Dec 21 11:01 job.1
-rw-r--r-- 1 hmeij gaussian 426 Dec 21 11:01 job.2
-rw-r--r-- 1 hmeij gaussian 426 Dec 21 11:01 job.3
-rw-r--r-- 1 hmeij gaussian 426 Dec 21 11:01 job.4
-rw-r--r-- 1 hmeij gaussian 426 Dec 21 11:01 job.5
-rwxr--r-- 1 hmeij gaussian 53 Dec 21 11:01 jobs.submit
-rw-r--r-- 1 hmeij gaussian 1304 Dec 21 11:02 out.1
-rw-r--r-- 1 hmeij gaussian 11389 Dec 21 11:02 out.1.34529
-rw-r--r-- 1 hmeij gaussian 1304 Dec 21 11:02 out.2
-rw-r--r-- 1 hmeij gaussian 11390 Dec 21 11:02 out.2.34530
-rw-r--r-- 1 hmeij gaussian 1304 Dec 21 11:02 out.3
-rw-r--r-- 1 hmeij gaussian 11478 Dec 21 11:02 out.3.34531
-rw-r--r-- 1 hmeij gaussian 1304 Dec 21 11:02 out.4
-rw-r--r-- 1 hmeij gaussian 11477 Dec 21 11:02 out.4.34532
-rw-r--r-- 1 hmeij gaussian 1304 Dec 21 11:02 out.5
-rw-r--r-- 1 hmeij gaussian 11449 Dec 21 11:02 out.5.34533
Well, job arrays turned out to be easier than expected. The submission process is slightly different, though: we will not be using a job file, but will submit the job on the command line with all the necessary arguments.
When using job arrays, you submit a single job which contains many tasks. Each task is a copy of the original job submission, but the input and output files vary per task. Also, in this case we will not be using a job file with #BSUB commands anymore.
Here is one way it could work, using the Gaussian example mentioned above. First, we use the same input data files in.1 through in.5. In addition, we create array files. The only content in these array files is the iteration value, so for example array.1 contains 1, array.2 contains 2, etc. This content is passed from the array file as standard input to the program you specify on the command line.
That program file, named my_run.sh in this example, then reads that information and uses it to set up the current task. We then use that info to build the Gaussian invocation. Seems convoluted? Sure, but think about the case in which you have thousands of jobs to process. This can now be done with a single job submission.
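Creating the array files by hand gets old quickly, so here is a minimal sketch of a loop that writes them. The helper name make_array_files is made up for this example; the file names and contents follow the array.N convention described above.

```shell
#!/bin/bash
# Sketch only: make_array_files is a hypothetical helper.

# make_array_files COUNT
# Writes array.1 .. array.COUNT into the current directory; each file holds
# just its own number followed by a newline.
make_array_files() {
    local count=$1
    local i
    for i in $(seq 1 "$count"); do
        echo "$i" > "array.$i"
    done
}

# Demo in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
make_array_files 5
ls array.*

# This is what the program on the receiving end sees on standard input.
read -r i < array.3
echo "array.3 holds: $i"
```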
Not clear? Here is how it works. First the contents of our files:
in.1
%mem=1GB
%nproc=1
# hf/3-21g geom=connectivity

Title Card Required

0 1
 N
 H 1 B1
 H 1 B2 2 A1
 H 1 B3 3 A2 2 D1

 B1 1.00000000
 B2 1.00000000
 B3 1.00000000
 A1 109.47120255
 A2 109.47125080
 D1 -119.99998525

 1 2 1.0 3 1.0 4 1.0
 2
 3
 4

array.1
1
my_run.sh
#!/bin/bash

read i
echo i:$i
echo '---------------------'

MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH

export GAUSS_SCRDIR="$MYLOCALSCRATCH"
export g03root="/share/apps/gaussian/g03root"
. $g03root/g03/bsd/g03.profile

cp ./in.$i $MYSANSCRATCH/in
cd $MYSANSCRATCH

# note that we capture gaussian output as standard out
g03 < ./in

Here is the submission, step by step. Note the job ID that gets assigned (34554): it is the same for all tasks within this job. That makes it easy to manage hundreds or thousands of tasks, for example if you need to stop them all with bkill.
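As a possible variation on the first lines of my_run.sh: LSF also puts the array index of each task in the environment variable LSB_JOBINDEX, so the task number does not have to come from an array file. The sketch below reads standard input first and falls back to that variable. The helper task_index is invented for this example and has not been tried on this cluster.

```shell
#!/bin/bash
# Sketch only: task_index is a hypothetical helper, not part of my_run.sh.

# Prints the task number: the first line of standard input if there is one,
# otherwise the value of LSB_JOBINDEX (empty if neither is available).
task_index() {
    local i=""
    read -r i || true
    if [ -z "$i" ]; then
        i=${LSB_JOBINDEX:-}
    fi
    echo "$i"
}

# With an array file on standard input, as in the -i "array.%I" approach.
echo 3 | task_index

# Without usable input; LSB_JOBINDEX is set by hand here only to imitate
# what LSF would provide inside a running task.
LSB_JOBINDEX=4 task_index < /dev/null
```

Inside my_run.sh this would replace the plain "read i" with something like i=$(task_index).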
[hmeij@swallowtail arrays]$ newgrp gaussian
[hmeij@swallowtail arrays]$ ll
total 44
-rw-r--r-- 1 hmeij gaussian 2 Dec 21 11:20 array.1
-rw-r--r-- 1 hmeij gaussian 2 Dec 21 11:20 array.2
-rw-r--r-- 1 hmeij gaussian 2 Dec 21 11:20 array.3
-rw-r--r-- 1 hmeij gaussian 2 Dec 21 11:20 array.4
-rw-r--r-- 1 hmeij gaussian 2 Dec 21 11:20 array.5
-rw-r--r-- 1 hmeij gaussian 460 Dec 21 11:08 in.1
-rw-r--r-- 1 hmeij gaussian 460 Dec 21 11:08 in.2
-rw-r--r-- 1 hmeij gaussian 460 Dec 21 11:08 in.3
-rw-r--r-- 1 hmeij gaussian 460 Dec 21 11:08 in.4
-rw-r--r-- 1 hmeij gaussian 460 Dec 21 11:08 in.5
-rwxr--r-- 1 hmeij gaussian 346 Dec 21 11:35 my_run.sh

[hmeij@swallowtail arrays]$ bsub -q elw -n 1 -J "job[1-5]" -i "array.%I" -o "out.%J.%I" ./my_run.sh
Job <34554> is submitted to queue <elw>.

[hmeij@swallowtail arrays]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
34554 hmeij PEND elw swallowtail - job[1] Dec 21 14:01
34554 hmeij PEND elw swallowtail - job[2] Dec 21 14:01
34554 hmeij PEND elw swallowtail - job[3] Dec 21 14:01
34554 hmeij PEND elw swallowtail - job[4] Dec 21 14:01
34554 hmeij PEND elw swallowtail - job[5] Dec 21 14:01

[hmeij@swallowtail arrays]$ ll
total 124
-rw-r--r-- 1 hmeij gaussian 2 Dec 21 11:20 array.1
-rw-r--r-- 1 hmeij gaussian 2 Dec 21 11:20 array.2
-rw-r--r-- 1 hmeij gaussian 2 Dec 21 11:20 array.3
-rw-r--r-- 1 hmeij gaussian 2 Dec 21 11:20 array.4
-rw-r--r-- 1 hmeij gaussian 2 Dec 21 11:20 array.5
-rw-r--r-- 1 hmeij gaussian 460 Dec 21 11:08 in.1
-rw-r--r-- 1 hmeij gaussian 460 Dec 21 11:08 in.2
-rw-r--r-- 1 hmeij gaussian 460 Dec 21 11:08 in.3
-rw-r--r-- 1 hmeij gaussian 460 Dec 21 11:08 in.4
-rw-r--r-- 1 hmeij gaussian 460 Dec 21 11:08 in.5
-rwxr--r-- 1 hmeij gaussian 346 Dec 21 11:35 my_run.sh
-rw-r--r-- 1 hmeij gaussian 12454 Dec 21 14:02 out.34554.1
-rw-r--r-- 1 hmeij gaussian 12455 Dec 21 14:02 out.34554.2
-rw-r--r-- 1 hmeij gaussian 12378 Dec 21 14:02 out.34554.3
-rw-r--r-- 1 hmeij gaussian 12339 Dec 21 14:02 out.34554.4
-rw-r--r-- 1 hmeij gaussian 12340 Dec 21 14:02 out.34554.5
Of course, you could pass more information in your array files. For example, you could pass a tilde-delimited string holding the many variables you need to set up your individual tasks. Your program file would then read this long string and parse it apart.
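A minimal sketch of that parsing step in bash follows. The field layout (index, memory, processors) and the names parse_task, task_index, task_mem and task_nproc are assumptions made up for this example; use whatever fields your tasks need.

```shell
#!/bin/bash
# Sketch only: the field layout index~memory~processors is hypothetical.

# parse_task
# Reads one tilde-delimited line from standard input and stores the fields
# in task_index, task_mem and task_nproc.
parse_task() {
    IFS='~' read -r task_index task_mem task_nproc
}

# Demo in a throwaway directory with one made-up array file.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
echo '3~2GB~4' > array.3

parse_task < array.3
echo "index=$task_index mem=$task_mem nproc=$task_nproc"
```

Setting IFS only on the read command keeps the tilde splitting from leaking into the rest of the script.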