Automated Submissions

Quanli walked into the office with a request: how can one automate the submission of tons of jobs, in his case Gaussian jobs? “Job Arrays,” I answered confidently, but that turned out to be a bit of a problem. Still working on that.

In the meantime, we wrote a script that generates the job files for you. The basic idea is to build a template, fill it in with the dynamic data for each job, and then submit the resulting job files in one swoop.

I'm including the simple scripts and steps below (files are in /home/hmeij/batch). You can build the idea out based on your needs.

Script Approach

First you need to create the input data files. Since each file will be different (though not in my examples), you need to do this manually (or, heck, write another script for that step). In our case we have 5 input files, named in.1 through in.5.
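If you do want to script the input-file step, here is a minimal sketch. It assumes a hypothetical base file (called base.in here) that contains the placeholder word TITLE; neither the file nor the placeholder is part of the setup on this page, and make_inputs is just an illustrative helper name.

```shell
#!/bin/bash
# Sketch: generate numbered input files in.1 .. in.N from one base file.
# Assumption: the base file and its TITLE placeholder are hypothetical,
# they are not part of the original setup.

make_inputs() {
    local base="$1"
    local count="$2"
    local i
    for i in $(seq 1 "$count"); do
        # replace the placeholder with a per-file title
        sed "s/TITLE/run $i/" "$base" > "in.$i"
    done
}

# usage: make_inputs base.in 5
```

Real input files would differ in more than the title line, so treat the sed expression as the part to adapt.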

Next we build a template for the job files we want to generate. Below is our sample. The triple-uppercase fields (QQQ, NNN, and so on) are the values our script fills in dynamically.

  • file build.template
#!/bin/bash
# TEMPLATE

#BSUB -q QQQ
#BSUB -n NNN
#BSUB -J JJJ
#BSUB -o OOO
#BSUB -e EEE
 
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYSANSCRATCH

export GAUSS_SCRDIR="$MYLOCALSCRATCH"
export g03root="/share/apps/gaussian/g03root"
. $g03root/g03/bsd/g03.profile

cp PPP/in.III ./in
g03 < ./in > ./out
cp ./out PPP/out.III."$LSB_JOBID"

Here is the simple script that generates our job files. It has default values for the queue name and the number of processors, which you may override on the command line. The script counts the input data files and generates a job file for each. Finally, it builds a script you can use to submit those job files.

  • file build.jobfiles
#!/usr/bin/perl

if ($#ARGV == 1) {
        $q = $ARGV[0];
        $n = $ARGV[1];
        print "Usage: make sure the -n value below matches \%nprocs\n";
        print "Using user defined values of q=$q, -n=$n\n";
} else {
        print "Usage: ./build.jobfiles queue_name nr_of_procs\n";
        print "       Using default of q=elw, -n=1\n";
        $q = "elw";
        $n = 1;
}


# load template into memory
open(F, "<", "build.template") or die "Cannot open build.template: $!\n";
while (<F>) {
        push @T, $_;
}
close F;

# how many input files (assumes they are numbered in.1 .. in.N without gaps)
$t = `ls in.* | wc -l`;
chomp($t);
print "       Found $t input data files.\n";

# where are we
$p = `pwd`;
chomp($p);

# build job files for bsub
foreach $i (1..$t) {
        undef $ss;
        foreach $j (2..$#T) {
                $s = $T[$j];
                $s =~ s/QQQ/$q/g;
                $s =~ s/NNN/$n/g;
                $s =~ s/JJJ/job.$i/g;
                $s =~ s/OOO/out.$i/g;
                $s =~ s/EEE/err.$i/g;
                $s =~ s/III/$i/g;
                $s =~ s/PPP/$p/g;
                $ss .= $s;
        }
        open(O, ">", "job.$i") or die "Cannot write job.$i: $!\n";
        print O "#!/bin/bash\n$ss";
        close O;
}
print "       Build $t job files.\n";

# lazy, build a submit script (named jobs.submit, as used in the session below)
open(O, ">", "jobs.submit") or die "Cannot write jobs.submit: $!\n";
print O "#!/bin/bash\nfor i in \`seq 1 $t\`\ndo\nbsub < job.\$i\ndone\n";
close O;
system("chmod u+x jobs.submit");

print "Done. Verify a job file, then submit like so './jobs.submit'\n";
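The substitution step of the Perl script above can also be sketched in plain shell with sed. This is only an illustration of the same idea, not the method used on this page; fill_template is a hypothetical helper name, and the sketch assumes the directory path contains no `|` or `&` characters.

```shell
#!/bin/bash
# Sketch: fill the triple-uppercase fields of the template with sed.
# Assumption: fill_template is a hypothetical helper; the Perl script is
# the original method. The path in $5 must not contain '|' or '&'.

fill_template() {
    local template="$1"
    local i="$2"
    local queue="$3"
    local nprocs="$4"
    local dir="$5"
    # '|' is the sed delimiter so the slashes in $dir need no escaping;
    # the "# TEMPLATE" marker line is dropped from the generated job file
    sed -e '/^# TEMPLATE$/d' \
        -e "s|QQQ|$queue|g" \
        -e "s|NNN|$nprocs|g" \
        -e "s|JJJ|job.$i|g" \
        -e "s|OOO|out.$i|g" \
        -e "s|EEE|err.$i|g" \
        -e "s|III|$i|g" \
        -e "s|PPP|$dir|g" \
        "$template"
}

# usage: fill_template build.template 3 elw 1 "$(pwd)" > job.3
```

Looping this over the input file numbers gives the same job.1 through job.N files that the Perl script writes.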

Here is how it works, step by step.

[hmeij@swallowtail batch]$ newgrp gaussian

[hmeij@swallowtail batch]$ ll
total 28
-rwxr--r--  1 hmeij its 1165 Dec 21 11:01 build.jobfiles
-rw-r--r--  1 hmeij its  409 Dec 20 16:36 build.template
-rw-r--r--  1 hmeij its  460 Dec 20 15:58 in.1
-rw-r--r--  1 hmeij its  460 Dec 20 15:59 in.2
-rw-r--r--  1 hmeij its  460 Dec 20 15:59 in.3
-rw-r--r--  1 hmeij its  460 Dec 20 15:59 in.4
-rw-r--r--  1 hmeij its  460 Dec 20 15:59 in.5

[hmeij@swallowtail batch]$ ./build.jobfiles   
Usage: ./build.jobfiles queue_name nr_of_procs
       Using default of q=elw, -n=1
       Found 5 input data files.
       Build 5 job files.
Done. Verify a job file, then submit like so './jobs.submit'

[hmeij@swallowtail batch]$ cat job.3     
#!/bin/bash

#BSUB -q elw
#BSUB -n 1
#BSUB -J job.3
#BSUB -o out.3
#BSUB -e err.3
 
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYSANSCRATCH

export GAUSS_SCRDIR="$MYLOCALSCRATCH"
export g03root="/share/apps/gaussian/g03root"
. $g03root/g03/bsd/g03.profile

cp /home/hmeij/batch/in.3 ./in
g03 < ./in > ./out
cp ./out /home/hmeij/batch/out.3."$LSB_JOBID"

[hmeij@swallowtail batch]$ ./jobs.submit 
Job <34529> is submitted to queue <elw>.
Job <34530> is submitted to queue <elw>.
Job <34531> is submitted to queue <elw>.
Job <34532> is submitted to queue <elw>.
Job <34533> is submitted to queue <elw>.

[hmeij@swallowtail batch]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
34529   hmeij   PEND  elw        swallowtail    -        job.1      Dec 21 11:01
34530   hmeij   PEND  elw        swallowtail    -        job.2      Dec 21 11:01
34531   hmeij   PEND  elw        swallowtail    -        job.3      Dec 21 11:01
34532   hmeij   PEND  elw        swallowtail    -        job.4      Dec 21 11:01
34533   hmeij   PEND  elw        swallowtail    -        job.5      Dec 21 11:01

[hmeij@swallowtail batch]$ ll
total 132
-rwxr--r--  1 hmeij its       1165 Dec 21 11:01 build.jobfiles
-rw-r--r--  1 hmeij its        409 Dec 20 16:36 build.template
-rw-r--r--  1 hmeij gaussian     0 Dec 21 11:02 err.1
-rw-r--r--  1 hmeij gaussian     0 Dec 21 11:02 err.2
-rw-r--r--  1 hmeij gaussian     0 Dec 21 11:02 err.3
-rw-r--r--  1 hmeij gaussian     0 Dec 21 11:02 err.4
-rw-r--r--  1 hmeij gaussian     0 Dec 21 11:02 err.5
-rw-r--r--  1 hmeij its        460 Dec 20 15:58 in.1
-rw-r--r--  1 hmeij its        460 Dec 20 15:59 in.2
-rw-r--r--  1 hmeij its        460 Dec 20 15:59 in.3
-rw-r--r--  1 hmeij its        460 Dec 20 15:59 in.4
-rw-r--r--  1 hmeij its        460 Dec 20 15:59 in.5
-rw-r--r--  1 hmeij gaussian   426 Dec 21 11:01 job.1
-rw-r--r--  1 hmeij gaussian   426 Dec 21 11:01 job.2
-rw-r--r--  1 hmeij gaussian   426 Dec 21 11:01 job.3
-rw-r--r--  1 hmeij gaussian   426 Dec 21 11:01 job.4
-rw-r--r--  1 hmeij gaussian   426 Dec 21 11:01 job.5
-rwxr--r--  1 hmeij gaussian    53 Dec 21 11:01 jobs.submit
-rw-r--r--  1 hmeij gaussian  1304 Dec 21 11:02 out.1
-rw-r--r--  1 hmeij gaussian 11389 Dec 21 11:02 out.1.34529
-rw-r--r--  1 hmeij gaussian  1304 Dec 21 11:02 out.2
-rw-r--r--  1 hmeij gaussian 11390 Dec 21 11:02 out.2.34530
-rw-r--r--  1 hmeij gaussian  1304 Dec 21 11:02 out.3
-rw-r--r--  1 hmeij gaussian 11478 Dec 21 11:02 out.3.34531
-rw-r--r--  1 hmeij gaussian  1304 Dec 21 11:02 out.4
-rw-r--r--  1 hmeij gaussian 11477 Dec 21 11:02 out.4.34532
-rw-r--r--  1 hmeij gaussian  1304 Dec 21 11:02 out.5
-rw-r--r--  1 hmeij gaussian 11449 Dec 21 11:02 out.5.34533

Job Arrays

Well, this turned out to be easier than expected. The submission process is slightly different, though: instead of using a job file, we submit the job on the command line with all the necessary arguments.

First you may wish to read

or

When using job arrays, you submit a single job that contains many tasks. Each task is a copy of the original job submission, but the input and output files vary per task. Also, in this case we no longer use a job file with #BSUB directives.

Here is one way it could work, using the Gaussian example above. We use the same input data files, in.1 through in.5. In addition, we create array files whose only content is the iteration value: array.1 contains 1, array.2 contains 2, and so on. The content of each array file is passed as standard input to the program you specify on the command line.

That program file, named my_run.sh in this example, reads the value and uses it to set up the current task, including the Gaussian invocation. Seems convoluted? Sure, but think about the case in which you have thousands of jobs to process. This can now be done with a single job submission.
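With thousands of tasks you would not create the array files by hand. A minimal sketch of generating them, where make_array_files is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: write array.1 .. array.N, each holding only its own index.
# Assumption: make_array_files is a hypothetical helper name.

make_array_files() {
    local count="$1"
    local i
    for i in $(seq 1 "$count"); do
        echo "$i" > "array.$i"
    done
}

# usage: make_array_files 5
```

For single-digit indexes each file is 2 bytes (the digit plus a newline), which matches the sizes in the directory listings on this page.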

Not clear? Here is how it works. First the contents of our files:

  • file in.1
%mem=1GB
%nproc=1
# hf/3-21g geom=connectivity

Title Card Required

0 1
 N
 H                  1              B1
 H                  1              B2    2              A1
 H                  1              B3    3              A2    2              D1

   B1             1.00000000
   B2             1.00000000
   B3             1.00000000
   A1           109.47120255
   A2           109.47125080
   D1          -119.99998525

 1 2 1.0 3 1.0 4 1.0
 2
 3
 4
  • file array.1
1
  • program file my_run.sh
#!/bin/bash

read i
echo i:$i
echo '---------------------'

MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH

export GAUSS_SCRDIR="$MYLOCALSCRATCH"
export g03root="/share/apps/gaussian/g03root"
. $g03root/g03/bsd/g03.profile

cp "./in.$i" "$MYSANSCRATCH/in"
cd "$MYSANSCRATCH" || exit 1
# Gaussian writes to standard out, which bsub captures via the -o option
g03 < ./in
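As an aside, LSF exports the task index to each array task in the LSB_JOBINDEX environment variable, so the array files are not strictly required. A minimal sketch, assuming a hypothetical helper named task_index that prefers the variable and otherwise reads the index from standard input the way my_run.sh does:

```shell
#!/bin/bash
# Sketch: determine the task index inside an array task.
# LSB_JOBINDEX is an LSF environment variable; task_index is a
# hypothetical helper, not part of the original my_run.sh.

task_index() {
    local i
    # treat an empty or zero value as "not running as an array task"
    if [ -n "${LSB_JOBINDEX:-}" ] && [ "$LSB_JOBINDEX" != "0" ]; then
        echo "$LSB_JOBINDEX"
    else
        # fall back to the array-file approach: index arrives on standard input
        read -r i
        echo "$i"
    fi
}

# usage inside the program file: i=$(task_index)
```

With this approach the -i "array.%I" option could be dropped from the bsub command line, but that has not been tested here.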

Here is the submission, step by step. Note the JOBID that gets assigned (34554): it is the same for all tasks within this job. That makes it easy to manage hundreds or thousands of tasks, for example if you need to stop them all with bkill.

[hmeij@swallowtail arrays]$ newgrp gaussian

[hmeij@swallowtail arrays]$ ll
total 44
-rw-r--r--  1 hmeij gaussian   2 Dec 21 11:20 array.1
-rw-r--r--  1 hmeij gaussian   2 Dec 21 11:20 array.2
-rw-r--r--  1 hmeij gaussian   2 Dec 21 11:20 array.3
-rw-r--r--  1 hmeij gaussian   2 Dec 21 11:20 array.4
-rw-r--r--  1 hmeij gaussian   2 Dec 21 11:20 array.5
-rw-r--r--  1 hmeij gaussian 460 Dec 21 11:08 in.1
-rw-r--r--  1 hmeij gaussian 460 Dec 21 11:08 in.2
-rw-r--r--  1 hmeij gaussian 460 Dec 21 11:08 in.3
-rw-r--r--  1 hmeij gaussian 460 Dec 21 11:08 in.4
-rw-r--r--  1 hmeij gaussian 460 Dec 21 11:08 in.5
-rwxr--r--  1 hmeij gaussian 346 Dec 21 11:35 my_run.sh

[hmeij@swallowtail arrays]$ bsub -q elw -n 1 -J "job[1-5]" -i "array.%I" -o "out.%J.%I" ./my_run.sh 
Job <34554> is submitted to queue <elw>.

[hmeij@swallowtail arrays]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
34554   hmeij   PEND  elw        swallowtail    -        job[1]     Dec 21 14:01
34554   hmeij   PEND  elw        swallowtail    -        job[2]     Dec 21 14:01
34554   hmeij   PEND  elw        swallowtail    -        job[3]     Dec 21 14:01
34554   hmeij   PEND  elw        swallowtail    -        job[4]     Dec 21 14:01
34554   hmeij   PEND  elw        swallowtail    -        job[5]     Dec 21 14:01

[hmeij@swallowtail arrays]$ ll
total 124
-rw-r--r--  1 hmeij gaussian     2 Dec 21 11:20 array.1
-rw-r--r--  1 hmeij gaussian     2 Dec 21 11:20 array.2
-rw-r--r--  1 hmeij gaussian     2 Dec 21 11:20 array.3
-rw-r--r--  1 hmeij gaussian     2 Dec 21 11:20 array.4
-rw-r--r--  1 hmeij gaussian     2 Dec 21 11:20 array.5
-rw-r--r--  1 hmeij gaussian   460 Dec 21 11:08 in.1
-rw-r--r--  1 hmeij gaussian   460 Dec 21 11:08 in.2
-rw-r--r--  1 hmeij gaussian   460 Dec 21 11:08 in.3
-rw-r--r--  1 hmeij gaussian   460 Dec 21 11:08 in.4
-rw-r--r--  1 hmeij gaussian   460 Dec 21 11:08 in.5
-rwxr--r--  1 hmeij gaussian   346 Dec 21 11:35 my_run.sh
-rw-r--r--  1 hmeij gaussian 12454 Dec 21 14:02 out.34554.1
-rw-r--r--  1 hmeij gaussian 12455 Dec 21 14:02 out.34554.2
-rw-r--r--  1 hmeij gaussian 12378 Dec 21 14:02 out.34554.3
-rw-r--r--  1 hmeij gaussian 12339 Dec 21 14:02 out.34554.4
-rw-r--r--  1 hmeij gaussian 12340 Dec 21 14:02 out.34554.5

Of course, you could pass more information in your array job. For example, you could pass a tilde-delimited string of the variables you need to set up each individual task. Your program file would then read this long string and parse it apart.
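A minimal sketch of the parsing step, assuming a hypothetical field layout of index~basis~memory (the layout and the variable names are made up for illustration):

```shell
#!/bin/bash
# Sketch: split one tilde-delimited line into named variables.
# Assumption: the field layout index~basis~memory is hypothetical.

parse_task_line() {
    # IFS is changed only for this read, so the rest of the script is unaffected
    IFS='~' read -r task_index task_basis task_mem <<EOF
$1
EOF
}

parse_task_line "3~3-21g~1GB"
echo "index=$task_index basis=$task_basis mem=$task_mem"
# prints: index=3 basis=3-21g mem=1GB
```

In a program file like my_run.sh the line would come from standard input (read -r line) instead of a literal string.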



cluster/62.txt · Last modified: 2007/12/21 14:08 (external edit)