The new installation of LSF supports an integrated environment for submitting parallel jobs: the scheduler can track the resource consumption of a job that spawns many parallel tasks, something Lava was unable to do.
Small changes are needed in your job script. We will first implement a Generic Parallel Job Launcher Framework; the two MPI flavors that will be fully integrated at a later date are OpenMPI and Topspin.
Below is a sequence of steps detailing how it all works. For a quick synopsis, here is what you need to change to start using this new Framework right away:
- Invoke your application through one of the wrapper scripts:
  - `/share/apps/bin/lsf.topspin.wrapper`
  - `/share/apps/bin/lsf.openmpi.wrapper`
  - `/share/apps/bin/lsf.openmpi_intel.wrapper`
- You do not need the `-n` option anymore when invoking your application; the BSUB line is enough: `#BSUB -n 4`
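For example, here is a minimal sketch of the relevant job script lines, using the OpenMPI wrapper; `./your_mpi_program` is a placeholder for your own binary:

```bash
#!/bin/bash
#BSUB -n 4
#BSUB -o out
#BSUB -e err

# no -n / -np on the command line; the task count comes from the #BSUB -n line
/share/apps/bin/lsf.openmpi.wrapper ./your_mpi_program
```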
First, let's compile a small C program. `cpi.c` calculates π (Pi): it fires off a set of parallel tasks, each of which calculates Pi, and worker #0 reports its own result back to standard out.
Flavor | MPI Compiler | Arguments |
---|---|---|
mvapich | /share/apps/mvapich-0.9.9/bin/mpicc | -o cpi_mvapich cpi.c |
openmpi | /share/apps/openmpi-1.2/bin/mpicc | -o cpi_openmpi cpi.c |
openmpi_intel | /share/apps/openmpi-1.2_intel/bin/mpicc | -o cpi_openmpi_intel cpi.c |
topspin | /usr/local/topspin/mpi/mpich/bin/mpicc | -o cpi_topspin cpi.c |
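Spelled out as shell commands, the builds in the table above are (one binary per MPI flavor, each built with that flavor's `mpicc`):

```bash
/share/apps/mvapich-0.9.9/bin/mpicc      -o cpi_mvapich       cpi.c
/share/apps/openmpi-1.2/bin/mpicc        -o cpi_openmpi       cpi.c
/share/apps/openmpi-1.2_intel/bin/mpicc  -o cpi_openmpi_intel cpi.c
/usr/local/topspin/mpi/mpich/bin/mpicc   -o cpi_topspin       cpi.c
```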
The surprise here is that we end up with binaries ranging in size from roughly 10 KB to 3 MB. Topspin is the MPI flavor that came with our cluster for the Infiniband switch; MVAPICH and OpenMPI were downloaded and compiled from source with gcc. The alternate OpenMPI (openmpi_intel) was compiled with Intel's compilers. Topspin can only run across the Infiniband switch, but both OpenMPI flavors can use either switch.
```
lrwxrwxrwx 1 hmeij its      33 Jan 3 15:49 cpi.c -> /share/apps/openmpi-1.2/bin/cpi.c
-rwxr-xr-x 1 hmeij its  406080 Jan 7 14:38 cpi_mvapich
-rwxr-xr-x 1 hmeij its   10166 Jan 8 15:36 cpi_openmpi
-rwxr-xr-x 1 hmeij its 3023929 Jan 3 16:32 cpi_openmpi_intel
-rwxr-xr-x 1 hmeij its    9781 Jan 3 16:25 cpi_topspin
```
Here is the test script. Note the absence of the `-n` option on the lines invoking our application; we ask for 4 parallel tasks with 2 tasks per node.
```bash
#!/bin/bash
rm -f ./err ./out

#BSUB -q imw
#BSUB -n 4
#BSUB -R "span[ptile=2]"
#BSUB -J mpi.lsf
#BSUB -e err
#BSUB -o out

# WRAPPERS
echo topsin
time /share/apps/bin/lsf.topspin.wrapper ./cpi_topspin
echo openmpi
time /share/apps/bin/lsf.openmpi.wrapper ./cpi_openmpi
echo openmpi_intel
time /share/apps/bin/lsf.openmpi_intel.wrapper ./cpi_openmpi_intel
```
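Submit it the usual way (assuming the script is saved as `mpi.lsf`):

```bash
bsub < mpi.lsf
```

The resulting process placement and timings were: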
```
Process 1 on compute-1-3.local
Process 0 on compute-1-3.local
Process 3 on compute-1-13.local
Process 2 on compute-1-13.local

real    0m6.837s
user    0m0.032s
sys     0m0.086s

Process 1 on compute-1-3.local
Process 2 on compute-1-13.local
Process 3 on compute-1-13.local
Process 0 on compute-1-3.local

real    0m2.071s
user    0m0.018s
sys     0m0.035s

Process 0 on compute-1-3.local
Process 1 on compute-1-3.local
Process 2 on compute-1-13.local
Process 3 on compute-1-13.local

real    0m1.489s
user    0m0.014s
sys     0m0.035s
```
```
The output (if any) follows:

topsin
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.000184
openmpi
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.010902
openmpi_intel
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.021757
```
The same test was then run on the `emw` (ethernet) queue. With no Infiniband HCAs available on those nodes, OpenMPI falls back to another transport, as the warnings show:
```
--------------------------------------------------------------------------
[0,1,0]: MVAPI on host compute-1-23 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,1]: MVAPI on host compute-1-23 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,2]: MVAPI on host compute-1-18 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,3]: MVAPI on host compute-1-18 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Process 0 on compute-1-23.local
Process 2 on compute-1-18.local
Process 3 on compute-1-18.local
Process 1 on compute-1-23.local

real    0m1.344s
user    0m0.014s
sys     0m0.022s
```
```
The output (if any) follows:

openmpi_intel
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.013772
```
To test Amber with the new scripts, we rerun the programs from these earlier test results. Note that the memory footprint of the nodes involved has changed. Let's tally up some runs and note the run times; we run each benchmark twice. A sketch of an Amber job script using the wrappers follows the table.
Amber | MPI Flavor | Switch | NProcs | JAC bench 1 | JAC bench 2 | Factor_IX bench 1 | Factor_IX bench 2 |
---|---|---|---|---|---|---|---|
9 | topspin | infiniband | 4 | 01m38s | 01m57s | 02m45s | 02m38s |
9openmpi | openmpi_intel | infiniband | 4 | 01m30s | 01m35s | 02m34s | 02m35s |
9openmpi | openmpi_intel | ethernet | 4 | 02m15s | 02m06s | 03m32s | 03m47s |
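Here is a hedged sketch of what such an Amber run through the new framework can look like. The `sander.MPI` path and the input/output file names are placeholders, not taken from the benchmarks above; only the wrapper invocation and BSUB lines mirror the test script.

```bash
#!/bin/bash
#BSUB -q imw                  # infiniband queue used in the tests above
#BSUB -n 4
#BSUB -R "span[ptile=2]"
#BSUB -J amber_bench
#BSUB -o out
#BSUB -e err

# hypothetical Amber 9 install location; adjust to your installation
SANDER=/share/apps/amber/9/exe/sander.MPI

# no -n / -np here: the wrapper picks up the task count from the #BSUB -n line
/share/apps/bin/lsf.openmpi_intel.wrapper $SANDER -O \
    -i mdin -o mdout -p prmtop -c inpcrd
```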