\\
**[[cluster:0|Home]]**
===== LSF & MPI =====
The new installation of LSF supports an integrated environment for submitting parallel jobs. This means the scheduler can keep track of the resource consumption of a job that spawns many parallel tasks, something Lava was unable to do.
Small changes are needed in your job scripts. We will first implement a **[[http://lsfdocs.wesleyan.edu/hpc6.2_using/parallel_jobs.html#196824|Generic Parallel Job Launcher Framework]]**. The two MPI flavors that will be fully integrated at a later date are OpenMPI and Topspin.
Below is a sequence of steps detailing how it all works. For a quick synopsis, here is what you need to change to start using this new Framework right away:
- change the references to the old Lava wrapper scripts to the new LSF wrapper scripts:
* ''/share/apps/bin/lsf.topspin.wrapper''
* ''/share/apps/bin/lsf.openmpi.wrapper''
* ''/share/apps/bin/lsf.openmpi_intel.wrapper''
- you no longer have to specify the ''-n'' option when invoking your application; the ''#BSUB'' line is enough (see the sketch below this list)
* ''#BSUB -n 4''
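As a minimal sketch (the binary name ''./my_app'' is a placeholder, and the commented-out Lava line is illustrative, not the actual old wrapper name), the relevant lines of a converted job script look like this:
<code bash>
#BSUB -n 4      # number of parallel tasks, declared once here

# old style (Lava wrapper, placeholder name), task count repeated on the command line:
#   /share/apps/bin/lava.openmpi.wrapper -n 4 ./my_app

# new style (LSF wrapper), no -n; the wrapper reads the allocation from LSF:
/share/apps/bin/lsf.openmpi.wrapper ./my_app
</code>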
==== cpi ====
First, let's compile a tiny C program. ''cpi.c'' calculates **[[http://en.wikipedia.org/wiki/Pi|Pi (π)]]**. It fires off a bunch of parallel tasks, each of which calculates Pi; worker #0 reports its result back to standard out.
^Flavor^MPI Compiler^Arguments^
| mvapich |/share/apps/mvapich-0.9.9/bin/mpicc | -o cpi_mvapich cpi.c |
| openmpi |/share/apps/openmpi-1.2/bin/mpicc | -o cpi_openmpi cpi.c |
| openmpi_intel |/share/apps/openmpi-1.2_intel/bin/mpicc | -o cpi_openmpi_intel cpi.c |
| topspin |/usr/local/topspin/mpi/mpich/bin/mpicc | -o cpi_topspin cpi.c |
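Spelled out as shell commands, the four compiles from the table above are:
<code bash>
# one binary per MPI flavor, all built from the same cpi.c
/share/apps/mvapich-0.9.9/bin/mpicc     -o cpi_mvapich       cpi.c
/share/apps/openmpi-1.2/bin/mpicc       -o cpi_openmpi       cpi.c
/share/apps/openmpi-1.2_intel/bin/mpicc -o cpi_openmpi_intel cpi.c
/usr/local/topspin/mpi/mpich/bin/mpicc  -o cpi_topspin       cpi.c
</code>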
The surprise here is that we end up with binaries ranging in size from roughly 10 KB to 3 MB. Topspin is the MPI flavor that came with our cluster for the Infiniband switch. MVAPICH and OpenMPI were downloaded and compiled from source with ''gcc''. The alternate OpenMPI flavor (openmpi_intel) was compiled with Intel's compilers. Topspin can only run across the Infiniband switch, but both OpenMPI flavors can use either switch.
<code>
lrwxrwxrwx 1 hmeij its      33 Jan 3 15:49 cpi.c -> /share/apps/openmpi-1.2/bin/cpi.c
-rwxr-xr-x 1 hmeij its  406080 Jan 7 14:38 cpi_mvapich
-rwxr-xr-x 1 hmeij its   10166 Jan 8 15:36 cpi_openmpi
-rwxr-xr-x 1 hmeij its 3023929 Jan 3 16:32 cpi_openmpi_intel
-rwxr-xr-x 1 hmeij its    9781 Jan 3 16:25 cpi_topspin
</code>
==== Job Script ====
Here is the job script we'll use for testing. Note the lack of the ''-n'' option on the lines invoking our application. We ask for 4 parallel tasks with 2 tasks per node.
<code bash>
#!/bin/bash
rm -f ./err ./out

#BSUB -q imw
#BSUB -n 4
#BSUB -R "span[ptile=2]"
#BSUB -J mpi.lsf
#BSUB -e err
#BSUB -o out

# WRAPPERS
echo topspin
time /share/apps/bin/lsf.topspin.wrapper ./cpi_topspin
echo openmpi
time /share/apps/bin/lsf.openmpi.wrapper ./cpi_openmpi
echo openmpi_intel
time /share/apps/bin/lsf.openmpi_intel.wrapper ./cpi_openmpi_intel
</code>
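Assuming the script above is saved as ''mpi.lsf'' (matching the ''-J'' job name; the filename itself is an assumption), submitting and checking it would look like this:
<code bash>
bsub < mpi.lsf     # submit; LSF reads the embedded #BSUB options
bjobs              # watch the job's state (PEND, RUN, DONE)
cat out err        # inspect results and timings after the job completes
</code>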
==== Infiniband ====
* queue: ''imw''
* all 3 MPI flavors
* err
<code>
Process 1 on compute-1-3.local
Process 0 on compute-1-3.local
Process 3 on compute-1-13.local
Process 2 on compute-1-13.local

real    0m6.837s
user    0m0.032s
sys     0m0.086s

Process 1 on compute-1-3.local
Process 2 on compute-1-13.local
Process 3 on compute-1-13.local
Process 0 on compute-1-3.local

real    0m2.071s
user    0m0.018s
sys     0m0.035s

Process 0 on compute-1-3.local
Process 1 on compute-1-3.local
Process 2 on compute-1-13.local
Process 3 on compute-1-13.local

real    0m1.489s
user    0m0.014s
sys     0m0.035s
</code>
* out
<code>
The output (if any) follows:

topspin
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.000184

openmpi
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.010902

openmpi_intel
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.021757
</code>
==== Ethernet ====
* queue: ''emw''
* MPI: openmpi_intel only.
* err
<code>
--------------------------------------------------------------------------
[0,1,0]: MVAPI on host compute-1-23 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,1]: MVAPI on host compute-1-23 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,2]: MVAPI on host compute-1-18 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,3]: MVAPI on host compute-1-18 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

Process 0 on compute-1-23.local
Process 2 on compute-1-18.local
Process 3 on compute-1-18.local
Process 1 on compute-1-23.local

real    0m1.344s
user    0m0.014s
sys     0m0.022s
</code>
* out
<code>
The output (if any) follows:

openmpi_intel
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.013772
</code>
==== Amber ====
To test Amber with the new scripts, we rerun the programs from [[cluster:42#results|these test results]]. Note that the memory footprint of the nodes involved has changed. Let's tally up some runs and note the run times; we run each benchmark twice. A sketch of a matching job script follows the table below.
^Amber^MPI Flavor^Switch^NProcs^JAC bench 1^JAC bench 2^Factor_IX bench 1^Factor_IX bench 2^
|9 |topspin |infiniband | 4 | 01m38s | 01m57s | 02m45s | 02m38s |
|9openmpi |openmpi_intel |infiniband | 4 | 01m30s | 01m35s | 02m34s | 02m35s |
|9openmpi |openmpi_intel |ethernet | 4 | 02m15s | 02m06s | 03m32s | 03m47s |
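As a rough sketch of such a rerun (hypothetical: ''$AMBERHOME'', the ''sander.MPI'' location, and the input file names are assumptions, not taken from the original benchmark scripts), the job script differs from the cpi example only in the line handed to the wrapper:
<code bash>
#!/bin/bash
#BSUB -q imw
#BSUB -n 4
#BSUB -R "span[ptile=2]"
#BSUB -J amber_jac
#BSUB -e err
#BSUB -o out

# hypothetical JAC benchmark run through the new LSF/OpenMPI(Intel) wrapper;
# $AMBERHOME and the -i/-p/-c input files are placeholders
time /share/apps/bin/lsf.openmpi_intel.wrapper \
  $AMBERHOME/exe/sander.MPI -O -i mdin -o mdout -p prmtop -c inpcrd
</code>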
\\
**[[cluster:0|Home]]**