LSF & MPI

The new installation of LSF supports an integrated environment for submitting parallel jobs. This means the scheduler can track the resource consumption of a job that spawns many parallel tasks, something Lava was unable to do.

Only small changes are needed in your job script. We will first implement a Generic Parallel Job Launcher Framework. The two MPI flavors to be fully integrated at a later date are OpenMPI and Topspin.

Below is a sequence of steps detailing how it all works. For a quick synopsis, here is what you need to change to start using this new Framework right away:

  1. change the references to the old lava wrapper scripts to the new lsf wrapper scripts:
    • /share/apps/bin/lsf.topspin.wrapper
    • /share/apps/bin/lsf.openmpi.wrapper
    • /share/apps/bin/lsf.openmpi_intel.wrapper
  2. you no longer have to specify the -n option when invoking your application; the BSUB line is enough (see the sketch after this list)
    • #BSUB -n 4
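
For example, a job script fragment that used to call one of the old lava wrappers might change as sketched below. The old wrapper name and the ./my_app binary are placeholders for illustration, not actual paths on the cluster; substitute whatever your script currently uses.

#!/bin/bash
#BSUB -n 4

# OLD (lava): task count passed to the wrapper on the command line
# (hypothetical old wrapper name, shown only for comparison)
# /share/apps/bin/lava.openmpi.wrapper -n 4 ./my_app

# NEW (lsf): the #BSUB -n line above is enough, no -n on the wrapper call
time /share/apps/bin/lsf.openmpi.wrapper ./my_app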

cpi

First, let's compile a tiny C program. cpi.c calculates Pi: it fires off a number of parallel tasks that each work on the calculation, and worker #0 reports the result back to standard out.

Flavor          MPI Compiler                                Arguments
mvapich         /share/apps/mvapich-0.9.9/bin/mpicc         -o cpi_mvapich cpi.c
openmpi         /share/apps/openmpi-1.2/bin/mpicc           -o cpi_openmpi cpi.c
openmpi_intel   /share/apps/openmpi-1.2_intel/bin/mpicc     -o cpi_openmpi_intel cpi.c
topspin         /usr/local/topspin/mpi/mpich/bin/mpicc      -o cpi_topspin cpi.c

The surprise here is that we end up with binaries ranging in size from roughly 10 KB to 3 MB (a quick way to inspect this is sketched after the listing below). Topspin is the MPI flavor that came with our cluster for the Infiniband switch. MVAPICH and OpenMPI were downloaded and compiled from source with gcc. The alternate OpenMPI build (openmpi_intel) was compiled with Intel's compilers. Topspin can only run across the Infiniband switch, but both OpenMPI flavors can use either switch.

lrwxrwxrwx  1 hmeij its      33 Jan  3 15:49 cpi.c -> /share/apps/openmpi-1.2/bin/cpi.c
-rwxr-xr-x  1 hmeij its  406080 Jan  7 14:38 cpi_mvapich
-rwxr-xr-x  1 hmeij its   10166 Jan  8 15:36 cpi_openmpi
-rwxr-xr-x  1 hmeij its 3023929 Jan  3 16:32 cpi_openmpi_intel
-rwxr-xr-x  1 hmeij its    9781 Jan  3 16:25 cpi_topspin
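
If you are curious where those size differences come from, the loop below is one way to poke at the binaries. It assumes the four binaries from the listing above sit in the current directory and uses only standard tools (ls, file, ldd).

#!/bin/bash
# Compare how each cpi binary was built: on-disk size, file type,
# and which MPI shared libraries (if any) it links against.
for bin in cpi_mvapich cpi_openmpi cpi_openmpi_intel cpi_topspin; do
    echo "== $bin =="
    ls -lh "./$bin"                 # size on disk
    file "./$bin"                   # dynamically vs statically linked
    ldd "./$bin" | grep -i mpi      # MPI libraries pulled in at run time
done

A binary that links its MPI and compiler libraries statically will show few or no shared libraries here, which is typically what accounts for a multi-megabyte executable.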

Job Script

Here is the job script we'll use for testing (submission commands are sketched after the script). Note the lack of the -n option on the lines invoking our application; we ask for 4 parallel tasks with 2 tasks per node.

#!/bin/bash

#BSUB -q imw
#BSUB -n 4
#BSUB -R "span[ptile=2]"
#BSUB -J mpi.lsf
#BSUB -e err
#BSUB -o out

# clean up output of previous runs; kept after the #BSUB block so that
# bsub reads the embedded options before the first executable line
rm -f ./err ./out

# WRAPPERS

echo topspin
time /share/apps/bin/lsf.topspin.wrapper ./cpi_topspin

echo openmpi
time /share/apps/bin/lsf.openmpi.wrapper ./cpi_openmpi

echo openmpi_intel
time /share/apps/bin/lsf.openmpi_intel.wrapper ./cpi_openmpi_intel
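
To submit and monitor the script, feed it to bsub on standard input. The file name mpi.lsf below is just an assumption (the #BSUB -J line only sets the job name, not the file name):

bsub < mpi.lsf          # submit; LSF reads the embedded #BSUB options
bjobs -w                # watch the job and the hosts it was dispatched to
cat err out             # the time results land in 'err' (time writes to stderr),
                        # the program output in 'out', as requested by #BSUB -e / -o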

Infiniband

Running the job script on nodes attached to the Infiniband switch yields the results below; the three time blocks correspond to the topspin, openmpi, and openmpi_intel wrappers, in the order they appear in the script.

Process 1 on compute-1-3.local
Process 0 on compute-1-3.local
Process 3 on compute-1-13.local
Process 2 on compute-1-13.local

real    0m6.837s
user    0m0.032s
sys     0m0.086s
Process 1 on compute-1-3.local
Process 2 on compute-1-13.local
Process 3 on compute-1-13.local
Process 0 on compute-1-3.local

real    0m2.071s
user    0m0.018s
sys     0m0.035s
Process 0 on compute-1-3.local
Process 1 on compute-1-3.local
Process 2 on compute-1-13.local
Process 3 on compute-1-13.local

real    0m1.489s
user    0m0.014s
sys     0m0.035s
The output (if any) follows:

topspin
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.000184
openmpi
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.010902
openmpi_intel
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.021757

Ethernet

On nodes without Infiniband HCAs, OpenMPI warns that it cannot find an HCA and falls back to another transport (the ethernet switch); shown below is the openmpi_intel run.

--------------------------------------------------------------------------
[0,1,0]: MVAPI on host compute-1-23 was unable to find any HCAs.
Another transport will be used instead, although this may result in 
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,1]: MVAPI on host compute-1-23 was unable to find any HCAs.
Another transport will be used instead, although this may result in 
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,2]: MVAPI on host compute-1-18 was unable to find any HCAs.
Another transport will be used instead, although this may result in 
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,3]: MVAPI on host compute-1-18 was unable to find any HCAs.
Another transport will be used instead, although this may result in 
lower performance.
--------------------------------------------------------------------------
Process 0 on compute-1-23.local
Process 2 on compute-1-18.local
Process 3 on compute-1-18.local
Process 1 on compute-1-23.local

real    0m1.344s
user    0m0.014s
sys     0m0.022s
The output (if any) follows:

openmpi_intel
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.013772

Amber

To test Amber with the new scripts, we rerun the programs from these test results. Note that the memory footprint of the nodes involved has changed. Let's tally up some runs and note the run times; we'll run each twice. A sketch of such a job script follows the table below.

Amber   MPI Flavor      Switch       NProcs   JAC bench 1   JAC bench 2   Factor_IX bench 1   Factor_IX bench 2
9       topspin         infiniband   4        01m38s        01m57s        02m45s              02m38s
9       openmpi_intel   infiniband   4        01m30s        01m35s        02m34s              02m035s
9       openmpi_intel   ethernet     4        02m15s        02m06s        03m32s              03m47s
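
For completeness, here is a sketch of what such an Amber run could look like under the new framework. The AMBERHOME path and the input, topology, and coordinate file names are assumptions for illustration only; the sander flags (-O, -i, -p, -c, -o) are standard Amber options.

#!/bin/bash

#BSUB -q imw
#BSUB -n 4
#BSUB -R "span[ptile=2]"
#BSUB -J amber.lsf
#BSUB -e err
#BSUB -o out

# hypothetical Amber 9 run through the openmpi_intel wrapper; adjust
# AMBERHOME and the input/topology/coordinate files to your own setup
export AMBERHOME=/share/apps/amber9
time /share/apps/bin/lsf.openmpi_intel.wrapper \
    $AMBERHOME/exe/sander.MPI -O -i mdin -p prmtop -c inpcrd -o mdout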

