\\ **[[cluster:0|Home]]**

===== LSF & MPI =====

The new installation of LSF supports an integrated environment for submitting parallel jobs. This means the scheduler can keep track of the resource consumption of a job that spawns many parallel tasks, something Lava was unable to do. Only small changes are needed in your job scripts. We will first implement a **[[http://lsfdocs.wesleyan.edu/hpc6.2_using/parallel_jobs.html#196824|Generic Parallel Job Launcher Framework]]**. The two MPI flavors I will fully integrate at a later date are OpenMPI and Topspin.

Below is a sequence of steps detailing how it all works. For a quick synopsis, here is what you need to change to start using this new framework right away:

  - change the references to the old lava wrapper scripts to the new lsf wrapper scripts:
    * ''/share/apps/bin/lsf.topspin.wrapper''
    * ''/share/apps/bin/lsf.openmpi.wrapper''
    * ''/share/apps/bin/lsf.openmpi_intel.wrapper''
  - you no longer have to specify the ''-n'' option when invoking your application; the BSUB line is enough:
    * ''#BSUB -n 4''

==== cpi ====

First, let's compile a tiny C program. ''cpi.c'' calculates **[[http://en.wikipedia.org/wiki/Pi|π]], Pi**. It fires off a bunch of parallel tasks, each of which calculates Pi, and worker #0 reports the results back to standard out.

^Flavor^MPI Compiler^Arguments^
| mvapich | /share/apps/mvapich-0.9.9/bin/mpicc | -o cpi_mvapich cpi.c |
| openmpi | /share/apps/openmpi-1.2/bin/mpicc | -o cpi_openmpi cpi.c |
| openmpi_intel | /share/apps/openmpi-1.2_intel/bin/mpicc | -o cpi_openmpi_intel cpi.c |
| topspin | /usr/local/topspin/mpi/mpich/bin/mpicc | -o cpi_topspin cpi.c |

The surprise here is that we end up with binaries ranging in size from roughly 10 KB to 3 MB. Topspin is the MPI flavor that came with our cluster for the Infiniband switch. MVAPICH and OpenMPI were downloaded and compiled from source with ''gcc''. The alternate OpenMPI (openmpi_intel) was compiled with Intel's compilers. Topspin can only run across the Infiniband switch, but both OpenMPI flavors can use either switch.

<code>
lrwxrwxrwx  1 hmeij its      33 Jan  3 15:49 cpi.c -> /share/apps/openmpi-1.2/bin/cpi.c
-rwxr-xr-x  1 hmeij its  406080 Jan  7 14:38 cpi_mvapich
-rwxr-xr-x  1 hmeij its   10166 Jan  8 15:36 cpi_openmpi
-rwxr-xr-x  1 hmeij its 3023929 Jan  3 16:32 cpi_openmpi_intel
-rwxr-xr-x  1 hmeij its    9781 Jan  3 16:25 cpi_topspin
</code>
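If you want to check which MPI libraries a given binary actually picked up (and get a hint as to why the file sizes differ so much), a quick ''ldd'' inspection works. This is only a sketch, using the openmpi compiler path from the table above; how much runtime code each flavor links statically presumably accounts for much of the size spread.

<code bash>
# build the OpenMPI (gcc) flavor of cpi; compiler path as listed in the table above
/share/apps/openmpi-1.2/bin/mpicc -o cpi_openmpi cpi.c

# show which shared MPI libraries the binary is linked against;
# binaries that pull in their MPI or compiler runtime statically
# will show little or nothing here, hence the larger file sizes
ldd cpi_openmpi | grep -i mpi
</code>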
==== Job Script ====

Here is the script we'll use for testing. Note the lack of the ''-n'' option on the lines invoking our application. We will ask for 4 parallel tasks with 2 tasks per node.

<code bash>
#!/bin/bash
rm -f ./err ./out

#BSUB -q imw
#BSUB -n 4
#BSUB -R "span[ptile=2]"
#BSUB -J mpi.lsf
#BSUB -e err
#BSUB -o out

# WRAPPERS
echo topspin
time /share/apps/bin/lsf.topspin.wrapper ./cpi_topspin
echo openmpi
time /share/apps/bin/lsf.openmpi.wrapper ./cpi_openmpi
echo openmpi_intel
time /share/apps/bin/lsf.openmpi_intel.wrapper ./cpi_openmpi_intel
</code>

==== Infiniband ====

  * queue: ''imw''
  * all 3 MPI flavors

  * err

<code>
Process 1 on compute-1-3.local
Process 0 on compute-1-3.local
Process 3 on compute-1-13.local
Process 2 on compute-1-13.local

real    0m6.837s
user    0m0.032s
sys     0m0.086s

Process 1 on compute-1-3.local
Process 2 on compute-1-13.local
Process 3 on compute-1-13.local
Process 0 on compute-1-3.local

real    0m2.071s
user    0m0.018s
sys     0m0.035s

Process 0 on compute-1-3.local
Process 1 on compute-1-3.local
Process 2 on compute-1-13.local
Process 3 on compute-1-13.local

real    0m1.489s
user    0m0.014s
sys     0m0.035s
</code>

  * out

<code>
The output (if any) follows:

topspin
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.000184
openmpi
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.010902
openmpi_intel
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.021757
</code>

==== Ethernet ====

  * queue: ''emw''
  * MPI: openmpi_intel only

  * err

<code>
--------------------------------------------------------------------------
[0,1,0]: MVAPI on host compute-1-23 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,1]: MVAPI on host compute-1-23 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,2]: MVAPI on host compute-1-18 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,3]: MVAPI on host compute-1-18 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

Process 0 on compute-1-23.local
Process 2 on compute-1-18.local
Process 3 on compute-1-18.local
Process 1 on compute-1-23.local

real    0m1.344s
user    0m0.014s
sys     0m0.022s
</code>

  * out

<code>
The output (if any) follows:

openmpi_intel
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.013772
</code>

==== Amber ====

To test Amber with the new scripts, we rerun the programs from [[cluster:42#results|these test results]]. Note that the memory footprint of the nodes involved has changed. Let's tally up some runs and note the run times; we'll run each twice. A sketch of an Amber job script using the new wrappers follows the table below.

^Amber^MPI Flavor^Switch^NProcs^JAC bench 1^JAC bench 2^Factor_IX bench 1^Factor_IX bench 2^
|9 |topspin |infiniband | 4 | 01m38s | 01m57s | 02m45s | 02m38s |
|9openmpi |openmpi_intel |infiniband | 4 | 01m30s | 01m35s | 02m34s | 02m35s |
|9openmpi |openmpi_intel |ethernet | 4 | 02m15s | 02m06s | 03m32s | 03m47s |
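The script below is a minimal sketch of such an Amber run under the new framework, mirroring the cpi script above. The ''sander.MPI'' path and the input file names (''mdin'', ''prmtop'', ''inpcrd'') are assumptions, so adjust them to your own Amber installation and benchmark files.

<code bash>
#!/bin/bash
rm -f ./err ./out

#BSUB -q imw
#BSUB -n 4
#BSUB -R "span[ptile=2]"
#BSUB -J amber.lsf
#BSUB -e err
#BSUB -o out

# no -n on the wrapper line; LSF derives the task count from '#BSUB -n 4'
# the sander.MPI path and the input/output file names are assumptions
time /share/apps/bin/lsf.openmpi_intel.wrapper \
  /share/apps/amber9/exe/sander.MPI -O -i mdin -o mdout -p prmtop -c inpcrd
</code>

\\ **[[cluster:0|Home]]**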