\\
**[[cluster:...]]**

⇒ This is page 2 of 3, navigation provided at bottom of page

===== Switch & MPI Flavors =====

As you can see in **[[cluster:...]]**, our cluster offers more than one switch and more than one flavor of MPI.

What we'd like to know next is:

  - when should we consider the hardware switch in question?
  - which flavor of MPI should we use?
  - does any of it matter?

The Galaxsee example invoked OpenMPI (compiled with the TopSpin libraries) running across nodes on the gigabit ethernet switch (GigE). Executables built against OpenMPI can run over either the IB or GigE switch. The default is to use InfiniBand (IB) if the interfaces are active. In the absence of an IB interconnect, OpenMPI falls back to another transport and prints warnings like the following:

<code>
--------------------------------------------------------------------------
[0,1,0]: MVAPI on host nfs-2-4 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,2]: MVAPI on host nfs-2-4 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
...
</code>
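Rather than relying on this run-time fallback, you can steer Open MPI explicitly. A minimal sketch, assuming an Open MPI 1.x build that includes the ''mvapi'' BTL (the executable name is illustrative):

<code>
# force TCP over the gigabit ethernet switch, even if IB is present
mpirun --mca btl tcp,self -np 4 ./my_app

# insist on the TopSpin/mvapi InfiniBand transport (aborts if no HCA is found)
mpirun --mca btl mvapi,self -np 4 ./my_app
</code>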

Let's run a chemistry example using Amber9 with sander.MPI.

Our cluster contains TopSpin InfiniBand libraries (from Cisco) specific to our Cisco switch. OpenMPI was compiled against them like so:

<code>
# intel compilers + topspin
./configure --prefix /... \
            CC=icc CXX=icpc F77=ifort FC=ifort \
            --disable-shared --enable-static \
            --with-mvapi=/... \
            --with-mvapi-libdir=/...
</code>
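To double-check that the InfiniBand transport actually made it into the build, Open MPI's ''ompi_info'' tool lists the compiled-in components. A quick sanity check:

<code>
# list the byte-transfer-layer (BTL) components this Open MPI was built with;
# "btl: mvapi" should appear alongside "btl: tcp" and "btl: self"
ompi_info | grep btl
</code>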

OpenMPI was also recompiled against gcc/g95, without any references to the IB libraries.

We have 3 versions of Amber 8-)

#1 Amber was compiled (with icc/ifort) against the TopSpin installation by specifying the following in config.h, and was installed in **''/...''**:

<code>
LOAD= ifort
...
LOADLIB= -L/... \
         ... \
         -lvml -lmkl_lapack -lmkl -lguide -lpthread
</code>
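If you ever need to confirm which MPI and math libraries a given ''sander.MPI'' binary actually picked up, ''ldd'' shows its dynamic dependencies. A sketch (the install path is illustrative, since it is truncated above):

<code>
# expect TopSpin VAPI and Intel MKL libraries in the output;
# a statically linked binary instead reports "not a dynamic executable"
ldd /.../exe/sander.MPI | egrep 'mpi|vapi|mkl'
</code>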

#2 Amber was compiled (with icc/ifort) against the OpenMPI installation (described above) by specifying the following in config.h, and was installed in **''/...''**:

<code>
LOAD= mpif90
(resolves to /...)
...
LOADLIB= -L/... \
         ... \
         -ldl -Wl,... \
         ... \
         -lvml -lmkl_lapack -lmkl -lguide -lpthread
</code>
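Because this build links through the ''mpif90'' wrapper, you can ask the wrapper what it resolves to; Open MPI's compiler wrappers support a ''--showme'' flag for exactly this:

<code>
# print the full compiler command the wrapper would run,
# including the Open MPI include and library paths
mpif90 --showme
</code>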

#3 Amber was again compiled (with gcc/g95) against the "GigE-only" OpenMPI installation and was installed in **''/...''**.

Complicated enough?


===== Test Runs =====

Let's do some test runs. There will be some noise here, as we're running against nodes that are doing work, but we'll avoid heavily loaded nodes. For runs requesting fewer than 8 cores, we run against an idle host to get good baseline results. The batch script below drives all three Amber builds; uncomment the block for the build you want to test.

<code>
#!/bin/bash

#BSUB -q 16-ilwnodes
#BSUB -J test
#BSUB -o out
#BSUB -e err

# change next 2 lines
#BSUB -n 8
NP=8

# gcc/g95 compiled sander + GigE only openmpi
#SANDER=/...
#MPIRUN=/...
#PATH=/...

# intel compiled sander + IB/GigE openmpi
#export LD_LIBRARY_PATH=/...
#SANDER=/...
#MPIRUN=/...
#PATH=/...

# intel compiled sander + infiniband topspin
SANDER="/..."
MPIRUN="/..."
PATH=/...

# scratch dirs
MYSANSCRATCH=/...
MYLOCALSCRATCH=/...

rm -rf err out logfile mdout restrt mdinfo
cd $MYSANSCRATCH
export PATH

# which interconnects, which mpirun?
TEST='...'
if [ -f /... ]; then
        TEST="`... | awk -F: '...'`"
fi
if [ $TEST == '...' ]; then
        echo "..."
        DO=$MPIRUN
else
        echo "..."
        # eth1, the nfs switch
        DO="..."
fi

# jac bench
cp /... .
cp /... .
cp /... .
time $DO -np $NP $SANDER -O -i mdin -c inpcrd.equil -o mdout < /dev/null
cp ./mdout /...

# factor_ix bench
cp /... .
cp /... .
cp /... .
time $DO -np $NP $SANDER -O -i mdin -o mdout < /dev/null
cp ./mdout /...
</code>
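For completeness, a script like this is submitted and watched with the usual LSF commands; a minimal sketch, assuming the script is saved as ''sander-bench.sh'' (an illustrative name):

<code>
bsub < sander-bench.sh   # submit; the #BSUB lines set queue, core count, etc.
bjobs                    # check pending/running state
tail -f out              # follow stdout, captured via "#BSUB -o out"
</code>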


===== Results =====

Puzzling ... or maybe not.

^ single host compute-1-2 (IB/GigE enabled) ^^^^^^
^ q: test ^^ q: test ^^ q: test ^^
^ l: OpenMPI (GigE) ^^ l: ... ^^ l: ... ^^
^ -np ^ time ^ -np ^ time ^ -np ^ time ^
| 02 | 10m33s | 02 | ... | 02 | ... |
| 04 | 5m22s | 04 | 4m42s - 4m35s | 04 | 5m52s |
| 08 | 3m40s | 08 | 2m58s - 2m56s | 08 | 3m13s |

Perhaps our problem is not complex enough to show differences amongst the different MPI and switch options.

Next, let's ask for enough cores that we invoke multiple nodes (the host count below) and thus exercise the appropriate interface.
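Per the "change next 2 lines" note in the job script, only the core request changes between runs; a 16-core request, for example:

<code>
# change next 2 lines
#BSUB -n 16
NP=16
</code>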


^ switching to larger core requests ^^^^^^
^ q: 16-lwnodes ^^ q: ... ^^ q: ... ^^
^ l: OpenMPI (GigE) ^^ l: ... ^^ l: ... ^^
^ -np (hosts) ^ time ^ -np (hosts) ^ time ^ -np (hosts) ^ time ^
| 16 | 18m16s(4) | 16 | ... | 16 | ... |
| 32 | 17m19s(7) | 32 | ... | 32 | ... |

We now observe a dramatic increase in performance with the InfiniBand switch.

So you could run your Amber parallel job across the GigE-enabled nodes, but it would scale poorly by comparison.

^ switching to idle queue ^^^^^^
^ l: OpenMPI (GigE) ^^ l: ... ^^ l: ... ^^
^ -np (hosts) ^ time ^ -np (hosts) ^ time ^ -np (hosts) ^ time ^
| 64 | DNK | 64 | ... | 64 | ... |

In the end, it appears that picking a flavor of MPI does not have much impact; picking the right interconnect does.


⇒ go to **[[cluster:...]]**


\\
**[[cluster:...]]**