To test your environment, execute the following two binaries and compare your output with the output shown here. Everything should already be set up for you; if not, contact the HPCadmin.


[hmeij@swallowtail ~]$ /share/apps/bin/
Running on ilogin1 and ilogin2 with -np=16
Hello, world, I am 0 of 16
Hello, world, I am 11 of 16
Hello, world, I am 1 of 16
Hello, world, I am 2 of 16
Hello, world, I am 3 of 16
Hello, world, I am 4 of 16
Hello, world, I am 5 of 16
Hello, world, I am 6 of 16
Hello, world, I am 7 of 16
Hello, world, I am 8 of 16
Hello, world, I am 9 of 16
Hello, world, I am 10 of 16
Hello, world, I am 12 of 16
Hello, world, I am 13 of 16
Hello, world, I am 14 of 16
Hello, world, I am 15 of 16


[hmeij@swallowtail ~]$ /share/apps/bin/
Running on ilogin1 and ilogin2 with -np=16
Process 10 on compute-1-16.local
Process 0 on compute-1-15.local
Process 2 on compute-1-15.local
Process 3 on compute-1-15.local
Process 4 on compute-1-15.local
Process 5 on compute-1-15.local
Process 6 on compute-1-15.local
Process 7 on compute-1-15.local
Process 1 on compute-1-15.local
pi is approximately 3.1416009869231245, Error is 0.0000083333333314
wall clock time = 0.166646
Process 8 on compute-1-16.local
Process 9 on compute-1-16.local
Process 11 on compute-1-16.local
Process 12 on compute-1-16.local
Process 13 on compute-1-16.local
Process 14 on compute-1-16.local
Process 15 on compute-1-16.local

Done. For those who are interested, below is the what and where of OpenMPI on our cluster.


install directory: /share/apps/openmpi-1.2

… you can add the bin/ subdirectory to your PATH if you want. This is not necessary as long as your scripts give the full path to the binaries.
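If you do want the OpenMPI tools on your PATH, a minimal sketch follows. It assumes a bash login shell and the install directory given above; the variable name `OPENMPI_HOME` is only illustrative and not something the cluster defines for you.

```shell
#!/bin/bash
# Sketch: put the OpenMPI bin/ directory on PATH (e.g. from ~/.bashrc).
# Assumption: the install prefix is the one listed on this page.
OPENMPI_HOME=/share/apps/openmpi-1.2   # illustrative variable name

# Prepend only if it is not already present, so sourcing this twice
# does not keep growing PATH.
case ":$PATH:" in
    *":$OPENMPI_HOME/bin:"*) ;;                      # already there
    *) PATH="$OPENMPI_HOME/bin:$PATH" ;;             # add it in front
esac
export PATH
```

With this in place, `mpirun` and `mpicc` can be called without the full path, although scripts that spell out the full path remain unambiguous about which MPI installation they use.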

Both scripts invoke the mpirun binary, like so:


echo Running on ilogin1 and ilogin2 with -np=16

/share/apps/openmpi-1.2/bin/mpirun -np 16 \
  -machinefile /share/apps/openmpi-1.2/bin/hello.machines \

The two binaries have libraries linked in, like so

[hmeij@swallowtail ~]# ldd /share/apps/openmpi-1.2/bin/hello
         => /share/apps/openmpi-1.2/lib/ (0x0000002a95557000)
         => /share/apps/openmpi-1.2/lib/ (0x0000002a956eb000)
         => /share/apps/openmpi-1.2/lib/ (0x0000002a95844000)
         => /lib64/ (0x0000003684000000)
         => /lib64/ (0x0000003686d00000)
         => /lib64/ (0x0000003688600000)
         => /lib64/tls/ (0x0000003683e00000)
         => /lib64/tls/ (0x0000003684400000)
         => /lib64/tls/ (0x0000003683b00000)
        /lib64/ (0x0000003683900000)


When you compile, for example, C code against OpenMPI:

/share/apps/openmpi-1.2/bin/mpicc -o ./mpi /share/apps/openmpi-1.2/bin/cpi.c

check with ldd that the created binary finds all of its libraries (see output above).
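The ldd check can be scripted. The sketch below assumes the glibc ldd convention of reporting an unresolved library as `name => not found`; the helper name `missing_libs` is made up for illustration, and `./mpi` is the binary from the compile line above.

```shell
#!/bin/bash
# Sketch: report libraries that ldd could not resolve for a binary.

# missing_libs (hypothetical helper): reads ldd output on stdin and
# prints the name of every library marked "=> not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

binary=./mpi   # the binary produced by the mpicc example

# Only run the check if the binary actually exists here.
if [ -x "$binary" ]; then
    missing=$(ldd "$binary" | missing_libs)
    if [ -n "$missing" ]; then
        echo "unresolved libraries in $binary:" $missing
    else
        echo "all libraries resolved for $binary"
    fi
fi
```

If anything is reported as unresolved, the usual suspects are a missing OpenMPI lib/ directory in the runtime library search path or a binary built against a different MPI installation.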

The Problem

Once you have your binary compiled, you can execute it on the head node or any other node as described above. But both test programs point mpirun to a hardcoded “machines” file.

This will not work when submitting your program to bsub. Platform reports:

<hi yellow> As I mentioned, Lava is not natively capable for parallel jobs, so you will have to write your own integration script to parse the hosts allocated by LSF (with LSB_HOSTS variable) and integrate them to your MPI distribution. </hi>

<hi orange> Also, remind you that, because the lack of LSF's parallel support daemons, these scripts can only provide a loose integration to Lava. Specifically, Lava only knows the mpirun process on the first host; not knowledge to other parallel processes in other hosts involved in a parallel job. So if, in some circumstances, a parallel job fails, Lava cannot clean up the leftover processes, for example, mpich 1's shared-memory leftovers. You may want to regularly checks on your cluster on this issue. </hi>

And this makes the job submission process for parallel jobs tedious.
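To give an idea of what such an integration script involves, here is a minimal sketch rather than the cluster's actual wrapper. It assumes that LSB_HOSTS holds one hostname per allocated job slot, separated by spaces, and turns that list into an OpenMPI machinefile with `slots=N` entries. The helper names `make_machinefile` and `count_slots` are hypothetical, and `./mpi` stands for your own compiled binary.

```shell
#!/bin/bash
# Sketch: build an OpenMPI machinefile from the hosts the scheduler allocated.
# Assumption: LSB_HOSTS looks like "hostA hostA hostB", one entry per slot.

# make_machinefile HOSTS OUTFILE (hypothetical helper)
# Writes one "hostname slots=N" line per distinct host.
make_machinefile() {
    local hosts="$1" outfile="$2"
    # $hosts is deliberately unquoted so it splits into one host per line.
    printf '%s\n' $hosts | sort | uniq -c |
        awk '{ print $2 " slots=" $1 }' > "$outfile"
}

# count_slots HOSTS (hypothetical helper)
# Prints the total number of slots, i.e. the value to pass to -np.
count_slots() {
    set -- $1
    echo "$#"
}

# Inside a job, LSB_HOSTS is set by the scheduler; outside one, do nothing.
if [ -n "${LSB_HOSTS:-}" ]; then
    machinefile="${TMPDIR:-/tmp}/machines.$$"
    make_machinefile "$LSB_HOSTS" "$machinefile"
    np=$(count_slots "$LSB_HOSTS")
    # Print the command instead of running it, so the sketch is safe to try.
    echo /share/apps/openmpi-1.2/bin/mpirun -np "$np" \
        -machinefile "$machinefile" ./mpi
fi
```

Note that this only addresses the machinefile. As the vendor's second remark points out, the scheduler still sees only the mpirun process on the first host, so processes left behind by a failed job have to be cleaned up by other means.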

So click on Back and we'll detail that.


cluster/31.txt · Last modified: 2007/04/19 19:45 (external edit)