\\
**[[cluster:

====== OpenMPI ENV ======


===== Tests =====

To test your environment, execute the following two binaries and compare the output.

**#1**
<code>
[hmeij@swallowtail ~]$ /
Running on ilogin1 and ilogin2 with -np=16
Hello, world, I am 0 of 16
Hello, world, I am 11 of 16
Hello, world, I am 1 of 16
Hello, world, I am 2 of 16
Hello, world, I am 3 of 16
Hello, world, I am 4 of 16
Hello, world, I am 5 of 16
Hello, world, I am 6 of 16
Hello, world, I am 7 of 16
Hello, world, I am 8 of 16
Hello, world, I am 9 of 16
Hello, world, I am 10 of 16
Hello, world, I am 12 of 16
Hello, world, I am 13 of 16
Hello, world, I am 14 of 16
Hello, world, I am 15 of 16
</code>

**#2**
<code>
[hmeij@swallowtail ~]$ /
Running on ilogin1 and ilogin2 with -np=16
Process 10 on compute-1-16.local
Process 0 on compute-1-15.local
Process 2 on compute-1-15.local
Process 3 on compute-1-15.local
Process 4 on compute-1-15.local
Process 5 on compute-1-15.local
Process 6 on compute-1-15.local
Process 7 on compute-1-15.local
Process 1 on compute-1-15.local
pi is approximately 3.1416009869231245,
wall clock time = 0.166646
Process 8 on compute-1-16.local
Process 9 on compute-1-16.local
Process 11 on compute-1-16.local
Process 12 on compute-1-16.local
Process 13 on compute-1-16.local
Process 14 on compute-1-16.local
Process 15 on compute-1-16.local
</code>

Done. For those who are interested, below is the what & where of OpenMPI on our cluster.


===== OpenMPI =====

install directory: ''/

... you can add the ''

The two scripts ''

<code>
#!/bin/bash

echo Running on ilogin1 and ilogin2 with -np=16

/
-machinefile /
/
</code>

The two binaries have shared libraries dynamically linked in, like so:

<code>
[hmeij@swallowtail ~]# ldd /
libmpi.so.0 => /
libopen-rte.so.0 => /
libopen-pal.so.0 => /
libdl.so.2 => /
libnsl.so.1 => /
libutil.so.1 => /
libm.so.6 => /
libpthread.so.0 => /
libc.so.6 => /
/
</code>

===== Compiling =====

When you compile, for example, C code for OpenMPI:

<code>
/
</code>

check that the created binary finds all the libraries with ''
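As a sketch of that library check (the ''cpi'' binary name and the sample ''ldd'' output below are hypothetical, not taken from this page): any line of ''ldd'' output containing ''not found'' means the runtime linker cannot resolve that library.

```shell
#!/bin/bash
# Sketch only: "cpi" is a hypothetical binary name; compile it first with
# the OpenMPI wrapper compiler, e.g.  mpicc -o cpi cpi.c
# A small helper that scans ldd-style output for unresolved libraries.
check_libs() {
    if grep -q "not found"; then
        echo "missing libraries"
    else
        echo "all libraries resolved"
    fi
}

# In practice you would run:  ldd ./cpi | check_libs
# Demonstrated here on abridged, hypothetical ldd output:
printf 'libmpi.so.0 => /path/to/lib/libmpi.so.0\nlibc.so.6 => /lib64/libc.so.6\n' | check_libs
# prints "all libraries resolved"
```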
===== The Problem =====

Once you have your binary compiled, you can execute it on the head node or any other node as described above.

This will not work when submitting your program to ''

<hi yellow>
As I mentioned, Lava is not natively capable of running parallel jobs, so you will have to write your own integration script to parse the hosts allocated by LSF (via the LSB_HOSTS variable) and integrate them with your MPI distribution.
</hi>

<hi orange>
Also, a reminder that, because of the lack of LSF's parallel support daemons, these scripts can only provide a loose integration with Lava. Specifically,
</hi>


And this makes the job submission process for parallel jobs tedious.
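A minimal sketch of such an integration script, assuming only that Lava/LSF exports the allocated hosts in ''LSB_HOSTS'' with one entry per slot; the hostnames, machinefile path, and binary name below are hypothetical:

```shell
#!/bin/bash
# Sketch of a loose Lava integration: LSB_HOSTS holds one entry per
# allocated slot. The value here is hard-coded for illustration only;
# under the scheduler it is set for you.
LSB_HOSTS="compute-1-15 compute-1-15 compute-1-16 compute-1-16"

# Convert the slot list into an OpenMPI machinefile: one line per
# host with a slots= count.
MACHINEFILE=/tmp/machines.$$
for host in $LSB_HOSTS; do echo "$host"; done \
    | sort | uniq -c \
    | awk '{ printf "%s slots=%s\n", $2, $1 }' > "$MACHINEFILE"

cat "$MACHINEFILE"
# compute-1-15 slots=2
# compute-1-16 slots=2

# Then hand the machinefile to mpirun (binary name hypothetical):
#   mpirun -np 4 -machinefile "$MACHINEFILE" ./cpi
```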

So click on **Back** and we'll detail that.

\\
**[[cluster: