cluster:31
Differences
This shows you the differences between two versions of the page.
| — | cluster:31 [2007/04/19 19:45] (current) – created - external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | \\ | ||
| + | **[[cluster: | ||
| + | ====== OpenMPI ENV ====== | ||
| + | |||
| + | |||
| + | ===== Tests ===== | ||
| + | |||
| + | To test your environment, execute the following two binaries and compare your output with the output shown here. | ||
| + | |||
| + | **#1** | ||
| + | < | ||
| + | [hmeij@swallowtail ~]$ / | ||
| + | Running on ilogin1 and ilogin2 with -np=16 | ||
| + | Hello, world, I am 0 of 16 | ||
| + | Hello, world, I am 11 of 16 | ||
| + | Hello, world, I am 1 of 16 | ||
| + | Hello, world, I am 2 of 16 | ||
| + | Hello, world, I am 3 of 16 | ||
| + | Hello, world, I am 4 of 16 | ||
| + | Hello, world, I am 5 of 16 | ||
| + | Hello, world, I am 6 of 16 | ||
| + | Hello, world, I am 7 of 16 | ||
| + | Hello, world, I am 8 of 16 | ||
| + | Hello, world, I am 9 of 16 | ||
| + | Hello, world, I am 10 of 16 | ||
| + | Hello, world, I am 12 of 16 | ||
| + | Hello, world, I am 13 of 16 | ||
| + | Hello, world, I am 14 of 16 | ||
| + | Hello, world, I am 15 of 16 | ||
| + | </ | ||
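Comparing sixteen lines by eye is error-prone because the ranks can arrive in any order, as in the listing above. A minimal sketch of an automated check follows; the function name `check_hello_ranks` is hypothetical and not part of the cluster setup, and it assumes the greeting lines look exactly like those shown in test #1.

```shell
#!/bin/bash
# Hypothetical helper: reads program output on stdin and succeeds only if
# every rank from 0 to N-1 printed a greeting line at least once.
check_hello_ranks() {
    local expected="$1"
    awk -v n="$expected" '
        # Fields of a greeting line: Hello, world, I am <rank> of <size>
        $1 == "Hello," && $2 == "world," && $7 == n && $5 + 0 < n { seen[$5] = 1 }
        END {
            count = 0
            for (rank in seen) count++
            exit (count == n ? 0 : 1)
        }'
}

# Small demonstration with a two-rank sample, in arbitrary order.
printf 'Hello, world, I am 1 of 2\nHello, world, I am 0 of 2\n' \
    | check_hello_ranks 2 && echo "all ranks reported"
```

To use it on the real test, the output of the first script would be piped into `check_hello_ranks 16`; lines such as the "Running on ..." banner are ignored.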
| + | |||
| + | **#2** | ||
| + | < | ||
| + | [hmeij@swallowtail ~]$ / | ||
| + | Running on ilogin1 and ilogin2 with -np=16 | ||
| + | Process 10 on compute-1-16.local | ||
| + | Process 0 on compute-1-15.local | ||
| + | Process 2 on compute-1-15.local | ||
| + | Process 3 on compute-1-15.local | ||
| + | Process 4 on compute-1-15.local | ||
| + | Process 5 on compute-1-15.local | ||
| + | Process 6 on compute-1-15.local | ||
| + | Process 7 on compute-1-15.local | ||
| + | Process 1 on compute-1-15.local | ||
| + | pi is approximately 3.1416009869231245, | ||
| + | wall clock time = 0.166646 | ||
| + | Process 8 on compute-1-16.local | ||
| + | Process 9 on compute-1-16.local | ||
| + | Process 11 on compute-1-16.local | ||
| + | Process 12 on compute-1-16.local | ||
| + | Process 13 on compute-1-16.local | ||
| + | Process 14 on compute-1-16.local | ||
| + | Process 15 on compute-1-16.local | ||
| + | </ | ||
| + | |||
| + | Done. For those who are interested, below is the what and where of OpenMPI on our cluster. | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | ===== OpenMPI ===== | ||
| + | |||
| + | install directory: ''/ | ||
| + | |||
| + | ... you can add the '' | ||
| + | |||
| + | The two scripts '' | ||
| + | |||
| + | < | ||
| + | #!/bin/bash | ||
| + | |||
| + | echo Running on ilogin1 and ilogin2 with -np=16 | ||
| + | |||
| + | / | ||
| + | -machinefile / | ||
| + | / | ||
| + | </ | ||
| + | |||
| + | The two binaries are linked against the OpenMPI shared libraries, like so: | ||
| + | |||
| + | < | ||
| + | [hmeij@swallowtail ~]# ldd / | ||
| + | libmpi.so.0 => / | ||
| + | libopen-rte.so.0 => / | ||
| + | libopen-pal.so.0 => / | ||
| + | libdl.so.2 => / | ||
| + | libnsl.so.1 => / | ||
| + | libutil.so.1 => / | ||
| + | libm.so.6 => / | ||
| + | libpthread.so.0 => / | ||
| + | libc.so.6 => / | ||
| + | / | ||
| + | </ | ||
| + | |||
| + | ===== Compiling ===== | ||
| + | |||
| + | When you compile, for example, C code for OpenMPI | ||
| + | |||
| + | < | ||
| + | / | ||
| + | </ | ||
| + | |||
| + | check that the created binary finds all the libraries with '' | ||
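The library check can be scripted. The sketch below assumes the check is done with `ldd`, as in the listing in the OpenMPI section, and relies on `ldd` printing `=> not found` for a library it cannot resolve. The function names `missing_libs` and `check_binary` are hypothetical.

```shell
#!/bin/bash
# Print the names of libraries reported as unresolved.
# Reads ldd output on stdin, so it can be tried without a real binary.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical wrapper: returns non-zero if any library of binary "$1"
# cannot be resolved. Usage: check_binary /path/to/your/binary
check_binary() {
    local missing
    missing=$(ldd "$1" | missing_libs)
    if [ -n "$missing" ]; then
        echo "unresolved libraries in $1:" >&2
        echo "$missing" >&2
        return 1
    fi
}

# Demonstration on sample ldd-style text (the addresses are made up).
printf '\tlibmpi.so.0 => not found\n\tlibc.so.6 => /lib64/libc.so.6 (0x0)\n' \
    | missing_libs
```

If a library shows up as unresolved, the usual cause is that the OpenMPI library directory is not on the library search path of the shell in which the binary is run.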
| + | |||
| + | |||
| + | |||
| + | |||
| + | ===== The Problem ===== | ||
| + | |||
| + | Once you have your binary compiled, you can execute it on the head node or any other node, as described above. | ||
| + | |||
| + | This will not work when submitting your program to '' | ||
| + | |||
| + | <hi yellow> | ||
| + | As I mentioned, Lava is not natively capable for parallel jobs, so you will have to write your own integration script to parse the hosts allocated by LSF (with LSB_HOSTS variable) and integrate them to your MPI distribution. | ||
| + | </hi> | ||
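The parsing step mentioned in the quote can be sketched as follows. This is an illustration under stated assumptions, not the integration script used on this cluster: it assumes `LSB_HOSTS` holds a space-separated host list with one entry per allocated slot, and it writes the `host slots=N` form of an Open MPI hostfile. The function names and the sample host list are hypothetical.

```shell
#!/bin/bash
# Convert a space-separated host list (one entry per slot) into
# Open MPI hostfile lines of the form "host slots=N".
hosts_to_machinefile() {
    local hosts="$1" outfile="$2"
    [ -n "$hosts" ] || return 1
    # Unquoted on purpose: word splitting yields one host per line.
    printf '%s\n' $hosts | sort | uniq -c | awk '{ print $2 " slots=" $1 }' > "$outfile"
}

# Count the allocated slots, i.e. the value to pass to -np.
count_slots() {
    set -- $1
    echo "$#"
}

# Demonstration with a made-up allocation; inside a real job the list
# would come from "$LSB_HOSTS" instead.
sample_hosts="compute-1-15 compute-1-15 compute-1-16"
machinefile=$(mktemp)
hosts_to_machinefile "$sample_hosts" "$machinefile"
cat "$machinefile"
echo "np=$(count_slots "$sample_hosts")"
rm -f "$machinefile"

# A job script would then launch something like:
#   mpirun -np "$(count_slots "$LSB_HOSTS")" -machinefile "$machinefile" ./your_program
```

Grouping repeated hosts into a single `slots=N` line keeps the file short when many slots land on the same node.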
| + | |||
| + | <hi orange> | ||
| + | Also, remind you that, because the lack of LSF's parallel support daemons, these scripts can only provide a loose integration to Lava. Specifically, | ||
| + | </hi> | ||
| + | |||
| + | |||
| + | And this makes the job submission process for parallel jobs tedious. | ||
| + | |||
| + | So click on **Back** and we'll detail that. | ||
| + | |||
| + | \\ | ||
| + | **[[cluster: | ||
cluster/31.txt · Last modified: by 127.0.0.1
