cluster:41
| — | cluster:41 [2007/10/17 15:11] (current) – created - external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | \\ | ||
| + | **[[cluster: | ||
| + | A parallel code example pulled from the [[http:// | ||
| + | |||
| + | => This is page 1 of 3, navigation provided at bottom of page | ||
| + | |||
| + | |||
| + | ===== GalaxSee: N-Body Physics ===== | ||
| + | |||
| + | |||
| + | ==== Default Behavior ==== | ||
| + | |||
| + | The problem is described **[[http:// | ||
| + | The Shodor web version of GalaxSee **[[http:// | ||
| + | |||
| + | The program **// | ||
| + | |||
| + | It was compiled with **''/ | ||
| + | |||
| + | To get a sense of what it does, let's run it interactively with a display on the head node. No parameters are given, so it runs with the default values (1000 100 1000 for number of bodies, mass of bodies, and run time): | ||
| + | |||
| + | < | ||
| + | [hmeij@swallowtail ~]$ / | ||
| + | </ | ||
| + | |||
| + | A window pops up showing slowly rotating bodies of mass, which looks like this: | ||
| + | |||
| + | | top view | side view | | ||
| + | | {{: | ||
| + | | top view | side view | | ||
| + | |||
| + | Now let's add some bodies, some mass, and a run time parameter: | ||
| + | |||
| + | < | ||
| + | [hmeij@swallowtail ~]$ / | ||
| + | </ | ||
| + | |||
| + | That seems to tax the local host, as the frames start to update erratically. | ||
| + | |||
| + | |||
| + | ==== Parallel: | ||
| + | |||
| + | Submitting a job in parallel needs a helper script **'' | ||
| + | |||
| + | < | ||
| + | / | ||
| + | -np 2 | ||
| + | -machinefile / | ||
| + | / | ||
| + | </ | ||
| + | |||
| + | Parallel speed up in action! | ||
| + | |||
| + | |||
| + | ==== Parallel: | ||
| + | |||
| + | So how is that done? | ||
| + | |||
| + | |The GalaxSee code is a simple implementation of parallelism. Since most of the time in a given N-Body model is spent calculating the forces, we only parallelize that part of the code. “Client” programs that just calculate accelerations are fed every particle’s information, | ||
| + | |||
| + | And what does that look like? | ||
| + | |||
| + | < | ||
| + | |||
| + | [root@swallowtail Gal]# grep MPI_ *cpp | ||
| + | |||
| + | Gal.cpp: | ||
| + | |||
| + | derivs_client.cpp: | ||
| + | |||
| + | derivs.cpp: | ||
| + | derivs.cpp: | ||
| + | |||
| + | derivs.cpp: | ||
| + | |||
| + | derivs.cpp: | ||
| + | |||
| + | derivs.cpp: | ||
| + | |||
| + | derivs.cpp: | ||
| + | |||
| + | Gal.cpp: | ||
| + | |||
| + | </ | ||
| + | |||
| + | // | ||
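The idea described above, where each client is handed every particle's information and only computes accelerations, can be illustrated with a minimal sketch. This is not the GalaxSee source and it does not use MPI: the "clients" are simulated one after another in a single process. The contiguous chunking scheme, the function names, the constant `G`, and the softening term are all assumptions made for illustration.

```python
import math

G = 1.0           # gravitational constant in arbitrary units (assumption)
SOFTENING = 1e-3  # keeps the force finite for very close bodies (assumption)


def accelerations(bodies, start, stop):
    """Return accelerations for bodies[start:stop].

    Every body is an (x, y, z, mass) tuple. The worker needs the full
    list, because each acceleration depends on all other bodies.
    """
    result = []
    for i in range(start, stop):
        xi, yi, zi, _ = bodies[i]
        ax = ay = az = 0.0
        for j, (xj, yj, zj, mj) in enumerate(bodies):
            if i == j:
                continue
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            r2 = dx * dx + dy * dy + dz * dz + SOFTENING ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += G * mj * dx * inv_r3
            ay += G * mj * dy * inv_r3
            az += G * mj * dz * inv_r3
        result.append((ax, ay, az))
    return result


def split_ranges(n, workers):
    """Divide range(n) into `workers` contiguous, near-equal chunks."""
    base, extra = divmod(n, workers)
    ranges, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def chunked_accelerations(bodies, workers):
    """Compute each chunk separately (as a client would), then combine."""
    combined = []
    for start, stop in split_ranges(len(bodies), workers):
        combined.extend(accelerations(bodies, start, stop))
    return combined


bodies = [(0, 0, 0, 1), (1, 0, 0, 1), (0, 1, 0, 2), (0, 0, 1, 3), (1, 1, 1, 1)]
print(split_ranges(len(bodies), 3))
print(chunked_accelerations(bodies, 3) == accelerations(bodies, 0, len(bodies)))
```

Because each body's acceleration is computed independently of the others' results, the chunks can be worked on at the same time and simply concatenated afterwards. In a real MPI program the chunks would be sent to separate processes instead of being looped over.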
| + | |||
| + | |||
| + | |||
| + | |||
| + | ==== Parallel: | ||
| + | |||
| + | So let's run that again on a compute node. | ||
| + | |||
| + | * the machinefile contains 8 lines each with entry // | ||
| + | * we'll run the parallel code over the GigE switch | ||
| + | * we'll invoke OpenMPI | ||
| + | |||
| + | < | ||
| + | time / | ||
| + | -np 2 | ||
| + | | ||
| + | / | ||
| + | </ | ||
| + | |||
| + | Some results ... | ||
| + | |||
| + | ^ Run On Single Idle Dual Quad Node (n2-2) ^^ | ||
| + | ^ GigE Node via OpenMPI ^^ | ||
| + | ^ -np ^ time ^ | ||
| + | | 02 | 9m36s | | ||
| + | | 04 | 4m49s | | ||
| + | | 08 | 2m26s | | ||
| + | |||
| + | ... we observe that the run time roughly halves each time we double the number of cores. | ||
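To put numbers on that observation, the short sketch below converts the timings from the single-node table into seconds and computes speedup and parallel efficiency relative to the 2-process run. There is no 1-process timing in the table, so the baseline choice is an assumption, and the helper names are made up for this example.

```python
def to_seconds(stamp):
    """Convert a timing such as '9m36s' into seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


def relative_speedup(runs):
    """Return (np, seconds, speedup, efficiency) rows.

    `runs` is a list of (np, 'XmYYs') pairs; the first entry is used
    as the baseline, so its speedup and efficiency are both 1.0.
    """
    base_np, base_time = runs[0][0], to_seconds(runs[0][1])
    rows = []
    for nprocs, stamp in runs:
        seconds = to_seconds(stamp)
        speedup = base_time / seconds
        efficiency = speedup / (nprocs / base_np)
        rows.append((nprocs, seconds, speedup, efficiency))
    return rows


# Timings copied from the single-node (n2-2) table above.
single_node = [(2, "9m36s"), (4, "4m49s"), (8, "2m26s")]

for nprocs, seconds, speedup, efficiency in relative_speedup(single_node):
    print(f"-np {nprocs:2d}: {seconds:4d}s  speedup x{speedup:.2f}  efficiency {efficiency:.1%}")
```

An efficiency close to 100% means the extra cores are almost fully used, which is what the single-node numbers show.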
| + | |||
| + | At this point, we invoke the **'' | ||
| + | |||
| + | * let us try the 04-hwnodes queue, all 4 hosts are idle right now | ||
| + | * so the maximum number of cores we can ask for is 4 hosts * 2 CPUs * 4 cores = 32 (-np) | ||
| + | * -nh below refers to the number of hosts involved in solving the problem | ||
| + | |||
| + | ^ Run On 04-hwnodes Queue, All HW Nodes ^^^ | ||
| + | ^ OpenMPI over GigE Switch ^^^ | ||
| + | ^ -np ^ -nh ^ time ^ | ||
| + | | 04 | 01 | 4m45s | | ||
| + | | 08 | 01 | 2m27s | | ||
| + | | 12 | 02 | 1m42s | | ||
| + | | 16 | 02 | 1m21s | | ||
| + | | 32 | 04 | 0m57s | | ||
| + | |||
| + | Now we observe that the improvements in run time flatten out if we ask for more than 12 cores: going from 16 to 32 cores only trims the time from 1m21s to 0m57s. | ||
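One way to read this flattening is to assume the run time has a fixed part that does not shrink with more cores plus a part that divides evenly among them, that is T(p) = a + b / p. That model is an assumption (an Amdahl-style simplification), not something stated by the GalaxSee authors. The sketch below fits it by ordinary least squares to the 04-hwnodes timings above; the function name is hypothetical.

```python
def fit_fixed_plus_parallel(points):
    """Least-squares fit of T(p) = a + b / p.

    `points` is a list of (cores, seconds). Returns (a, b), where `a`
    is the estimated fixed time and `b` the estimated divisible work.
    """
    xs = [1.0 / cores for cores, _ in points]
    ys = [float(seconds) for _, seconds in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Timings from the 04-hwnodes table above, converted to seconds.
hw_nodes = [(4, 285), (8, 147), (12, 102), (16, 81), (32, 57)]

fixed, divisible = fit_fixed_plus_parallel(hw_nodes)
print(f"fixed part     a = {fixed:.1f} s")
print(f"divisible part b = {divisible:.1f} core-seconds")
for cores, measured in hw_nodes:
    print(f"-np {cores:2d}: measured {measured:3d}s, model {fixed + divisible / cores:6.1f}s")
```

Under this assumed model, the fixed part sets a floor on the run time, so adding cores helps less and less. The fit is only a rough illustration: five data points cannot separate communication overhead over the switch from genuinely serial work.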
| + | |||
| + | ^ Run On idle Queue, Mix of Nodes ^^^ | ||
| + | ^ OpenMPI over GigE Switch ^^^ | ||
| + | ^ -np ^ -nh ^ time ^ | ||
| + | | 08 | 02 | 2m28s | | ||
| + | | 16 | 03 | 1m23s | | ||
| + | | 32 | 06 | 1m20s | | ||
| + | | 64 | 12 | 1m26s | | ||
| + | | 80 | 14 | 1m16s | | ||
| + | | 96 | 15 | 1m26s | | ||
| + | |||
| + | |||
| + | Moral of story: | ||
| + | |||
| + | => go to **[[cluster: | ||
| + | |||
| + | \\ | ||
| + | **[[cluster: | ||
