cluster:164 [2017/10/26 14:15] hmeij07 [Bench]
cluster:164 [2017/10/26 18:26] hmeij07
==== Bench ====
  * Amber 16. Nucleosome bench runs 4.5x faster than on a K20
  * Not sure it is representative of our workload
  * Adding more MPI threads decreases performance
  * Running across more GPUs (2 or 4) decreases performance
  * One Amber process per MPI thread per GPU is optimal
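The last bullet's placement pattern can be sketched in shell: each Amber instance is pinned to its own device through CUDA_VISIBLE_DEVICES, so processes never share a GPU. This is only a sketch — the commented mpirun line and input file names are placeholders, not our actual submit script.

```shell
# Sketch: one Amber process per GPU via CUDA_VISIBLE_DEVICES.
# The mpirun line is a placeholder; real input/output names will differ.
for gpu in 0 1 2 3; do
  export CUDA_VISIBLE_DEVICES=$gpu
  # mpirun -np 1 $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd &
  echo "launch: CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
done
```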
+ | |||
**Wow, I just realized the most important metric: our K20 fleet has a job throughput of 20 per unit of time. The amber128 queue will have a throughput of 4*4.5, or 18, per the same unit of time. One new server matches five old ones, which were purchased in 2013. From an Amber-only perspective.**
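The arithmetic behind that claim, as a quick sanity check (numbers taken from above: 4 GTX cards in the new server, each 4.5x a K20):

```shell
# 4 GTX GPUs at 4.5x K20 speed, vs a throughput of 20 on the K20 side
awk 'BEGIN { printf "GTX node: %.0f  K20 fleet: %d\n", 4*4.5, 20 }'
# prints: GTX node: 18  K20 fleet: 20
```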
+ | |||
+ | < | ||
+ | |||
+ | nvidia-smi -pm 0; nvidia-smi -c 0 | ||
+ | # gpu_id is done via CUDA_VISIBLE_DEVICES | ||
+ | export CUDA_VISIBLE_DEVCES=$STRING_2 | ||
+ | # on n78 | ||
+ | / | ||
+ | -n $STRING_1 $AMBERHOME/ | ||
+ | -p prmtop -c inpcrd -ref inpcrd ; grep ' | ||
+ | # on n34 | ||
+ | / | ||
+ | -np $STRING_1 | ||
+ | |||
+ | |||
+ | Nucleosome Metric ns/day, seconds/ | ||
+ | |||
+ | |||
+ | GTX on n78 | ||
+ | |||
+ | -n 1, -gpu_id 0 | ||
+ | | | ||
+ | -n 2, -gpu_id 0 | ||
+ | | | ||
+ | -n 4, -gpu_id 0 | ||
+ | | | ||
+ | -n 4, -gpu_id 01 | ||
+ | | | ||
+ | -n 8, -gpu_id 01 | ||
+ | | | ||
+ | -n 4, -gpu_id 0123 | ||
+ | | | ||
+ | -n 8, -gpu_id 0123 | ||
+ | | | ||
+ | |||
+ | |||
+ | K20 on n34 | ||
+ | |||
+ | -n 1, -gpu_id 0 | ||
+ | | | ||
+ | -n 4, -gpu_id 0 | ||
+ | | | ||
+ | -n4, -gpuid 0123 | ||
+ | | | ||
+ | |||
+ | |||
+ | |||
+ | </ | ||
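The truncated grep above pulls the timing line out of the Amber output. A self-contained sketch of that extraction — the mdout line below is an illustrative sample in the standard Amber "ns/day ... seconds/ns" timing format, not one of our actual results:

```shell
# Hedged sketch: pull ns/day out of an Amber mdout-style timing line.
# The sample line is illustrative, not a real benchmark result.
cat > mdout.sample <<'EOF'
|     ns/day =      12.34   seconds/ns =    7001.62
EOF
grep 'ns/day' mdout.sample | awk '{print $4}'   # prints: 12.34
```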
  * Gromacs 5.1.4 My (Colin'...
  * used the colloid example, not sure if that's a good example
  * like Gromacs, lots of room for improvements
  * used the double-double binary,...
  * single-double binary might run faster?
<code>