This shows you the differences between two versions of the page.
cluster:164 [2017/10/30 14:02] hmeij07 [PPMA Bench] |
cluster:164 [2018/09/21 11:59] (current) hmeij07 |
||
---|---|---|---|
Line 106: | Line 106: | ||
nvidia-smi -pm 0; nvidia-smi -c 0 | nvidia-smi -pm 0; nvidia-smi -c 0 | ||
# gpu_id is done via CUDA_VISIBLE_DEVICES | # gpu_id is done via CUDA_VISIBLE_DEVICES | ||
- | export | + | export |
# on n78 | # on n78 | ||
/ | / | ||
Line 195: | Line 195: | ||
Mapping of GPU IDs to the 16 PP ranks in this node: 0, | Mapping of GPU IDs to the 16 PP ranks in this node: 0, | ||
Performance: | Performance: | ||
+ | |||
+ | # UPDATE Gromacs 2018, check out these new performance stats for -n 4, -gpu=4 | ||
+ | |||
+ | # K20, redone with cuda 9 | ||
+ | |||
+ | [root@cottontail gpu]# egrep ' | ||
+ | 01/ | ||
+ | 01/ | ||
+ | 02/ | ||
+ | 02/ | ||
+ | 03/ | ||
+ | 03/ | ||
+ | 04/ | ||
+ | 04/ | ||
+ | |||
+ | # GTX1080 cuda 8 | ||
+ | |||
+ | [hmeij@cottontail gpu]$ egrep ' | ||
+ | 01/ | ||
+ | 01/ | ||
+ | 02/ | ||
+ | 02/ | ||
+ | 03/ | ||
+ | 03/ | ||
+ | 04/ | ||
+ | 04/ | ||
+ | |||
+ | Almost 900 ns/day for a single server. | ||
</ | </ | ||
Line 637: | Line 665: | ||
# comparison of binaries running PMMA | # comparison of binaries running PMMA | ||
+ | # 1 gpu 4 mpi threads each run | ||
# lmp_mpi-double-double-with-gpu.log | # lmp_mpi-double-double-with-gpu.log | ||
Line 644: | Line 673: | ||
# lmp_mpi-single-single-with-gpu.log | # lmp_mpi-single-single-with-gpu.log | ||
Performance: | Performance: | ||
+ | |||
+ | </ | ||
+ | |||
+ | ==== FSL ==== | ||
+ | |||
+ | **User Time Reported** from time command | ||
+ | |||
+ | * mwgpu cpu run | ||
+ | * 2013 model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz | ||
+ | * All tests 45m | ||
+ | * Bft test 16m28s (bedpostx) | ||
+ | |||
+ | * amber128 cpu run | ||
+ | * 2017 model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | ||
+ | * All tests 17m - 2.5x faster | ||
+ | * Bft test 3m39s - 4.5x faster than the mwgpu cpu run (bedpostx) | ||
+ | |||
+ | * amber128 gpu run | ||
+ | * 2017 CUDA Device Name: GeForce GTX 1080 Ti | ||
+ | * Bft gpu test 0m1.881s (what!? from command line) - 116x faster than the amber128 cpu run (bedpostx_gpu) | ||
+ | * Bft gpu test 0m1.850s (what!? via scheduler) - 118x faster than the amber128 cpu run (bedpostx_gpu) | ||
+ | |||
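The Bft ratios in the FSL list can be rechecked from the listed user times. The following is a minimal sketch, assuming the times are in the `NmS.SSSs` format printed by the shell's `time` builtin. The helper names `to_seconds` and `speedup` are hypothetical and are not part of FSL or of the cluster setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for rechecking speedups from `time` output.

# Convert a duration such as 16m28s or 0m1.881s to seconds.
to_seconds() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        split(t, parts, "m")        # parts[1] = minutes, parts[2] = seconds plus trailing "s"
        sub(/s$/, "", parts[2])     # drop the trailing "s"
        printf "%.3f\n", parts[1] * 60 + parts[2]
    }'
}

# Speedup = baseline time / new time, one decimal place.
speedup() {
    LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

speedup 16m28s 3m39s      # mwgpu cpu run vs amber128 cpu run
speedup 3m39s 0m1.881s    # amber128 cpu run vs gpu run, command line
speedup 3m39s 0m1.850s    # amber128 cpu run vs gpu run, via scheduler
```

With the times listed above this prints 4.5, 116.4 and 118.4, so the GPU figures are relative to the amber128 cpu run (3m39s), not to mwgpu.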
+ | |||
+ | ==== FreeSurfer ==== | ||
+ | |||
+ | |||
+ | * http:// | ||
+ | * Example using sample-001.mgz | ||
+ | |||
+ | < | ||
+ | |||
+ | Node n37 (mwgpu cpu run) | ||
+ | (2013) Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz | ||
+ | recon-all -s bert finished without error | ||
+ | example 1 user 0m3.516s | ||
+ | example 2 user 893m1.761s ~15 hours | ||
+ | example 3 user ???m ~15 hours (estimated) | ||
+ | |||
+ | Node n78 (amber128 cpu run) | ||
+ | (2017) Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | ||
+ | recon-all -s bert finished without error | ||
+ | example 1 user 0m2.315s | ||
+ | example 2 user 488m49.215s ~8 hours | ||
+ | example 3 user 478m44.622s ~8 hours | ||
+ | |||
+ | |||
+ | freeview -v \ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | -f \ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | bert/ | ||
</ | </ | ||
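The "~15 hours" and "~8 hours" annotations on the FreeSurfer user times can be reproduced with a small conversion. This is a sketch only, assuming the same `NmS.SSSs` format as the `time` output above. `to_hours` is a hypothetical helper name and is not a FreeSurfer command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a "user" time such as 893m1.761s to hours.
to_hours() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        split(t, parts, "m")        # parts[1] = minutes, parts[2] = seconds plus trailing "s"
        sub(/s$/, "", parts[2])     # drop the trailing "s"
        printf "%.1f\n", (parts[1] * 60 + parts[2]) / 3600
    }'
}

to_hours 893m1.761s     # n37, example 2
to_hours 488m49.215s    # n78, example 2
to_hours 478m44.622s    # n78, example 3
```

For the three measured times this prints 14.9, 8.1 and 8.0 hours. The n37 example 3 value is not measured in the listing, so it is left out here.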
+ | Development code for the GPU http:// | ||
+ | |||
\\ | \\ | ||
**[[cluster: | **[[cluster: |