cluster:164
Differences
This shows you the differences between two versions of the page.
| cluster:164 [2017/10/27 19:39] – [PPMA Bench] hmeij07 | cluster:164 [2018/09/21 11:59] (current) – hmeij07 | ||
|---|---|---|---|
| Line 106: | Line 106: | ||
| nvidia-smi -pm 0; nvidia-smi -c 0 | nvidia-smi -pm 0; nvidia-smi -c 0 | ||
| # gpu_id is done via CUDA_VISIBLE_DEVICES | # gpu_id is done via CUDA_VISIBLE_DEVICES | ||
| - | export | + | export |
| # on n78 | # on n78 | ||
| / | / | ||
| Line 195: | Line 195: | ||
| Mapping of GPU IDs to the 16 PP ranks in this node: 0, | Mapping of GPU IDs to the 16 PP ranks in this node: 0, | ||
| Performance: | Performance: | ||
| + | |||
| + | # UPDATE: Gromacs 2018, new performance stats for -n 4, -gpu=4 | ||
| + | |||
| + | # K20, redone with cuda 9 | ||
| + | |||
| + | [root@cottontail gpu]# egrep ' | ||
| + | 01/ | ||
| + | 01/ | ||
| + | 02/ | ||
| + | 02/ | ||
| + | 03/ | ||
| + | 03/ | ||
| + | 04/ | ||
| + | 04/ | ||
| + | |||
| + | # GTX1080 cuda 8 | ||
| + | |||
| + | [hmeij@cottontail gpu]$ egrep ' | ||
| + | 01/ | ||
| + | 01/ | ||
| + | 02/ | ||
| + | 02/ | ||
| + | 03/ | ||
| + | 03/ | ||
| + | 04/ | ||
| + | 04/ | ||
| + | |||
| + | Almost 900 ns/day for a single server. | ||
| </ | </ | ||
| Line 636: | Line 664: | ||
| -n 4, -gpuid 0123 | -n 4, -gpuid 0123 | ||
| + | # comparison of binaries running PMMA | ||
| + | # 1 gpu 4 mpi threads each run | ||
| + | |||
| + | # lmp_mpi-double-double-with-gpu.log | ||
| + | Performance: | ||
| + | # lmp_mpi-single-double-with-gpu.log | ||
| + | Performance: | ||
| + | # lmp_mpi-single-single-with-gpu.log | ||
| + | Performance: | ||
| </ | </ | ||
| + | ==== FSL ==== | ||
| + | |||
| + | **User time** as reported by the time command | ||
| + | |||
| + | * mwgpu cpu run | ||
| + | * 2013 model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz | ||
| + | * All tests 45m | ||
| + | * Bft test 16m28s (bedpostx) | ||
| + | |||
| + | * amber128 cpu run | ||
| + | * 2017 model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | ||
| + | * All tests 17m - about 2.6x faster than mwgpu | ||
| + | * Bft test 3m39s - about 4.5x faster than mwgpu (bedpostx) | ||
| + | |||
| + | * amber128 gpu run | ||
| + | * 2017 CUDA Device Name: GeForce GTX 1080 Ti | ||
| + | * Bft gpu test 0m1.881s (what!? from command line) - 116x faster than the amber128 cpu run (bedpostx_gpu) | ||
| + | * Bft gpu test 0m1.850s (what!? via scheduler) - 118x faster than the amber128 cpu run (bedpostx_gpu) | ||
| + | |||
| + | |||
| + | ==== FreeSurfer ==== | ||
| + | |||
| + | |||
| + | * http:// | ||
| + | * Example using sample-001.mgz | ||
| + | |||
| + | < | ||
| + | |||
| + | Node n37 (mwgpu cpu run) | ||
| + | (2013) Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz | ||
| + | recon-all -s bert finished without error | ||
| + | example 1 user 0m3.516s | ||
| + | example 2 user 893m1.761s ~15 hours | ||
| + | example 3 user ???m ~15 hours (estimated) | ||
| + | |||
| + | Node n78 (amber128 cpu run) | ||
| + | (2017) Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | ||
| + | recon-all -s bert finished without error | ||
| + | example 1 user 0m2.315s | ||
| + | example 2 user 488m49.215s ~8 hours | ||
| + | example 3 user 478m44.622s ~8 hours | ||
| + | |||
| + | |||
| + | freeview -v \ | ||
| + | bert/ | ||
| + | bert/ | ||
| + | bert/ | ||
| + | bert/ | ||
| + | -f \ | ||
| + | bert/ | ||
| + | bert/ | ||
| + | bert/ | ||
| + | bert/ | ||
| + | |||
| + | |||
| + | </ | ||
| + | |||
| + | Development code for the GPU http:// | ||
| + | |||
| \\ | \\ | ||
| **[[cluster: | **[[cluster: | ||
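
The speedup factors quoted for the FSL and FreeSurfer runs can be rechecked from the user times themselves. Below is a minimal sketch, assuming the times are in the `<minutes>m<seconds>s` form printed by the shell's time command. The helper names `to_seconds` and `speedup` are made up for illustration and are not part of FSL, FreeSurfer, or any cluster tooling.

```shell
#!/bin/bash
# Hypothetical helpers: convert "16m28s"-style user times to seconds
# and print the ratio between two runs.

to_seconds() {
  # "16m28s" -> 988.000, "0m1.881s" -> 1.881, "45m" -> 2700.000
  LC_ALL=C awk -v t="$1" 'BEGIN {
    split(t, p, "m")          # p[1] = minutes, p[2] = seconds plus trailing "s"
    sub(/s$/, "", p[2])       # drop the trailing "s" if present
    printf "%.3f\n", p[1] * 60 + p[2]
  }'
}

speedup() {
  # speedup <slower time> <faster time>, rounded to one decimal
  LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", a / b }'
}

# FSL Bft test (bedpostx): mwgpu cpu vs amber128 cpu
speedup 16m28s 3m39s        # prints 4.5

# FSL Bft test: amber128 cpu vs amber128 gpu (bedpostx_gpu)
speedup 3m39s 0m1.881s      # prints 116.4

# FreeSurfer example 2: n37 vs n78
speedup 893m1.761s 488m49.215s
```

The first argument is always the slower run, so the printed value reads as "N times faster". Only user time is compared here; wall-clock time may give different ratios.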
cluster/164.1509133183.txt.gz · Last modified: by hmeij07
