This shows you the differences between two versions of the page.
cluster:164 [2017/10/26 18:26] hmeij07 |
cluster:164 [2018/09/21 11:59] (current) hmeij07 |
||
---|---|---|---|
Line 106: | Line 106: | ||
nvidia-smi -pm 0; nvidia-smi -c 0 | nvidia-smi -pm 0; nvidia-smi -c 0 | ||
# gpu_id is done via CUDA_VISIBLE_DEVICES | # gpu_id is done via CUDA_VISIBLE_DEVICES | ||
- | export | + | export |
# on n78 | # on n78 | ||
/ | / | ||
Line 195: | Line 195: | ||
Mapping of GPU IDs to the 16 PP ranks in this node: 0, | Mapping of GPU IDs to the 16 PP ranks in this node: 0, | ||
Performance: | Performance: | ||
+ | |||
+ | # UPDATE Gromacs 2018, check out these new performance stats for -n 4, -gpu=4 | ||
+ | |||
+ | # K20, redone with cuda 9 | ||
+ | |||
+ | [root@cottontail gpu]# egrep ' | ||
+ | 01/ | ||
+ | 01/ | ||
+ | 02/ | ||
+ | 02/ | ||
+ | 03/ | ||
+ | 03/ | ||
+ | 04/ | ||
+ | 04/ | ||
+ | |||
+ | # GTX1080 cuda 8 | ||
+ | |||
+ | [hmeij@cottontail gpu]$ egrep ' | ||
+ | 01/ | ||
+ | 01/ | ||
+ | 02/ | ||
+ | 02/ | ||
+ | 03/ | ||
+ | 03/ | ||
+ | 04/ | ||
+ | 04/ | ||
+ | |||
+ | Almost 900 ns/day for a single server. | ||
</ | </ | ||
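A node total such as the one quoted above can be obtained by summing the per-run `Performance:` lines. The following is a minimal sketch, assuming each matched line carries the ns/day value as the field right after `Performance:` (the usual GROMACS log layout); the function name `sum_ns_per_day` and the log file name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sum the ns/day values from "Performance:" lines read on stdin.
# Works with or without a "file:" prefix, as printed by grep over several files.
sum_ns_per_day() {
  awk '{
         for (i = 1; i < NF; i++)
           if ($i ~ /Performance:$/) { total += $(i + 1); break }
       }
       END { printf "%.2f\n", total + 0 }'
}

# Hypothetical usage, one log per run directory (file name is an assumption):
#   grep 'Performance:' 0[1-4]/md.log | sum_ns_per_day
```

Lines without a `Performance:` token are ignored, so the function can be fed unfiltered log output as well.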
Line 563: | Line 591: | ||
</ | </ | ||
+ | ==== PMMA Bench ==== | ||
+ | |||
+ | * Runs fastest when constrained to one gpu with 4 mpi threads | ||
+ | * Room for improvement as gpu and gpu memory are not fully utilized | ||
+ | * Adding mpi threads or more gpus reduces ns/day performance | ||
+ | * Not tested: whether adding omp threads shows a different picture | ||
+ | * No comparison with K20 gpus yet (unable to get it to run on n34) | ||
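The gpu and mpi thread combinations behind these observations can be laid out as a dry run. This is only a sketch: it prints commands rather than running them, `LMP_BIN` is a hypothetical placeholder for the LAMMPS binary (the real path is not shown on this page), and launching through `mpirun -n` is an assumption. The `-in nvt.in -var t 310` arguments are the ones used in the notes.

```shell
#!/usr/bin/env bash
# Dry run: print one benchmark command per (gpu set, mpi rank count) pair.
# LMP_BIN is a hypothetical placeholder, not the path used on the cluster.
LMP_BIN="${LMP_BIN:-lmp_mpi}"

# Pairs written as gpus:ranks, matching the combinations listed in the results.
RUNS="3:1 3:2 3:4 3:8 0,1:4 0,1:8 0,1:12 0,1,2,3:8 0,1,2,3:12"

sweep() {
  local run gpus ranks
  for run in $RUNS; do
    gpus=${run%%:*}    # text before the colon: visible gpu ids
    ranks=${run##*:}   # text after the colon: number of mpi ranks
    echo "CUDA_VISIBLE_DEVICES=$gpus mpirun -n $ranks $LMP_BIN -in nvt.in -var t 310"
  done
}

sweep
```

Printing the commands first makes it easy to check the gpu pinning before any run is submitted.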
+ | |||
+ | < | ||
+ | |||
+ | nvidia-smi -pm 0; nvidia-smi -c 0 | ||
+ | # gpu_id is done via CUDA_VISIBLE_DEVICES | ||
+ | export CUDA_VISIBLE_DEVICES=[0, | ||
+ | |||
+ | # on n78 | ||
+ | cd / | ||
+ | rm -f / | ||
+ | time / | ||
+ | / | ||
+ | -in nvt.in -var t 310 > /dev/null 2>& | ||
+ | |||
+ | |||
+ | PMMA Benchmark Performance Metric ns/day (x nr of gpus for node output) | ||
+ | |||
+ | |||
+ | Lammps 11Aug17 on GTX1080Ti (n78) | ||
+ | |||
+ | -n 1, -gpu_id 3 | ||
+ | Performance: | ||
+ | 3, GeForce GTX 1080 Ti, 38, 219 MiB, 10953 MiB, 30 %, 1 % | ||
+ | -n 2, -gpu_id 3 | ||
+ | Performance: | ||
+ | 3, GeForce GTX 1080 Ti, 57, 358 MiB, 10814 MiB, 47 %, 3 % | ||
+ | -n 4, -gpu_id 3 | ||
+ | Performance: | ||
+ | 3, GeForce GTX 1080 Ti, 59, 690 MiB, 10482 MiB, 76 %, 4 % | ||
+ | -n 8, -gpu_id 3 | ||
+ | Performance: | ||
+ | 3, GeForce GTX 1080 Ti, 47, 1332 MiB, 9840 MiB, 90 %, 4 % | ||
+ | -n 4, -gpu_id 01 | ||
+ | Performance: | ||
+ | 0, GeForce GTX 1080 Ti, 48, 350 MiB, 10822 MiB, 50 %, 3 % | ||
+ | 1, GeForce GTX 1080 Ti, 37, 344 MiB, 10828 MiB, 49 %, 3 % | ||
+ | -n 8, -gpu_id 01 | ||
+ | Performance: | ||
+ | 0, GeForce GTX 1080 Ti, 66, 670 MiB, 10502 MiB, 77 %, 4 % | ||
+ | 1, GeForce GTX 1080 Ti, 51, 670 MiB, 10502 MiB, 81 %, 4 % | ||
+ | -n 12, -gpu_id 01 | ||
+ | Performance: | ||
+ | 0, GeForce GTX 1080 Ti, 65, 988 MiB, 10184 MiB, 82 %, 4 % | ||
+ | 1, GeForce GTX 1080 Ti, 50, 990 MiB, 10182 MiB, 85 %, 4 % | ||
+ | -n 8, -gpu_id 0123 | ||
+ | Performance: | ||
+ | 0, GeForce GTX 1080 Ti, 56, 340 MiB, 10832 MiB, 57 %, 3 % | ||
+ | 1, GeForce GTX 1080 Ti, 41, 340 MiB, 10832 MiB, 52 %, 2 % | ||
+ | 2, GeForce GTX 1080 Ti, 43, 340 MiB, 10832 MiB, 57 %, 3 % | ||
+ | 3, GeForce GTX 1080 Ti, 42, 340 MiB, 10832 MiB, 55 %, 2 % | ||
+ | -n 12, -gpu_id 0123 | ||
+ | Performance: | ||
+ | -n 16 | ||
+ | Performance: | ||
+ | |||
+ | |||
+ | |||
+ | # on n34 | ||
+ | unable to get it to run... | ||
+ | |||
+ | K20 on n34 | ||
+ | |||
+ | -n 1, -gpu_id 0 | ||
+ | -n 4, -gpu_id 0 | ||
+ | -n 4, -gpu_id 0123 | ||
+ | |||
+ | # comparison of binaries running PMMA | ||
+ | # 1 gpu 4 mpi threads each run | ||
+ | |||
+ | # lmp_mpi-double-double-with-gpu.log | ||
+ | Performance: | ||
+ | # lmp_mpi-single-double-with-gpu.log | ||
+ | Performance: | ||
+ | # lmp_mpi-single-single-with-gpu.log | ||
+ | Performance: | ||
+ | |||
+ | </ | ||
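The per-gpu rows in the results look like CSV output from `nvidia-smi`. Assuming the columns are index, name, temperature, memory used, memory free, gpu utilization and memory utilization (the page does not state this, so the column meaning is an assumption), a sketch for averaging gpu utilization over the rows of one run could look like this. The function name `avg_gpu_util` is hypothetical.

```shell
#!/usr/bin/env bash
# Average the gpu utilization column (assumed to be field 6) over CSV rows on stdin.
avg_gpu_util() {
  awk -F', *' 'NF >= 7 { gsub(/ *%/, "", $6); sum += $6; n++ }
               END { if (n) printf "%.1f\n", sum / n }'
}

# Rows copied from the -n 8, -gpu_id 01 run
printf '%s\n' \
  '0, GeForce GTX 1080 Ti, 66, 670 MiB, 10502 MiB, 77 %, 4 %' \
  '1, GeForce GTX 1080 Ti, 51, 670 MiB, 10502 MiB, 81 %, 4 %' | avg_gpu_util
# prints 79.0
```

On a live node, rows in this shape can probably be produced with `nvidia-smi --query-gpu=index,name,temperature.gpu,memory.used,memory.free,utilization.gpu,utilization.memory --format=csv,noheader` and piped into the function.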
+ | |||
+ | ==== FSL ==== | ||
+ | |||
+ | **User Time Reported** from time command | ||
+ | |||
+ | * mwgpu cpu run | ||
+ | * 2013 model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz | ||
+ | * All tests 45m | ||
+ | * Bft test 16m28s (bedpostx) | ||
+ | |||
+ | * amber128 cpu run | ||
+ | * 2017 model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | ||
+ | * All tests 17m - 2.6x faster | ||
+ | * Bft test 3m39s - 4.5x faster (bedpostx) | ||
+ | |||
+ | * amber128 gpu run | ||
+ | * 2017 CUDA Device Name: GeForce GTX 1080 Ti | ||
+ | * Bft gpu test 0m1.881s (what!? from command line) - 116x faster than the amber128 cpu run (bedpostx_gpu) | ||
+ | * Bft gpu test 0m1.850s (what!? via scheduler) - 118x faster than the amber128 cpu run (bedpostx_gpu) | ||
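The speedup factors can be recomputed from the reported user times. A minimal sketch, assuming durations in the `NmS.SSSs` form printed by the `time` command; the helper names `to_seconds` and `speedup` are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a time-style duration such as 16m28s, 45m or 0m1.881s to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    split(t, p, "m")        # p[1] = minutes, p[2] = seconds with trailing "s"
    sub(/s$/, "", p[2])
    printf "%.3f\n", p[1] * 60 + p[2]
  }'
}

# Ratio of a baseline duration to a faster one, to one decimal place.
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
      'BEGIN { printf "%.1f\n", a / b }'
}

# amber128 cpu Bft test versus the gpu run from the command line
speedup 3m39s 0m1.881s
# prints 116.4
```

The first argument is the baseline, so the same call works for any pair of runs in the list.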
+ | |||
+ | |||
+ | ==== FreeSurfer ==== | ||
+ | |||
+ | |||
+ | * http:// | ||
+ | * Example using sample-001.mgz | ||
+ | |||
+ | < | ||
+ | |||
+ | Node n37 (mwgpu cpu run) | ||
+ | (2013) Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz | ||
+ | recon-all -s bert finished without error | ||
+ | example 1 user 0m3.516s | ||
+ | example 2 user 893m1.761s ~15 hours | ||
+ | example 3 user ???m ~15 hours (estimated) | ||
+ | |||
+ | Node n78 (amber128 cpu run) | ||
+ | (2017) Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | ||
+ | recon-all -s bert finished without error | ||
+ | example 1 user 0m2.315s | ||
+ | example 2 user 488m49.215s ~8 hours | ||
+ | example 3 user 478m44.622s ~8 hours | ||
+ | |||
+ | |||
+ | freeview -v \ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | -f \ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | bert/ | ||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | Development code for the GPU http:// | ||
+ | |||
\\ | \\ | ||
**[[cluster: | **[[cluster: |