Differences

This shows you the differences between two versions of the page.

--- cluster:164 [2017/10/26 14:26]
hmeij07
+++ cluster:164 [2018/09/21 07:59] (current)
hmeij07
@@ Line 106: / Line 106: @@
 nvidia-smi -pm 0; nvidia-smi -c 0
 # gpu_id is done via CUDA_VISIBLE_DEVICES
-export CUDA_VISIBLE_DEVCES=$STRING_2
+export CUDA_VISIBLE_DEVICES=$STRING_2
 # on n78
 /usr/local/mpich-3.1.4/bin/mpirun -launcher ssh -f /home/hmeij/amber/nucleosome/hostfile \
@@ Line 195: / Line 195: @@
 Mapping of GPU IDs to the 16 PP ranks in this node: 0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3
 Performance:       19.814        1.211 (x16 = 317.024)
+# UPDATE Gromacs 2018, check out these new performance stats for -n 4, -gpu=4
+# K20, redone with cuda 9
+[root@cottontail gpu]# egrep 'ns/day|Performance' 0[0-4]/md.log
+/md.log:                 (ns/day)    (hour/ns)
+/md.log:Performance:       74.275        0.323
+/md.log:                 (ns/day)    (hour/ns)
+/md.log:Performance:       74.111        0.324
+/md.log:                 (ns/day)    (hour/ns)
+/md.log:Performance:       73.965        0.324
+/md.log:                 (ns/day)    (hour/ns)
+/md.log:Performance:       74.207        0.323
+# GTX1080 cuda 8
+[hmeij@cottontail gpu]$ egrep 'ns/day|Performance' 0[1-4]/md.log
+/md.log:                 (ns/day)    (hour/ns)
+/md.log:Performance:      229.229        0.105
+/md.log:                 (ns/day)    (hour/ns)
+/md.log:Performance:      221.936        0.108
+/md.log:                 (ns/day)    (hour/ns)
+/md.log:Performance:      217.618        0.110
+/md.log:                 (ns/day)    (hour/ns)
+/md.log:Performance:      228.854        0.105
+Almost 900 ns/day for a single server.
 </code>
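The "almost 900 ns/day" figure is the sum of the four GTX1080 runs listed above. A minimal sketch of that sum follows; the here-document stands in for the real md.log files and the variable name `total` is only illustrative.

```shell
# Sum the ns/day column (second field) of the four GTX1080 "Performance:" lines.
# The values are copied from the listing above; the here-document is a stand-in
# for the egrep output over the per-gpu md.log files.
total=$(awk '/Performance:/ { sum += $2 } END { printf "%.3f", sum }' <<'EOF'
/md.log:Performance:      229.229        0.105
/md.log:Performance:      221.936        0.108
/md.log:Performance:      217.618        0.110
/md.log:Performance:      228.854        0.105
EOF
)
echo "node total: ${total} ns/day"
```

Against real output, the same awk program could presumably be fed from the egrep command shown above instead of the here-document.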
@@ Line 563: / Line 591: @@
 </code>
+==== PMMA Bench ====
+  * Runs fastest when constrained to one gpu with 4 mpi threads
+  * Room for improvement as gpu and gpu memory are not fully utilized
+  * Adding mpi threads or more gpus reduces ns/day performance
+  * No idea if adding omp threads shows a different picture
+  * No idea how it compares to K20 gpus
+<code>
+nvidia-smi -pm 0; nvidia-smi -c 0
+# gpu_id is done via CUDA_VISIBLE_DEVICES
+export CUDA_VISIBLE_DEVICES=[0,1,2,3]
+# on n78
+cd /home/hmeij/lammps/benchmark
+rm -f /tmp/lmp-run.log;rm -f *.jpg;\
+time /usr/local/mpich-3.1.4/bin/mpirun -launcher ssh -f ./hostfile  -n $STRING_1 \
+/usr/local/lammps-11Aug17/lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu $STRING_2 \
+-in nvt.in -var t 310 > /dev/null 2>&1; grep ^Performance /tmp/lmp-run.log
+PMMA benchmark, performance metric ns/day (x number of gpus = per-node output)
+Lammps 11Aug17 on GTX1080Ti (n78)
+-n 1, -gpu_id 3
+Performance: 19.974 ns/day, 1.202 hours/ns, 231.176 timesteps/s
+, GeForce GTX 1080 Ti, 38, 219 MiB, 10953 MiB, 30 %, 1 %
+-n 2, -gpu_id 3
+Performance: 33.806 ns/day, 0.710 hours/ns, 391.277 timesteps/s
+, GeForce GTX 1080 Ti, 57, 358 MiB, 10814 MiB, 47 %, 3 %
+-n 4, -gpu_id 3
+Performance: 48.504 ns/day, 0.495 hours/ns, 561.388 timesteps/s (x4 = 194 ns/day/node)
+, GeForce GTX 1080 Ti, 59, 690 MiB, 10482 MiB, 76 %, 4 %
+-n 8, -gpu_id 3
+Performance: 37.742 ns/day, 0.636 hours/ns, 436.833 timesteps/s
+, GeForce GTX 1080 Ti, 47, 1332 MiB, 9840 MiB, 90 %, 4 %
+-n 4, -gpu_id 01
+Performance: 57.621 ns/day, 0.417 hours/ns, 666.912 timesteps/s
+, GeForce GTX 1080 Ti, 48, 350 MiB, 10822 MiB, 50 %, 3 %
+, GeForce GTX 1080 Ti, 37, 344 MiB, 10828 MiB, 49 %, 3 %
+-n 8, -gpu_id 01
+Performance: 63.625 ns/day, 0.377 hours/ns, 736.400 timesteps/s (x2 = 127 ns/day/node)
+, GeForce GTX 1080 Ti, 66, 670 MiB, 10502 MiB, 77 %, 4 %
+, GeForce GTX 1080 Ti, 51, 670 MiB, 10502 MiB, 81 %, 4 %
+-n 12, -gpu_id 01
+Performance: 61.198 ns/day, 0.392 hours/ns, 708.315 timesteps/s
+, GeForce GTX 1080 Ti, 65, 988 MiB, 10184 MiB, 82 %, 4 %
+, GeForce GTX 1080 Ti, 50, 990 MiB, 10182 MiB, 85 %, 4 %
+-n 8, -gpu_id 0123
+Performance: 86.273 ns/day, 0.278 hours/ns, 998.534 timesteps/s
+, GeForce GTX 1080 Ti, 56, 340 MiB, 10832 MiB, 57 %, 3 %
+, GeForce GTX 1080 Ti, 41, 340 MiB, 10832 MiB, 52 %, 2 %
+, GeForce GTX 1080 Ti, 43, 340 MiB, 10832 MiB, 57 %, 3 %
+, GeForce GTX 1080 Ti, 42, 340 MiB, 10832 MiB, 55 %, 2 %
+-n 12, -gpu_id 0123
+Performance: 108.905 ns/day, 0.220 hours/ns, 1260.478 timesteps/s (x1 = 109 ns/day/node)
+-n 16
+Performance: 88.989 ns/day, 0.270 hours/ns, 1029.964 timesteps/s
+# on n34
+unable to get it to run...
+K20 on n34
+-n 1, -gpu_id 0
+-n 4, -gpu_id 0
+-n 4, -gpu_id 0123
+# comparison of binaries running PMMA
+# 1 gpu 4 mpi threads each run
+# lmp_mpi-double-double-with-gpu.log
+Performance: 49.833 ns/day, 0.482 hours/ns, 576.769 timesteps/s
+# lmp_mpi-single-double-with-gpu.log
+Performance: 58.484 ns/day, 0.410 hours/ns, 676.899 timesteps/s
+# lmp_mpi-single-single-with-gpu.log
+Performance: 56.660 ns/day, 0.424 hours/ns, 655.793 timesteps/s
+</code>
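The "(xN = ... ns/day/node)" notes in the n78 results scale one job's rate by the number of such jobs that fit on the 4-gpu node. A small sketch of that arithmetic, using the figures from the listing; the helper name `node_rate` is hypothetical.

```shell
# node_rate RATE JOBS: ns/day for one job times the number of concurrent jobs
# per node. Both inputs are taken from the n78 listing above.
node_rate() { awk -v r="$1" -v n="$2" 'BEGIN { printf "%.3f", r * n }'; }

one_gpu=$(node_rate 48.504 4)     # -n 4 on one gpu, four such jobs per node
two_gpu=$(node_rate 63.625 2)     # -n 8 on two gpus, two such jobs per node
four_gpu=$(node_rate 108.905 1)   # -n 12 on all four gpus, one job per node

echo "1 gpu/job: ${one_gpu}  2 gpus/job: ${two_gpu}  4 gpus/job: ${four_gpu}"
```

This assumes the jobs do not slow each other down when run side by side, which the listing does not test.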
+==== FSL ====
+**User Time Reported** from time command
+  * mwgpu cpu run
+  * 2013 model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
+    * All tests 45m
+    * Bft test 16m28s (bedpostx)
+  * amber128 cpu run
+  * 2017 model name      : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
+    * All tests 17m - 2.5x faster
+    * Bft test 3m39s - 4.5x faster (bedpostx)
+  * amber128 gpu run
+  * 2017 CUDA Device Name: GeForce GTX 1080 Ti
+    * Bft gpu test 0m1.881s (what!? from command line) - 116x faster than the amber128 cpu run (bedpostx_gpu)
+    * Bft gpu test 0m1.850s (what!? via scheduler) - 118x faster than the amber128 cpu run (bedpostx_gpu)
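The speedup factors can be recomputed from the user times listed above. The sketch below converts the `time`-style strings to seconds and divides them; the helper names `to_seconds` and `speedup` are hypothetical.

```shell
# to_seconds "16m28s" -> 988.000 (splits the time string on "m" and "s")
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'
}

# speedup SLOW FAST -> ratio of the two user times, one decimal place
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
      'BEGIN { printf "%.1f", a / b }'
}

cpu_vs_cpu=$(speedup 16m28s 3m39s)       # mwgpu cpu vs amber128 cpu (bedpostx)
gpu_cmdline=$(speedup 3m39s 0m1.881s)    # amber128 cpu vs gpu, command line
gpu_sched=$(speedup 3m39s 0m1.850s)      # amber128 cpu vs gpu, via scheduler

echo "cpu: ${cpu_vs_cpu}x  gpu (cmdline): ${gpu_cmdline}x  gpu (scheduler): ${gpu_sched}x"
```

Only minutes-and-seconds strings are handled; a time with an hours component would need an extra field.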
+==== FreeSurfer ====
+  * http://freesurfer.net/fswiki/DownloadAndInstall#TestyourFreeSurferInstallation
+  * Example using sample-001.mgz
+<code>
+Node n37 (mwgpu cpu run)
+(2013) Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
+recon-all -s bert finished without error
+example 1 user    0m3.516s
+example 2 user    893m1.761s ~15 hours
+example 3 user    ???m       ~15 hours (estimated)
+Node n78 (amber128 cpu run)
+(2017) Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
+recon-all -s bert finished without error
+example 1 user    0m2.315s
+example 2 user    488m49.215s ~8 hours
+example 3 user    478m44.622s ~8 hours
+freeview -v \
+    bert/mri/T1.mgz \
+    bert/mri/wm.mgz \
+    bert/mri/brainmask.mgz \
+    bert/mri/aseg.mgz:colormap=lut:opacity=0.2 \
+    -f \
+    bert/surf/lh.white:edgecolor=blue \
+    bert/surf/lh.pial:edgecolor=red \
+    bert/surf/rh.white:edgecolor=blue \
+    bert/surf/rh.pial:edgecolor=red
+</code>
+Development code for the GPU: http://surfer.nmr.mgh.harvard.edu/fswiki/freesurfer_linux_developers_page
 \\
 **[[cluster:0|Back]]**
