cluster:164

Differences

This shows you the differences between two versions of the page.

cluster:164 [2017/10/26 14:26]
hmeij07
cluster:164 [2018/09/21 07:59] (current)
hmeij07
Line 106: Line 106:
 nvidia-smi -pm 0; nvidia-smi -c 0 nvidia-smi -pm 0; nvidia-smi -c 0
 # gpu_id is done via CUDA_VISIBLE_DEVICES # gpu_id is done via CUDA_VISIBLE_DEVICES
-export CUDA_VISIBLE_DEVCES=$STRING_2+export CUDA_VISIBLE_DEVICES=$STRING_2
 # on n78 # on n78
 /usr/local/mpich-3.1.4/bin/mpirun -launcher ssh -f /home/hmeij/amber/nucleosome/hostfile \ /usr/local/mpich-3.1.4/bin/mpirun -launcher ssh -f /home/hmeij/amber/nucleosome/hostfile \
Line 195: Line 195:
 Mapping of GPU IDs to the 16 PP ranks in this node: 0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3 Mapping of GPU IDs to the 16 PP ranks in this node: 0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3
 Performance:       19.814        1.211 (x16 = 317.024) Performance:       19.814        1.211 (x16 = 317.024)
 +
 +# UPDATE Gromacs 2018, check out these new performance stats for -n 4, -gpu=4
 +
 +# K20, redone with cuda 9
 +
 +[root@cottontail gpu]# egrep 'ns/day|Performance' 0[0-4]/md.log
 +01/md.log:                 (ns/day)    (hour/ns)
 +01/md.log:Performance:       74.275        0.323
 +02/md.log:                 (ns/day)    (hour/ns)
 +02/md.log:Performance:       74.111        0.324
 +03/md.log:                 (ns/day)    (hour/ns)
 +03/md.log:Performance:       73.965        0.324
 +04/md.log:                 (ns/day)    (hour/ns)
 +04/md.log:Performance:       74.207        0.323
 +
 +# GTX1080 cuda 8
 + 
 +[hmeij@cottontail gpu]$ egrep 'ns/day|Performance' 0[1-4]/md.log
 +01/md.log:                 (ns/day)    (hour/ns)
 +01/md.log:Performance:      229.229        0.105
 +02/md.log:                 (ns/day)    (hour/ns)
 +02/md.log:Performance:      221.936        0.108
 +03/md.log:                 (ns/day)    (hour/ns)
 +03/md.log:Performance:      217.618        0.110
 +04/md.log:                 (ns/day)    (hour/ns)
 +04/md.log:Performance:      228.854        0.105
 +
 +Almost 900 ns/day (897.6 summed over the four GTX1080 runs) for a single server.
  
 </code> </code>
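The per-server figure quoted for Gromacs 2018 appears to be the sum of the four per-GPU `md.log` results. A minimal sketch of that sum, assuming the `egrep` output format shown above (ns/day in the second whitespace-separated column); the function name `sum_ns_per_day` is hypothetical and not part of the original page:

```shell
#!/usr/bin/env bash
# Sum the ns/day column of GROMACS "Performance:" lines read from stdin.
# Works on raw md.log lines and on grep output prefixed with "NN/md.log:",
# because in both cases the ns/day value is the second field.
sum_ns_per_day() {
  awk '/Performance:/ { total += $2 } END { printf "%.3f\n", total }'
}

# The four GTX1080 results quoted above, one run per GPU.
sum_ns_per_day <<'EOF'
01/md.log:                 (ns/day)    (hour/ns)
01/md.log:Performance:      229.229        0.105
02/md.log:                 (ns/day)    (hour/ns)
02/md.log:Performance:      221.936        0.108
03/md.log:                 (ns/day)    (hour/ns)
03/md.log:Performance:      217.618        0.110
04/md.log:                 (ns/day)    (hour/ns)
04/md.log:Performance:      228.854        0.105
EOF
# prints 897.637
```

On the cluster the same function could be fed directly, for example `egrep 'ns/day|Performance' 0[1-4]/md.log | sum_ns_per_day`.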
Line 563: Line 591:
 </code> </code>
  
 +==== PMMA Bench ====
 +
 +  * Node throughput is highest when each job is constrained to one GPU with 4 MPI threads (48.5 ns/day per job, x4 = 194 ns/day per node)
 +  * Room for improvement, as GPU utilization and GPU memory are not fully used
 +  * Adding MPI threads on one GPU (-n 8) or spreading a job over more GPUs reduces ns/day per node
 +  * Not tested: whether adding OpenMP threads shows a different picture
 +  * Not tested: how it compares to K20 GPUs
 +
 +<code>
 +
 +nvidia-smi -pm 0; nvidia-smi -c 0
 +# gpu_id is done via CUDA_VISIBLE_DEVICES
 +export CUDA_VISIBLE_DEVICES=0,1,2,3   # comma-separated list, or a subset such as 3 or 0,1
 +
 +# on n78
 +cd /home/hmeij/lammps/benchmark
 +rm -f /tmp/lmp-run.log;rm -f *.jpg;\
 +time /usr/local/mpich-3.1.4/bin/mpirun -launcher ssh -f ./hostfile  -n $STRING_1 \
 +/usr/local/lammps-11Aug17/lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu $STRING_2 \
 +-in nvt.in -var t 310 > /dev/null 2>&1; grep ^Performance /tmp/lmp-run.log
 +
 +
 +PMMA benchmark performance metric: ns/day per job (x number of concurrent jobs that fit on the 4-GPU node for node output)
 +
 +
 +Lammps 11Aug17 on GTX1080Ti (n78)
 +
 +-n 1, -gpu_id 3
 +Performance: 19.974 ns/day, 1.202 hours/ns, 231.176 timesteps/s
 +3, GeForce GTX 1080 Ti, 38, 219 MiB, 10953 MiB, 30 %, 1 %                                                      
 +-n 2, -gpu_id 3
 +Performance: 33.806 ns/day, 0.710 hours/ns, 391.277 timesteps/s
 +3, GeForce GTX 1080 Ti, 57, 358 MiB, 10814 MiB, 47 %, 3 %
 +-n 4, -gpu_id 3
 +Performance: 48.504 ns/day, 0.495 hours/ns, 561.388 timesteps/s (x4 = 194 ns/day/node)
 +3, GeForce GTX 1080 Ti, 59, 690 MiB, 10482 MiB, 76 %, 4 %
 +-n 8, -gpu_id 3
 +Performance: 37.742 ns/day, 0.636 hours/ns, 436.833 timesteps/s
 +3, GeForce GTX 1080 Ti, 47, 1332 MiB, 9840 MiB, 90 %, 4 %
 +-n 4, -gpu_id 01
 +Performance: 57.621 ns/day, 0.417 hours/ns, 666.912 timesteps/s
 +0, GeForce GTX 1080 Ti, 48, 350 MiB, 10822 MiB, 50 %, 3 %
 +1, GeForce GTX 1080 Ti, 37, 344 MiB, 10828 MiB, 49 %, 3 %
 +-n 8, -gpu_id 01
 +Performance: 63.625 ns/day, 0.377 hours/ns, 736.400 timesteps/s (x2 = 127 ns/day/node)
 +0, GeForce GTX 1080 Ti, 66, 670 MiB, 10502 MiB, 77 %, 4 %
 +1, GeForce GTX 1080 Ti, 51, 670 MiB, 10502 MiB, 81 %, 4 %
 +-n 12, -gpu_id 01
 +Performance: 61.198 ns/day, 0.392 hours/ns, 708.315 timesteps/s
 +0, GeForce GTX 1080 Ti, 65, 988 MiB, 10184 MiB, 82 %, 4 %
 +1, GeForce GTX 1080 Ti, 50, 990 MiB, 10182 MiB, 85 %, 4 %
 +-n 8, -gpu_id 0123
 +Performance: 86.273 ns/day, 0.278 hours/ns, 998.534 timesteps/s
 +0, GeForce GTX 1080 Ti, 56, 340 MiB, 10832 MiB, 57 %, 3 %
 +1, GeForce GTX 1080 Ti, 41, 340 MiB, 10832 MiB, 52 %, 2 %
 +2, GeForce GTX 1080 Ti, 43, 340 MiB, 10832 MiB, 57 %, 3 %
 +3, GeForce GTX 1080 Ti, 42, 340 MiB, 10832 MiB, 55 %, 2 %
 +-n 12, -gpu_id 0123
 +Performance: 108.905 ns/day, 0.220 hours/ns, 1260.478 timesteps/s (x1 = 109 ns/day/node)
 +-n 16
 +Performance: 88.989 ns/day, 0.270 hours/ns, 1029.964 timesteps/s
 +
 +
 +
 +# on n34
 +unable to get it to run...
 +
 +K20 on n34 
 +
 +-n 1, -gpu_id 0
 +-n 4, -gpu_id 0
 +-n 4, -gpu_id 0123
 +
 +# comparison of binaries running PMMA
 +# 1 gpu 4 mpi threads each run
 +
 +# lmp_mpi-double-double-with-gpu.log
 +Performance: 49.833 ns/day, 0.482 hours/ns, 576.769 timesteps/s
 +# lmp_mpi-single-double-with-gpu.log
 +Performance: 58.484 ns/day, 0.410 hours/ns, 676.899 timesteps/s
 +# lmp_mpi-single-single-with-gpu.log
 +Performance: 56.660 ns/day, 0.424 hours/ns, 655.793 timesteps/s
 +
 +</code>
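The "(xN = ... ns/day/node)" annotations in the PMMA results scale a single job to the whole node: a job on one GPU can run four times concurrently on the 4-GPU n78, a job on two GPUs twice, and a job on all four GPUs once. A minimal sketch of that arithmetic follows. It assumes concurrent jobs do not slow each other down, which the page does not verify, and the helper name `node_throughput` is hypothetical:

```shell
#!/usr/bin/env bash
# node_throughput NS_PER_DAY GPUS_PER_JOB [GPUS_PER_NODE]
# Estimated node throughput = per-job ns/day * (GPUs per node / GPUs per job).
# Assumption: concurrent jobs on separate GPUs scale independently.
node_throughput() {
  awk -v perf="$1" -v used="$2" -v total="${3:-4}" \
    'BEGIN { printf "%.0f\n", perf * total / used }'
}

node_throughput 48.504 1    # -n 4,  one GPU   -> prints 194
node_throughput 63.625 2    # -n 8,  two GPUs  -> prints 127
node_throughput 108.905 4   # -n 12, four GPUs -> prints 109
```

The optional third argument lets the same estimate be reused for nodes with a different GPU count.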
 +
 +==== FSL ====
 +
 +**User Time Reported** from time command
 +
 +  * mwgpu cpu run
 +  * 2013 model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
 +    * All tests 45m
 +    * Bft test 16m28s (bedpostx)
 +
 +  * amber128 cpu run
 +  * 2017 model name      : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
 +    * All tests 17m - 2.5x faster
 +    * Bft test 3m39s - 4.5x faster (bedpostx)
 +
 +  * amber128 gpu run
 +  * 2017 CUDA Device Name: GeForce GTX 1080 Ti
 +    * Bft gpu test 0m1.881s (what!? from command line) - 116x faster than the amber128 cpu run (bedpostx_gpu)
 +    * Bft gpu test 0m1.850s (what!? via scheduler) - 118x faster than the amber128 cpu run (bedpostx_gpu)
 +
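The speedup factors in the FSL list are ratios of user times reported by `time`. A minimal sketch of how such a ratio could be computed from values in the `NmS.SSSs` form used above; the helper names `to_seconds` and `speedup` are hypothetical:

```shell
#!/usr/bin/env bash
# to_seconds converts a `time`-style value such as 3m39s or 0m1.881s to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    split(t, part, "m")          # part[1] = minutes, part[2] = seconds with trailing "s"
    sub(/s$/, "", part[2])
    printf "%.3f\n", part[1] * 60 + part[2]
  }'
}

# speedup BASELINE CANDIDATE prints how many times faster CANDIDATE is.
speedup() {
  awk -v base="$(to_seconds "$1")" -v cand="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", base / cand }'
}

speedup 3m39s 0m1.881s   # amber128 cpu vs gpu, command line -> prints 116.4
speedup 3m39s 0m1.850s   # amber128 cpu vs gpu, scheduler    -> prints 118.4
```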
 +
 +==== FreeSurfer ====
 +
 +
 +  * http://freesurfer.net/fswiki/DownloadAndInstall#TestyourFreeSurferInstallation
 +  * Example using sample-001.mgz
 +
 +<code>
 +
 +Node n37 (mwgpu cpu run)
 +(2013) Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
 +recon-all -s bert finished without error
 +example 1 user    0m3.516s
 +example 2 user    893m1.761s ~15 hours
 +example 3 user    ???m       ~15 hours (estimated)
 +
 +Node n78 (amber128 cpu run)
 +(2017) Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
 +recon-all -s bert finished without error
 +example 1 user    0m2.315s
 +example 2 user    488m49.215s ~8 hours
 +example 3 user    478m44.622s ~8 hours
 +
 +
 +freeview -v \
 +    bert/mri/T1.mgz \
 +    bert/mri/wm.mgz \
 +    bert/mri/brainmask.mgz \
 +    bert/mri/aseg.mgz:colormap=lut:opacity=0.2 \
 +    -f \
 +    bert/surf/lh.white:edgecolor=blue \
 +    bert/surf/lh.pial:edgecolor=red \
 +    bert/surf/rh.white:edgecolor=blue \
 +    bert/surf/rh.pial:edgecolor=red
 +
 +
 +</code>
 +
 +Development code for the GPU http://surfer.nmr.mgh.harvard.edu/fswiki/freesurfer_linux_developers_page
 + 
 \\ \\
 **[[cluster:0|Back]]** **[[cluster:0|Back]]**
cluster/164.1509042409.txt.gz · Last modified: 2017/10/26 14:26 by hmeij07