cluster:164

Differences

This shows you the differences between two versions of the page.

cluster:164 [2017/10/27 19:30] hmeij07
cluster:164 [2018/09/21 11:59] (current) hmeij07
Line 106: Line 106:
 nvidia-smi -pm 0; nvidia-smi -c 0
 # gpu_id is done via CUDA_VISIBLE_DEVICES
-export CUDA_VISIBLE_DEVCES=$STRING_2
 +export CUDA_VISIBLE_DEVICES=$STRING_2
 # on n78
 /usr/local/mpich-3.1.4/bin/mpirun -launcher ssh -f /home/hmeij/amber/nucleosome/hostfile \
Line 195: Line 195:
 Mapping of GPU IDs to the 16 PP ranks in this node: 0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3
 Performance:       19.814        1.211 (x16 = 317.024)
 +
 +# UPDATE Gromacs 2018, check out these new performance stats for -n 4, -gpu=4
 +
 +# K20, redone with cuda 9
 +
 +[root@cottontail gpu]# egrep 'ns/day|Performance' 0[0-4]/md.log
 +01/md.log:                 (ns/day)    (hour/ns)
 +01/md.log:Performance:       74.275        0.323
 +02/md.log:                 (ns/day)    (hour/ns)
 +02/md.log:Performance:       74.111        0.324
 +03/md.log:                 (ns/day)    (hour/ns)
 +03/md.log:Performance:       73.965        0.324
 +04/md.log:                 (ns/day)    (hour/ns)
 +04/md.log:Performance:       74.207        0.323
 +
 +# GTX1080 cuda 8
 + 
 +[hmeij@cottontail gpu]$ egrep 'ns/day|Performance' 0[1-4]/md.log
 +01/md.log:                 (ns/day)    (hour/ns)
 +01/md.log:Performance:      229.229        0.105
 +02/md.log:                 (ns/day)    (hour/ns)
 +02/md.log:Performance:      221.936        0.108
 +03/md.log:                 (ns/day)    (hour/ns)
 +03/md.log:Performance:      217.618        0.110
 +04/md.log:                 (ns/day)    (hour/ns)
 +04/md.log:Performance:      228.854        0.105
 +
 +Almost 900 ns/day (sum of the four GTX1080 runs) for a single server.
  
 </code>
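The "Almost 900 ns/day" figure is simply the four GTX1080 Performance values added together. A minimal bash/awk sketch of that sum, assuming the ''NN/md.log:Performance: <ns/day> <hour/ns>'' line layout shown in the egrep output above; ''sum_ns_day'' is a hypothetical helper name, not part of Gromacs:

```shell
# sum_ns_day: hypothetical helper. Reads egrep output lines such as
#   01/md.log:Performance:      229.229        0.105
# on stdin, ignores the "(ns/day) (hour/ns)" header lines, and prints the
# summed ns/day (the second whitespace-separated field of each Performance line).
sum_ns_day() {
  awk '/:Performance:/ { total += $2 } END { printf "%.3f\n", total }'
}

# The four GTX1080 lines reported above.
sum_ns_day <<'EOF'
01/md.log:Performance:      229.229        0.105
02/md.log:Performance:      221.936        0.108
03/md.log:Performance:      217.618        0.110
04/md.log:Performance:      228.854        0.105
EOF
```

On the node the helper could be fed directly, e.g. ''egrep 'ns/day|Performance' 0[1-4]/md.log | sum_ns_day''. Applied to the four K20 lines it gives 296.558 ns/day.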
Line 564: Line 592:
  
 ==== PMMA Bench ====
 +
 +  * Runs fastest when constrained to one gpu with 4 mpi threads
 +  * Room for improvement as gpu and gpu memory are not fully utilized
 +  * Adding mpi threads or more gpus reduces ns/day performance
 +  * No idea if adding omp threads shows a different picture
 +  * No idea how it compares to K20 gpus
  
 <code>
  
-PMMA Benchmark Performance Metric (x  nr of gpus)
 +nvidia-smi -pm 0; nvidia-smi -c 0
 +# gpu_id is done via CUDA_VISIBLE_DEVICES
 +export CUDA_VISIBLE_DEVICES=0,1,2,3   # or any subset, e.g. 3
  
 +# on n78
 +cd /home/hmeij/lammps/benchmark
 +rm -f /tmp/lmp-run.log;rm -f *.jpg;\
 +time /usr/local/mpich-3.1.4/bin/mpirun -launcher ssh -f ./hostfile  -n $STRING_1 \
 +/usr/local/lammps-11Aug17/lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu $STRING_2 \
 +-in nvt.in -var t 310 > /dev/null 2>&1; grep ^Performance /tmp/lmp-run.log
  
-GTX on n78
 +
 +PMMA Benchmark Performance Metric ns/day (x number of gpus for node output)
 + 
 + 
 +Lammps 11Aug17 on GTX1080Ti (n78)
  
 -n 1, -gpu_id 3
Line 618: Line 664:
 -n 4, -gpuid 0123
  
 +# comparison of binaries running PMMA
 +# 1 gpu 4 mpi threads each run
 +
 +# lmp_mpi-double-double-with-gpu.log
 +Performance: 49.833 ns/day, 0.482 hours/ns, 576.769 timesteps/s
 +# lmp_mpi-single-double-with-gpu.log
 +Performance: 58.484 ns/day, 0.410 hours/ns, 676.899 timesteps/s
 +# lmp_mpi-single-single-with-gpu.log
 +Performance: 56.660 ns/day, 0.424 hours/ns, 655.793 timesteps/s
  
 </code>
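The binary comparison reads more easily as speedups relative to the double-double build. A small sketch, assuming the ''# <binary>.log'' / ''Performance: <x> ns/day, ...'' pairs exactly as listed in the block above; ''relative_speed'' is a hypothetical helper, and it simply treats the first binary listed as the baseline:

```shell
# relative_speed: hypothetical helper. Reads "# <binary>.log" lines, each followed by a
# LAMMPS "Performance: <x> ns/day, ..." line, and prints every binary's ns/day plus its
# ratio to the first binary listed (the baseline).
relative_speed() {
  awk '
    /^# lmp_/ { name = $2; next }     # remember which binary the next line belongs to
    /^Performance:/ {
      nsday = $2                      # "Performance: 49.833 ns/day, ..." gives 49.833
      if (base == "") base = nsday    # first binary listed becomes the baseline
      printf "%s %.3f ns/day %.2fx\n", name, nsday, nsday / base
    }'
}

relative_speed <<'EOF'
# lmp_mpi-double-double-with-gpu.log
Performance: 49.833 ns/day, 0.482 hours/ns, 576.769 timesteps/s
# lmp_mpi-single-double-with-gpu.log
Performance: 58.484 ns/day, 0.410 hours/ns, 676.899 timesteps/s
# lmp_mpi-single-single-with-gpu.log
Performance: 56.660 ns/day, 0.424 hours/ns, 655.793 timesteps/s
EOF
```

With these three runs the single-double binary comes out about 1.17x (17%) ahead of double-double, and single-single about 1.14x.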
  
 +==== FSL ====
 +
 +**User Time Reported** by the time command
 +
 +  * mwgpu cpu run
 +  * 2013 model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
 +    * All tests 45m
 +    * Bft test 16m28s (bedpostx)
 +
 +  * amber128 cpu run
 +  * 2017 model name      : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
 +    * All tests 17m - ~2.6x faster
 +    * Bft test 3m39s - ~4.5x faster (bedpostx)
 +
 +  * amber128 gpu run
 +  * 2017 CUDA Device Name: GeForce GTX 1080 Ti
 +    * Bft gpu test 0m1.881s (what!? from command line) - 116x faster than the amber128 cpu run (bedpostx_gpu)
 +    * Bft gpu test 0m1.850s (what!? via scheduler) - 118x faster than the amber128 cpu run (bedpostx_gpu)
 +
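The speedup factors in the list are ratios of the reported user times. A sketch for recomputing them, assuming bash ''time''-style values such as 16m28s or 0m1.881s (a bare 45m is also accepted); ''to_seconds'' and ''speedup'' are hypothetical helpers:

```shell
# to_seconds: hypothetical helper turning time-style values such as 16m28s,
# 0m1.881s or 45m into seconds (the minutes part and seconds part are both optional).
to_seconds() {
  awk -v t="$1" 'BEGIN {
    m = 0; s = 0
    if (match(t, /[0-9.]+m/)) m = substr(t, RSTART, RLENGTH - 1)   # number before the m
    if (match(t, /[0-9.]+s/)) s = substr(t, RSTART, RLENGTH - 1)   # number before the s
    printf "%.3f\n", m * 60 + s
  }'
}

# speedup: hypothetical helper printing time(A) / time(B), i.e. how many times faster run B is.
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" 'BEGIN { printf "%.1fx\n", a / b }'
}

speedup 45m 17m          # all tests, mwgpu cpu vs amber128 cpu: 2.6x
speedup 16m28s 3m39s     # bedpostx, mwgpu cpu vs amber128 cpu: 4.5x
speedup 3m39s 0m1.881s   # amber128 bedpostx vs bedpostx_gpu: 116.4x
```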
 +
 +==== FreeSurfer ====
 +
 +
 +  * http://freesurfer.net/fswiki/DownloadAndInstall#TestyourFreeSurferInstallation
 +  * Example using sample-001.mgz
 +
 +<code>
 +
 +Node n37 (mwgpu cpu run)
 +(2013) Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
 +recon-all -s bert finished without error
 +example 1 user    0m3.516s
 +example 2 user    893m1.761s ~15 hours
 +example 3 user    ???m       ~15 hours (estimated)
 +
 +Node n78 (amber128 cpu run)
 +(2017) Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
 +recon-all -s bert finished without error
 +example 1 user    0m2.315s
 +example 2 user    488m49.215s ~8 hours
 +example 3 user    478m44.622s ~8 hours
 +
 +
 +freeview -v \
 +    bert/mri/T1.mgz \
 +    bert/mri/wm.mgz \
 +    bert/mri/brainmask.mgz \
 +    bert/mri/aseg.mgz:colormap=lut:opacity=0.2 \
 +    -f \
 +    bert/surf/lh.white:edgecolor=blue \
 +    bert/surf/lh.pial:edgecolor=red \
 +    bert/surf/rh.white:edgecolor=blue \
 +    bert/surf/rh.pial:edgecolor=red
 +
 +
 +</code>
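Bash ''time'' reports user time as minutes and seconds (e.g. 893m1.761s), so the "~15 hours" and "~8 hours" notes are conversions. A sketch of that conversion, assuming values always have the ''<minutes>m<seconds>s'' form shown in the FreeSurfer block above; ''to_hours'' is a hypothetical helper:

```shell
# to_hours: hypothetical helper converting a bash time "user" value of the form
# <minutes>m<seconds>s (e.g. 893m1.761s) into hours, rounded to two decimals.
to_hours() {
  awk -v t="$1" 'BEGIN {
    split(t, p, "m")        # p[1] = minutes, p[2] = seconds with a trailing s
    sub(/s$/, "", p[2])     # drop the trailing s
    printf "%.2f\n", (p[1] * 60 + p[2]) / 3600
  }'
}

to_hours 893m1.761s     # n37, example 2: 14.88
to_hours 488m49.215s    # n78, example 2: 8.15
to_hours 478m44.622s    # n78, example 3: 7.98
```

On these figures example 2 drops from about 14.9 hours of user time on n37 to about 8.1 hours on n78, roughly 1.8x less.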
 +
 +Development code for GPU support: http://surfer.nmr.mgh.harvard.edu/fswiki/freesurfer_linux_developers_page
 + 
 \\
 **[[cluster:0|Back]]**
cluster/164.1509132629.txt.gz · Last modified: 2017/10/27 19:30 by hmeij07