==== Bench ====
  
  * Amber 16. Nucleosome bench runs 4.5x faster than on a K20
    * Not sure it is representative of our workload
    * Adding more MPI threads decreases performance
    * Running across more GPUs (2 or 4) decreases performance
    * One Amber process per MPI thread per GPU is optimal (see the commands and results below)

**Wow, I just realized the most important metric: our K20s have a job throughput of 20 per unit of time. The amber128 queue will have a throughput of 4 x 4.5 = 18 per the same unit of time. One new server matches five old ones, which were purchased in 2013.**
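The arithmetic behind that claim, as a sanity-check sketch rather than a command from the bench: it assumes one job per GPU, a relative per-GPU speed of 1.0 for a K20, and 20 K20 GPUs spread over five 4-GPU servers.

<code>
# hypothetical arithmetic only, not part of the bench itself
# old K20 pool: 20 gpus x 1.0 relative speed = 20 jobs per unit of time
echo "20 * 1.0" | bc
# new amber128 server: 4 gtx gpus x 4.5 relative speed = 18 jobs per unit of time
echo "4 * 4.5" | bc
# so one new 4-gpu server roughly matches five old 4-gpu K20 servers
</code>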

<code>

# disable persistence mode, set compute mode to DEFAULT (shared)
nvidia-smi -pm 0; nvidia-smi -c 0
# gpu selection is done via CUDA_VISIBLE_DEVICES rather than -gpu_id
export CUDA_VISIBLE_DEVICES=$STRING_2
# on n78 (GTX node, mpich)
/usr/local/mpich-3.1.4/bin/mpirun -launcher ssh -f /home/hmeij/amber/nucleosome/hostfile \
-n $STRING_1 $AMBERHOME/bin/pmemd.cuda.MPI -O -o /tmp/mdout -i mdin.GPU \
-p prmtop -c inpcrd -ref inpcrd ; grep 'ns/day' /tmp/mdout
# on n34 (K20 node, mvapich2)
/cm/shared/apps/mvapich2/gcc/64/1.6/bin/mpirun_rsh -ssh -hostfile /home/hmeij/amber/nucleosome/hostfile2 \
-np $STRING_1 pmemd.cuda.MPI -O -o /tmp/mdout -i mdin.GPU -p prmtop -c inpcrd -ref inpcrd; grep 'ns/day' /tmp/mdout

Nucleosome metric: ns/day and seconds/ns across all steps; xN = aggregate ns/day when N such jobs run concurrently on the node's 4 GPUs

GTX on n78

-n 1, -gpu_id 0
|         ns/day =      12.24   seconds/ns =    7058.94   x4 = 48.96  (4.5x faster than the K20)
-n 2, -gpu_id 0
|         ns/day =      11.50   seconds/ns =    7509.97
-n 4, -gpu_id 0
|         ns/day =      10.54   seconds/ns =    8197.80
-n 4, -gpu_id 01
|         ns/day =      20.70   seconds/ns =    4173.55   x2 = 41.40
-n 8, -gpu_id 01
|         ns/day =      17.44   seconds/ns =    4953.04
-n 4, -gpu_id 0123
|         ns/day =      32.90   seconds/ns =    2626.27   x1
-n 8, -gpu_id 0123
|         ns/day =      28.43   seconds/ns =    3038.72   x1

K20 on n34

-n 1, -gpu_id 0
|             ns/day =       2.71   seconds/ns =   31883.03
-n 4, -gpu_id 0
|             ns/day =       1.53   seconds/ns =   56325.00
-n 4, -gpu_id 0123
|             ns/day =       5.87   seconds/ns =   14730.45

</code>
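
Given those numbers, maximum throughput on the GTX node comes from four independent single-GPU runs. A minimal sketch of what that could look like (hypothetical driver loop, not the bench script itself; it reuses the paths and inputs from the commands above and pins one pmemd process to each GPU via CUDA_VISIBLE_DEVICES):

<code>
# hypothetical sketch: four independent single-gpu nucleosome jobs on n78
cd /home/hmeij/amber/nucleosome
for gpu in 0 1 2 3; do
  ( export CUDA_VISIBLE_DEVICES=$gpu
    /usr/local/mpich-3.1.4/bin/mpirun -n 1 $AMBERHOME/bin/pmemd.cuda.MPI \
      -O -o /tmp/mdout.$gpu -i mdin.GPU -p prmtop -c inpcrd -ref inpcrd ) &
done
wait
grep 'ns/day' /tmp/mdout.?
</code>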
  
  * Gromacs 5.1.4. My (Colin's) multidir bench runs about 2x faster than on a K20 (a launch sketch follows below)
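
A hedged sketch of how a multidir Gromacs run might be launched on the 4-GPU node; the directory and .tpr names are made up and this is not necessarily Colin's exact bench:

<code>
# hypothetical example: four simulations, one per gpu, in directories run1..run4
mpirun -np 4 gmx_mpi mdrun -multidir run1 run2 run3 run4 \
  -s topol.tpr -ntomp 4 -gpu_id 0123 -resethway -noconfout
</code>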