\\
**[[cluster:0|Back]]**

As of
 --- //[[hmeij@wesleyan.edu|Henk]] 2018/10/08 08:56// \\
The P100 with 12 GB is end of life, replaced by the P100 16 GB or the V100, and \\
the GTX 1080 Ti will be replaced by the GTX 2080 (no specs yet, and not yet certified for Amber18).\\
As of
 --- //[[hmeij@wesleyan.edu|Henk]] 2018/11/29 12:55//\\
New GROMACS performance benchmarks featuring 2x and 4x NVIDIA RTX 2080 GPUs are now available (GTX too). The RTX 2080 graphics card utilizes the new NVIDIA Turing GPU architecture and provides up to 6x the performance of the previous generation. (Exxact newsletter)

  
==== P100 vs GTX & K20 ====
|  mem  |  12/16  |  11  |  5  |  gb  |
|  ghz  |  2.6  |  1.6  |  0.7  |  speed  |
|  flops  |  4.7/5.3  |  0.355  |  1.15  |  dpfp  |
  
Comparing these GPUs yields the results presented below. These are not "benchmark suites," so your mileage may vary, but they give us some comparative information for decision making on our 2018 GPU Expansion Project. The GTX & K20 data comes from this page: [[cluster:164|GTX 1080 Ti]]
  
</code>

Look at these gpu temperatures; that's Celsius.
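
For monitoring, readings like these can be pulled on the fly with nvidia-smi; a minimal sketch (a generic query, not the exact command used for the output above):

<code>
# report each gpu's name and temperature (Celsius), refreshed every 5 seconds
nvidia-smi --query-gpu=index,name,temperature.gpu --format=csv -l 5
</code>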
  
==== Lammps ====
</code>
  
==== Lammps (PMMA) ====

These runs use a material called PMMA, aka acrylic glass or plexiglas ("safety glass"): https://en.wikipedia.org/wiki/Poly(methyl_methacrylate)
The PMMA simulations require the calculation of molecular bonds, which is not implemented in the GPU package, hence more CPU cores are required than in the Colloid example. The optimal ratio cpu:gpu appears to be 4-6:1 (a launch sketch follows the table below).

^  gpu  ^  cpus  ^  ns/day  ^  quad  ^  ns/day/node  ^
|  1 P100  |  4  |  89  |  x4  |  356  |
|  1 GTX  |  6  |  90  |  x4  |  360  |
|  1 K20  |  6  |  47  |  x4  |  188  |

That means the P100 performs as well as the GTX. The K20 runs at 50% of the performance of the others, which is impressive for this old gpu.
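
A minimal sketch of such a 4:1 cpu:gpu launch, assuming a gpu-enabled LAMMPS binary named lmp_mpi and an input file in.pmma (both hypothetical names):

<code>
# hypothetical 4:1 run: 4 mpi ranks sharing 1 gpu via the gpu package,
# bonded terms stay on the cpu cores, pair styles are offloaded
mpirun -np 4 lmp_mpi -sf gpu -pk gpu 1 -in in.pmma
</code>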
  
==== Gromacs ====
Gromacs has shown vastly improved performance between versions. v5 delivered about 20 ns/day per K20 server and 350 ns/day on a GTX server; v2018 delivered 75 ns/day per K20 server and 900 ns/day on a GTX server, roughly a 3x improvement.
  
On the P100 test node, I could not invoke the multidir option of Gromacs (I have run it on GTX, weird). The utilization of the gpu drops as more and more gpus are deployed. The optimum performance was with dual gpus, achieving 36 ns/day. Four one-gpu jobs would deliver 120 ns/day/server, far short of the 900 ns/day for our GTX server. (We only have dual P100 nodes quoted.) A sketch of such a four-job setup follows the code block below.
  
<code>
localhost,localhost,localhost,localhost,localhost,localhost,\
localhost,localhost,localhost,localhost,localhost,localhost,localhost \
gmx_mpi mdrun -gpu_id 0123 -ntmpi 0 -nt 0 \
 -s topol.tpr -ntomp 4 -npme 1 -nsteps 20000 -pin on -nb gpu
  
  
</code>
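
A minimal sketch of the four one-gpu jobs setup, assuming hypothetical run directories run0..run3 each holding a topol.tpr, with each job pinned to its own gpu and core set:

<code>
# hypothetical: four independent single-gpu runs, one per device
for i in 0 1 2 3; do
  ( cd run$i && CUDA_VISIBLE_DEVICES=$i \
    mpirun -np 1 gmx_mpi mdrun -s topol.tpr -ntomp 4 -nb gpu \
    -pin on -pinoffset $((i*4)) > md.out 2>&1 ) &
done
wait
</code>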

==== Gromacs 2018.3 ====

The multidir option not running in Gromacs 2018 is a bug in the code, clashing with the MPI_Barrier call (a communication timing error). It is fixed in Gromacs 2018.3, so we have some multidir results, although gpu utilization has room for improvement (about 25% used, same as on GTX).
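
A sketch of the kind of invocation behind the numbers below, assuming four run directories 01..04 (as in the md.log paths), each holding its own topol.tpr:

<code>
# hypothetical multidir launch: 8 ranks total, 2 per simulation
mpirun -np 8 gmx_mpi mdrun -multidir 01 02 03 04 \
 -gpu_id 0123 -ntomp 4 -npme 1 -maxh 0.1 -pin on -nb gpu
</code>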

<code>

# multidir -gpu_id 0123 with four simultaneous gromacs processes

-np  8 -ntomp  4 -npme 1 -maxh 0.1 -pin on -nb gpu
01/md.log:Performance:       36.692        0.654
02/md.log:Performance:       36.650        0.655
03/md.log:Performance:       36.623        0.655
04/md.log:Performance:       36.663        0.655 or about 146 ns/day/node (quad)

-np 16 -ntomp  8 -npme 1 -maxh 0.1 -pin on -nb gpu
01/md.log:Performance:       25.151        0.954
02/md.log:Performance:       25.257        0.950
03/md.log:Performance:       25.247        0.951
04/md.log:Performance:       25.345        0.947 or about 100 ns/day/node (quad)

# multidir -gpu_id 00112233 with eight simultaneous gromacs processes
# sharing the gpus, 2 processes per gpu

-np  8 -ntomp  4 -npme 1 -maxh 0.1 -pin on -nb gpu
Error in user input:
The string of available GPU device IDs '00112233' may not contain duplicate
device IDs

</code>

That last error when loading multiple processes per gpu is //not// according to their documentation. So the multidir performance is similar to the previous single-dir performance but still lags GTX performance by quite a bit, albeit with room left in the gpus' utilization rate.
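
To keep an eye on that utilization rate during a run, a generic query (assuming nvidia-smi is available on the node):

<code>
# sample gpu and memory utilization of all gpus every 5 seconds
nvidia-smi --query-gpu=index,utilization.gpu,utilization.memory --format=csv -l 5
</code>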
  
==== What to Buy ====
  
  * Amber folks: does not matter
  * Lammps folks: P100 nodes please, although in hybrid runs it does not matter
  * Gromacs folks: GTX nodes please
  