**[[cluster:0|Back]]**

==== GTX vs P100 vs K20 ====
  
Comparing these GPUs yields the following data. These are not "benchmark suites," so your mileage may vary. The results give us some comparative information for decision making on our 2018 GPU Expansion Project. The GTX data comes from this page: [[cluster:168|2018 GPU Expansion]].
  
Credits: This work was made possible, in part, through HPC time donated by Microway, Inc. We gratefully acknowledge Microway for providing access to their GPU-accelerated compute cluster.
2, Tesla P100-PCIE-16GB, 44, 327 MiB, 15953 MiB, 100 %, 0 %
3, Tesla P100-PCIE-16GB, 43, 327 MiB, 15953 MiB, 100 %, 0 %

</code>

==== Lammps ====

We also cannot complain about GPU utilization in this example. We tend to achieve better performance with cpu:gpu ratios around 4:1, but not this time: best performance was obtained when the number of MPI tasks equaled the number of GPUs.

On our GTX server, best performance came at a 16:4 cpu:gpu ratio, for 932,493 tau/day (11x faster than our K20). However, scaling the job down to a 4:2 cpu:gpu ratio yields 819,207 tau/day, which means a quad-GPU server running two such jobs could deliver about 1.6 million tau/day (a sketch of such a setup follows the code block below).

<code>

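# Launch line for the P100 node: 8 MPI ranks on localhost ("-np 8", "--oversubscribe"),
# LAMMPS gpu package enabled via "-suffix gpu" and spread over all 4 GPUs ("-pk gpu 4").
# Presumably the -np value and localhost list were varied to produce the gpu/mpi results below.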
mpirun --oversubscribe -x LD_LIBRARY_PATH -np 8 \
-H localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost \
lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu 4 \
-in in.colloid > out.1

gpu=1 mpi=1
Performance: 2684821.234 tau/day, 6214.864 timesteps/s
gpu=2 mpi=2
Performance: 3202640.823 tau/day, 7413.520 timesteps/s
gpu=4 mpi=4
Performance: 3341009.801 tau/day, 7733.819 timesteps/s
Any mpi > gpu yielded degraded performance.

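# GPU status snapshot taken during the run. Output in this CSV form can be produced
# with something like (assumption, the exact command is not recorded on this page):
#   nvidia-smi --query-gpu=index,name,temperature.gpu,memory.used,memory.free,utilization.gpu,utilization.memory --format=csv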
index, name, temp.gpu, mem.used [MiB], mem.free [MiB], util.gpu [%], util.mem [%]
0, Tesla P100-PCIE-16GB, 35, 596 MiB, 15684 MiB, 82 %, 2 %
1, Tesla P100-PCIE-16GB, 38, 596 MiB, 15684 MiB, 77 %, 2 %
2, Tesla P100-PCIE-16GB, 37, 596 MiB, 15684 MiB, 81 %, 2 %
3, Tesla P100-PCIE-16GB, 37, 596 MiB, 15684 MiB, 80 %, 2 %

  
</code>
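
The "about 1.6 million tau/day" figure above assumes a quad-GPU node running two independent cpu:gpu=4:2 jobs side by side (2 x 819,207 tau/day). Below is a minimal sketch of what that could look like, assuming the same lmp_mpi-double-double-with-gpu binary and that each job can be pinned to a GPU pair with CUDA_VISIBLE_DEVICES; this exact launch was not tested here, so treat it as illustrative only.

<code>

# Hypothetical: two concurrent 4:2 (cpu:gpu) jobs on one quad-GPU node.
# Job 1 pinned to GPUs 0,1; job 2 pinned to GPUs 2,3.
CUDA_VISIBLE_DEVICES=0,1 mpirun --oversubscribe -x LD_LIBRARY_PATH -x CUDA_VISIBLE_DEVICES -np 4 \
-H localhost,localhost,localhost,localhost \
lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu 2 \
-in in.colloid > out.job1 &

CUDA_VISIBLE_DEVICES=2,3 mpirun --oversubscribe -x LD_LIBRARY_PATH -x CUDA_VISIBLE_DEVICES -np 4 \
-H localhost,localhost,localhost,localhost \
lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu 2 \
-in in.colloid > out.job2 &

wait

# Roughly 2 x 819,207 = ~1.64 million tau/day for the node, if the two jobs
# do not interfere with each other.

</code>

Whether two jobs sharing one node actually sustain the single-job 4:2 rate would still need to be verified on the quad box.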