cluster:175 [2018/09/21 14:29] hmeij07 [Amber]
2, Tesla P100-PCIE-16GB,
3, Tesla P100-PCIE-16GB,
+ | |||
+ | </ | ||
+ | |||
+ | ==== Lammps ==== | ||
+ | |||
We cannot complain about gpu utilization in this example either.

On our GTX server the best performance came from a cpu:gpu ratio of 16:4, at 932,493 tau/day (11x faster than our K20). However, scaling the job down to a cpu:gpu ratio of 4:2 still yields 819,207 tau/day, which means a quad-GPU server running two such jobs can deliver about 1.6 million tau/day.
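As a quick sanity check on the arithmetic above (all figures are quoted from this page; the K20 baseline is only implied by the 11x comparison, so treat it as an estimate):

```shell
# Throughput figures quoted above, in tau/day
gtx_best=932493    # cpu:gpu 16:4 on the GTX server
per_pair=819207    # cpu:gpu 4:2 (uses 2 of the 4 GPUs)

# Two 4:2 jobs side by side on a quad-GPU server:
echo "quad server estimate: $((per_pair * 2)) tau/day"

# Implied K20 baseline, since the 16:4 run is 11x faster:
echo "implied K20 baseline: ~$((gtx_best / 11)) tau/day"
```

This reproduces the "about 1.6 million tau/day" claim (2 x 819,207 = 1,638,414).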
<code>

mpirun --oversubscribe -x LD_LIBRARY_PATH -np 8 \
-H localhost,
lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu 4 \
-in in.colloid > out.1
+ | |||
+ | gpu=1 mpi=1 | ||
+ | Performance: | ||
+ | gpu=2 mpi=2 | ||
+ | Performance: | ||
+ | gpu=4 mpi=4 | ||
+ | Performance: | ||
+ | any mpi>gpu yielded degraded performance. | ||
+ | |||
+ | index, name, temp.gpu, mem.used [MiB], mem.free [MiB], util.gpu [%], util.mem [%] | ||
+ | 0, Tesla P100-PCIE-16GB, | ||
+ | 1, Tesla P100-PCIE-16GB, | ||
+ | 2, Tesla P100-PCIE-16GB, | ||
+ | 3, Tesla P100-PCIE-16GB, | ||
+ | |||
</code>
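The mpi=gpu sweep above can be scripted. A minimal sketch, reusing the binary and input deck names from this page (the single-node hostlist and paths are assumptions; adjust -H for your site). The echo lines only print the commands so the sweep can be reviewed before launching:

```shell
# Sketch of the mpi=gpu scaling sweep; lmp_mpi-double-double-with-gpu and
# in.colloid come from this page, everything else is site-specific.
for n in 1 2 4; do
  echo "gpu=$n mpi=$n"
  echo "mpirun --oversubscribe -x LD_LIBRARY_PATH -np $n" \
       "lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu $n" \
       "-in in.colloid > out.$n"
  # Drop the echo to actually run, then: grep Performance: out.$n
done
```

The utilization table above can be regenerated while a job runs with: nvidia-smi --query-gpu=index,name,temperature.gpu,memory.used,memory.free,utilization.gpu,utilization.memory --format=csv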