cluster:175

Differences

This shows you the differences between two versions of the page.

cluster:175 [2018/09/21 14:06]
hmeij07 [GTX vs P100 vs K20]
cluster:175 [2018/09/21 14:29]
hmeij07 [Amber]
Line 28:
 2, Tesla P100-PCIE-16GB, 44, 327 MiB, 15953 MiB, 100 %, 0 %
 3, Tesla P100-PCIE-16GB, 43, 327 MiB, 15953 MiB, 100 %, 0 %
 +
 +</code>
 +
 +==== Lammps ====
 +
 +GPU utilization is also good in this example.  We tend to achieve better performance with cpu:gpu ratios in the 4:1 range, but not this time: the best performance was obtained when the number of cpu cores (MPI processes) equaled the number of gpus.
 +
 +On our GTX server the best performance was at a cpu:gpu ratio of 16:4, for 932,493 tau/day (11x faster than our K20). However, scaling the job down to a cpu:gpu ratio of 4:2 yields 819,207 tau/day, which means a quad-gpu server running two such jobs side by side can deliver about 1.6 million tau/day.
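The 1.6 million figure appears to come from packing two 4:2 jobs onto one four-gpu server. A minimal sketch of that arithmetic follows. It assumes (this is not measured above) that jobs running concurrently do not slow each other down, and the function name `aggregate_tau_per_day` is hypothetical.

```python
def aggregate_tau_per_day(gpus_in_server, gpus_per_job, tau_per_job):
    """Total throughput when the server is filled with identical jobs.

    Assumption: concurrent jobs do not interfere with each other.
    """
    jobs = gpus_in_server // gpus_per_job  # whole jobs that fit on the server
    return jobs * tau_per_job


# Numbers quoted above for the GTX server
single_job = aggregate_tau_per_day(4, 4, 932_493)  # one 16:4 job using all four gpus
two_jobs = aggregate_tau_per_day(4, 2, 819_207)    # two 4:2 jobs, two gpus each

print(f"one 16:4 job : {single_job:,.0f} tau/day")
print(f"two 4:2 jobs : {two_jobs:,.0f} tau/day")
print(f"gain         : {two_jobs / single_job:.2f}x")
```

Under that assumption the two smaller jobs together come to 1,638,414 tau/day, which matches the rounded figure in the text.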
 +  
 +<code>
 +
 +mpirun --oversubscribe -x LD_LIBRARY_PATH -np 8 \
 +-H localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost \
 +lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu 4 \
 +-in in.colloid > out.1 
 +
 +gpu=1 mpi=1
 +Performance: 2684821.234 tau/day, 6214.864 timesteps/s
 +gpu=2 mpi=2
 +Performance: 3202640.823 tau/day, 7413.520 timesteps/s
 +gpu=4 mpi=4
 +Performance: 3341009.801 tau/day, 7733.819 timesteps/s
 +any mpi>gpu yielded degraded performance.
 +
 +index, name, temp.gpu, mem.used [MiB], mem.free [MiB], util.gpu [%], util.mem [%]
 +0, Tesla P100-PCIE-16GB, 35, 596 MiB, 15684 MiB, 82 %, 2 %
 +1, Tesla P100-PCIE-16GB, 38, 596 MiB, 15684 MiB, 77 %, 2 %
 +2, Tesla P100-PCIE-16GB, 37, 596 MiB, 15684 MiB, 81 %, 2 %
 +3, Tesla P100-PCIE-16GB, 37, 596 MiB, 15684 MiB, 80 %, 2 %
 +
  
 </code>
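To put the P100 scaling numbers in perspective, the sketch below parses the `Performance:` lines listed above and reports speedup and per-gpu efficiency relative to the single-gpu run. The helper name `parse_runs` and the pairing of each `gpu=N mpi=N` line with the following `Performance:` line are assumptions made for this illustration, not part of LAMMPS.

```python
import re

# Output lines copied from the P100 runs above
LOG = """\
gpu=1 mpi=1
Performance: 2684821.234 tau/day, 6214.864 timesteps/s
gpu=2 mpi=2
Performance: 3202640.823 tau/day, 7413.520 timesteps/s
gpu=4 mpi=4
Performance: 3341009.801 tau/day, 7733.819 timesteps/s
"""


def parse_runs(log):
    """Map gpu count -> tau/day, pairing each 'gpu=N' line with the next 'Performance:' line."""
    runs = {}
    gpus = None
    for line in log.splitlines():
        m = re.match(r"gpu=(\d+)\s+mpi=(\d+)", line)
        if m:
            gpus = int(m.group(1))
            continue
        m = re.match(r"Performance:\s+([\d.]+)\s+tau/day", line)
        if m and gpus is not None:
            runs[gpus] = float(m.group(1))
    return runs


runs = parse_runs(LOG)
base = runs[1]  # single-gpu run is the reference point
for n, tau in sorted(runs.items()):
    speedup = tau / base
    print(f"gpu={n}: {tau:,.0f} tau/day, speedup {speedup:.2f}x, efficiency {speedup / n:.0%}")
```

For this input, going from one to four gpus raises throughput by roughly a quarter, so for a problem of this size several independent single-gpu jobs may use the server better than one four-gpu job.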
cluster/175.txt · Last modified: 2018/11/29 18:00 by hmeij07