**[[cluster:0|Back]]**
  
==== GTX vs P100 vs K20 ====
  
Comparing these GPUs yields the following data. These are not "benchmark suites," so your mileage may vary, but the results give us some comparative information for decision making on our 2018 GPU Expansion Project. The GTX & K20 data comes from this page: [[cluster:164|GTX 1080 Ti]].
  
Credits: This work was made possible, in part, through HPC time donated by Microway, Inc. We gratefully acknowledge Microway for providing access to their GPU-accelerated compute cluster.
==== Amber ====
  
Amber16 continues to run best when one MPI process launches the GPU counterpart. One cannot complain about the utilization rates. A dual P100 server delivers 24 ns/day and a quad P100 server delivers nearly 48 ns/day. Our quad GTX 1080 server delivers 48.96 ns/day (4.5x faster than the K20). We have dual P100 nodes quoted.
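A run of that kind might be launched roughly as below; this is a minimal sketch, assuming Amber16's pmemd.cuda.MPI binary, and the input file names (mdin, prmtop, inpcrd) are placeholders.

<code>
# Sketch: one MPI rank driving one GPU (file names are placeholders)
export CUDA_VISIBLE_DEVICES=0      # pin the job to the first GPU
mpirun -np 1 pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd -o mdout
</code>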
  
<code>
  
gpu=1 mpi=1 11.94 ns/day
any mpi>1 and performance goes down...
  
[heme@login1 amber]$ ssh node6 ~/p100-info
</code>
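The ~/p100-info helper above is a local script; a minimal sketch of what such a wrapper might look like, assuming it simply queries nvidia-smi for the fields shown in the output further below:

<code>
#!/bin/bash
# Hypothetical p100-info: per-GPU temperature, memory, and utilization
nvidia-smi --query-gpu=index,name,temperature.gpu,memory.used,memory.free,utilization.gpu,utilization.memory --format=csv
</code>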
==== Lammps ====
  
We also cannot complain about gpu utilization in this example. We tend to achieve better performance with cpu:gpu ratios in the 4:1 range on our GTX server, but not on this cluster. Best performance was obtained when the number of cpu threads equaled the number of gpus used.
  
On our GTX server best performance was at a cpu:gpu ratio of 16:4, for 932,493 tau/day (11x faster than our K20). However, scaling the job down to a cpu:gpu ratio of 4:2 yields 819,207 tau/day, which means a quad server can deliver about 1.6 million tau/day.
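For reference, a 16:4 run of that sort might be launched with the LAMMPS gpu package roughly as below; the executable name (lmp_mpi) and input script (in.lj) are placeholders that vary by build and problem.

<code>
# Sketch: 16 MPI ranks sharing 4 GPUs via the gpu package
mpirun -np 16 lmp_mpi -sf gpu -pk gpu 4 -in in.lj
</code>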
  
A single P100 gpu beats this easily, coming in at 2.6 million tau/day. Spreading the problem over more gpus did raise overall performance, to 3.3 million tau/day. However, four concurrent cpu:gpu 1:1 jobs would achieve slightly over 10 million tau/day. That is almost 10x faster than the GTX server.
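One way to realize that on a quad-GPU node is to pin four independent 1:1 runs to separate devices; a sketch, using the same placeholder names as above:

<code>
# Sketch: four concurrent 1:1 jobs, one per GPU
for g in 0 1 2 3; do
  CUDA_VISIBLE_DEVICES=$g mpirun -np 1 lmp_mpi -sf gpu -pk gpu 1 -in in.lj -log log.gpu$g &
done
wait
</code>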
  
<code>
gpu=4 mpi=4
Performance: 3341009.801 tau/day, 7733.819 timesteps/s
any mpi>gpu yielded degraded performance...
  
index, name, temp.gpu, mem.used [MiB], mem.free [MiB], util.gpu [%], util.mem [%]
</code>