==== Lammps ====
  
We also cannot complain about gpu utilization in this example. We tend to achieve better performance with cpu:gpu ratios in the 4:1 range on our GTX server, but not on this cluster. Best performance was obtained when the number of cpu threads equaled the number of gpus used.
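For illustration only, a Lammps gpu run with the cpu thread count matched to the gpu count might be launched along these lines; the executable name (lmp_mpi) and input file (in.melt) are placeholders, not our actual benchmark setup.

<code>
# hypothetical sketch: 2 MPI tasks driving 2 gpus (cpu:gpu ratio of 1:1)
# -sf gpu applies the gpu suffix styles, -pk gpu 2 requests two devices
mpirun -np 2 lmp_mpi -sf gpu -pk gpu 2 -in in.melt
</code>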
  
On our GTX server, best performance was at a cpu:gpu ratio of 16:4, for 932,493 tau/day (11x faster than our K20). However, scaling the job down to a cpu:gpu ratio of 4:2 yields 819,207 tau/day, which means a quad-gpu server running two such jobs can deliver about 1.6 million tau/day.
  
A single P100 gpu beats this easily, coming in at 2.6 million tau/day. Spreading the problem over more gpus did raise overall performance to 3.3 million tau/day. However, four cpu:gpu 1:1 jobs would achieve slightly over 10 million tau/day. That is almost 10x faster than our GTX server.
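As a sketch of that idea (the paths, input file, and executable name are assumptions), four independent single-gpu runs could be pinned to separate devices with CUDA_VISIBLE_DEVICES:

<code>
# hypothetical sketch: one single-cpu, single-gpu Lammps run per device
for dev in 0 1 2 3; do
  CUDA_VISIBLE_DEVICES=$dev mpirun -np 1 lmp_mpi \
    -sf gpu -pk gpu 1 -in in.melt > run.gpu$dev.log 2>&1 &
done
wait
# at roughly 2.6 million tau/day per job, four jobs land a bit over 10 million tau/day per node
</code>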
  
<code>
==== Gromacs ====
Gromacs has shown vastly improved performance between versions. v5 delivered about 20 ns/day per K20 server and 350 ns/day on the GTX server; v2018 delivered 75 ns/day per K20 server and 900 ns/day on the GTX server, roughly a 3x improvement.
  
On the P100 test node I could not invoke the multidir option of gromacs (it has run fine on the GTX server, which is odd). The utilization of each gpu drops as more and more gpus are deployed. The optimum performance was with dual gpus, achieving 36 ns/day. Four single-gpu jobs would deliver 136 ns/day per server, far short of the 900 ns/day for our GTX server (we only have dual P100 nodes quoted).
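For reference, two independent single-gpu mdrun jobs on a dual-P100 node could be pinned to devices 0 and 1 as in the sketch below; the directory layout, thread counts, and -deffnm name are assumptions, not the actual benchmark scripts.

<code>
# hypothetical sketch: two independent single-gpu Gromacs jobs on a dual-P100 node,
# each in its own directory so output files do not collide
( cd job0 && gmx mdrun -deffnm md -ntmpi 1 -ntomp 4 -gpu_id 0 ) > job0.log 2>&1 &
( cd job1 && gmx mdrun -deffnm md -ntmpi 1 -ntomp 4 -gpu_id 1 ) > job1.log 2>&1 &
wait
</code>

The -multidir form of mdrun would bundle these runs into one invocation; that is the option that refused to start on the P100 node.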
  
<code>