
GTX vs P100 vs K20

Comparing these GPUs yields the data below. These are not formal benchmark suites, so your mileage may vary, but they give us comparative information for decision making on our 2018 GPU Expansion Project. The GTX data comes from the 2018 GPU Expansion page.

Credits: This work was made possible, in part, through HPC time donated by Microway, Inc. We gratefully acknowledge Microway for providing access to their GPU-accelerated compute cluster.

Amber

Amber16 continues to run best when a single MPI process drives each GPU, and the utilization rates leave nothing to complain about. At 11.94 ns/day per P100, a dual P100 server delivers about 24 ns/day and a quad P100 server nearly 48 ns/day. Our quad GTX1080 server delivers 48.96 ns/day (4.5x faster than the K20).

mpirun -x LD_LIBRARY_PATH -np 1 -H localhost pmemd.cuda.MPI \
 -O -o mdout.0 -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10.0 -ref inpcrd

gpu=1 mpi=1 11.94 ns/day
with any mpi>1, performance goes down...
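The dual and quad P100 figures above scale the single-GPU rate, i.e. one independent pmemd.cuda job per GPU. A minimal sketch of such a launch on a quad P100 node, assuming one run per GPU and hypothetical per-GPU output file names (not the ones from the run above):

# one independent Amber run per GPU; aggregate throughput is roughly
# 4x the single-GPU ns/day figure
for gpu in 0 1 2 3; do
  CUDA_VISIBLE_DEVICES=$gpu mpirun -x LD_LIBRARY_PATH -x CUDA_VISIBLE_DEVICES \
    -np 1 -H localhost pmemd.cuda.MPI \
    -O -o mdout.$gpu -inf mdinfo.$gpu -x mdcrd.$gpu -r restrt.$gpu -ref inpcrd &
done
wait   # block until all four runs have finished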

[heme@login1 amber]$ ssh node6 ~/p100-info
index, name, temp.gpu, mem.used [MiB], mem.free [MiB], util.gpu [%], util.mem [%]
0, Tesla P100-PCIE-16GB, 71, 327 MiB, 15953 MiB, 100 %, 0 %
1, Tesla P100-PCIE-16GB, 49, 327 MiB, 15953 MiB, 100 %, 0 %
2, Tesla P100-PCIE-16GB, 44, 327 MiB, 15953 MiB, 100 %, 0 %
3, Tesla P100-PCIE-16GB, 43, 327 MiB, 15953 MiB, 100 %, 0 %
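The ~/p100-info helper is presumably just a thin wrapper around nvidia-smi. A minimal sketch that reports the same columns (names abbreviated in the output above); this is our guess at the script, not its actual contents:

#!/bin/bash
# report per-GPU temperature, memory and utilization as CSV
nvidia-smi \
  --query-gpu=index,name,temperature.gpu,memory.used,memory.free,utilization.gpu,utilization.memory \
  --format=csv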

Lammps

GPU utilization is also solid in this example. We tend to achieve better performance with cpu:gpu ratios around 4:1, but not this time: best performance was obtained when the number of MPI (cpu) processes equaled the number of GPUs.

On our GTX server, best performance came at a cpu:gpu ratio of 16:4 for 932,493 tau/day (11x faster than our K20). However, scaling the job down to a cpu:gpu ratio of 4:2 still yields 819,207 tau/day, which means a quad GTX server running two such jobs side by side can deliver about 1.6 million tau/day (see the sketch after the results below).

mpirun --oversubscribe -x LD_LIBRARY_PATH -np 8 \
-H localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost \
lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu 4 \
-in in.colloid > out.1 

gpu=1 mpi=1
Performance: 2684821.234 tau/day, 6214.864 timesteps/s
gpu=2 mpi=2
Performance: 3202640.823 tau/day, 7413.520 timesteps/s
gpu=4 mpi=4
Performance: 3341009.801 tau/day, 7733.819 timesteps/s
Any ratio with mpi>gpu yielded degraded performance.
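The ~1.6 million tau/day estimate for a quad GTX server assumes two concurrent 4:2 (cpu:gpu) jobs, each pinned to its own GPU pair. A minimal sketch reusing the lmp_mpi-double-double-with-gpu binary and in.colloid input from above; the GPU pinning via CUDA_VISIBLE_DEVICES and the output file names are our assumptions:

# two concurrent 4:2 cpu:gpu jobs, one per GPU pair, roughly 2 x 819,207 tau/day
CUDA_VISIBLE_DEVICES=0,1 mpirun --oversubscribe -x LD_LIBRARY_PATH -x CUDA_VISIBLE_DEVICES \
  -np 4 -H localhost,localhost,localhost,localhost \
  lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu 2 -in in.colloid > out.gpu01 &
CUDA_VISIBLE_DEVICES=2,3 mpirun --oversubscribe -x LD_LIBRARY_PATH -x CUDA_VISIBLE_DEVICES \
  -np 4 -H localhost,localhost,localhost,localhost \
  lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu 2 -in in.colloid > out.gpu23 &
wait   # wait for both jobs to finish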

index, name, temp.gpu, mem.used [MiB], mem.free [MiB], util.gpu [%], util.mem [%]
0, Tesla P100-PCIE-16GB, 35, 596 MiB, 15684 MiB, 82 %, 2 %
1, Tesla P100-PCIE-16GB, 38, 596 MiB, 15684 MiB, 77 %, 2 %
2, Tesla P100-PCIE-16GB, 37, 596 MiB, 15684 MiB, 81 %, 2 %
3, Tesla P100-PCIE-16GB, 37, 596 MiB, 15684 MiB, 80 %, 2 %


