cluster:175 [2018/09/21 14:29] hmeij07 [Amber]
2, Tesla P100-PCIE-16GB,
3, Tesla P100-PCIE-16GB,
+ | |||
+ | </ | ||
+ | |||
+ | ==== Lammps ==== | ||
+ | |||
We cannot complain about gpu utilization in this example either.

On our GTX server the best performance came from a cpu:gpu ratio of 16:4, at 932,493 tau/day (11x faster than our K20). However, scaling the job down to a cpu:gpu ratio of 4:2 still yields 819,207 tau/day, which means a quad-GPU server running two such jobs can deliver about 1.6 million tau/day.
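As a quick sanity check on the arithmetic above (all figures are quoted from this page; the K20 baseline is only implied by the 11x comparison, so treat it as an estimate):

```shell
# Throughput figures quoted above, in tau/day
gtx_best=932493    # cpu:gpu 16:4 on the GTX server
per_pair=819207    # cpu:gpu 4:2 (uses 2 of the 4 GPUs)

# Two 4:2 jobs side by side on a quad-GPU server:
echo "quad server estimate: $((per_pair * 2)) tau/day"

# Implied K20 baseline, since the 16:4 run is 11x faster:
echo "implied K20 baseline: ~$((gtx_best / 11)) tau/day"
```

This reproduces the "about 1.6 million tau/day" claim (2 x 819,207 = 1,638,414).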
<code>

mpirun --oversubscribe -x LD_LIBRARY_PATH -np 8 \
-H localhost,
lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu 4 \
-in in.colloid > out.1
+ | |||
+ | gpu=1 mpi=1 | ||
+ | Performance: | ||
+ | gpu=2 mpi=2 | ||
+ | Performance: | ||
+ | gpu=4 mpi=4 | ||
+ | Performance: | ||
+ | any mpi>gpu yielded degraded performance. | ||
+ | |||
+ | index, name, temp.gpu, mem.used [MiB], mem.free [MiB], util.gpu [%], util.mem [%] | ||
+ | 0, Tesla P100-PCIE-16GB, | ||
+ | 1, Tesla P100-PCIE-16GB, | ||
+ | 2, Tesla P100-PCIE-16GB, | ||
+ | 3, Tesla P100-PCIE-16GB, | ||
+ | |||
</code>
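The mpi=gpu sweep above can be scripted. A minimal sketch, reusing the binary and input deck names from this page (the single-node hostlist and paths are assumptions; adjust -H for your site). The echo lines only print the commands so the sweep can be reviewed before launching:

```shell
# Sketch of the mpi=gpu scaling sweep; lmp_mpi-double-double-with-gpu and
# in.colloid come from this page, everything else is site-specific.
for n in 1 2 4; do
  echo "gpu=$n mpi=$n"
  echo "mpirun --oversubscribe -x LD_LIBRARY_PATH -np $n" \
       "lmp_mpi-double-double-with-gpu -suffix gpu -pk gpu $n" \
       "-in in.colloid > out.$n"
  # Drop the echo to actually run, then: grep Performance: out.$n
done
```

The utilization table above can be regenerated while a job runs with: nvidia-smi --query-gpu=index,name,temperature.gpu,memory.used,memory.free,utilization.gpu,utilization.memory --format=csv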