cluster:182 [2019/08/12 15:00] hmeij07 [Amber]
cluster:182 [2019/08/12 16:26] hmeij07 [Scripts]
^ ns/day ^^^
| DPFP | 5.21 | 18.35 |
| SXFP | 11.82 | 37.44 |
| SFFP | 11.91 | 40.98 |
As in the last round of testing, in SFFP precision mode it is best to run four individual jobs, one per GPU (mpi=1, gpu=1). Best performance is the P100 at 47.64 ns/day per node vs the RTX at 39.69. The T4 runs about one third as fast and really falters in DPFP precision mode, but in the experimental SXFP precision mode the T4 makes up the performance.
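The one-job-per-GPU pattern described above can be sketched as a small launch loop. This is a hedged sketch, not the script used on this cluster: the executable name `pmemd.cuda`, the input file `mdin`, and the output names are illustrative placeholders, and the loop only prints the commands (dry-run) so they can be inspected before anything is actually launched.

```shell
#!/bin/sh
# Sketch: four single-rank Amber jobs, one per GPU (mpi=1, gpu=1).
# Executable and file names below are placeholders, not from this page.
# Dry-run: echo each command instead of executing it.
for gpu in 0 1 2 3
do
    # CUDA_VISIBLE_DEVICES pins each job to its own GPU
    echo "CUDA_VISIBLE_DEVICES=$gpu pmemd.cuda -O -i mdin -o md_gpu$gpu.out &"
done
```

Removing the `echo` (and keeping the trailing `&`) would background all four jobs at once on the node.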
Can't complain about utilization rates.\\
Amber mpi=4 gpu=4\\

  [heme@login1 amber16]$ ssh node7 ./
  id,
  0, Tesla P100-PCIE-16GB,
  1, Tesla P100-PCIE-16GB,
  2, Tesla P100-PCIE-16GB,
  3, Tesla P100-PCIE-16GB,
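The comma-separated listing above has the shape of `nvidia-smi --query-gpu=... --format=csv` output. A hedged sketch for summarizing such rows with a quick average (the utilization figures below are made up for illustration, since the listing on this page is truncated):

```shell
#!/bin/sh
# Sample rows in the same "id, name, utilization" shape as the node7
# listing above; the percentages are illustrative, not real measurements.
sample="0, Tesla P100-PCIE-16GB, 98 %
1, Tesla P100-PCIE-16GB, 97 %
2, Tesla P100-PCIE-16GB, 99 %
3, Tesla P100-PCIE-16GB, 96 %"

# Strip the '%' from the third CSV field and print the mean utilization
echo "$sample" | awk -F', ' '{ gsub(/ %/, "", $3); sum += $3; n++ }
                             END { printf "%.1f\n", sum / n }'
```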
==== Lammps ====

^ ns/day ^^^^^
| DPFP | | | | |
| SXFP | | | | |
| SFFP | | | | |
==== Scripts ====
+ | |||
+ | All 3 software applications were compiled within default environment and Cuda 10.1 | ||
+ | |||
+ | Currently Loaded Modules:\\ | ||
+ | 1) GCCcore/ | ||
+ | 2) zlib/ | ||
+ | 3) binutils/ | ||
+ | |||
+ | Follow\\ | ||
+ | https:// | ||
  * Amber