cluster:182 [2019/08/12 14:37] hmeij07 [the DPP]
cluster:182 [2019/08/12 16:26] hmeij07 [Lammps]
</code>
==== Amber ====

The RTX compute node had only one GPU; the other nodes had four GPUs each. In each run the number of MPI ranks requested equaled the number of GPUs involved. A sample script is at the bottom of the page.

  * [DPFP] - Double Precision Forces, 64-bit Fixed Point Accumulation.
  * [SPXP] - Single Precision Forces, Mixed Precision [integer] Accumulation. (Experimental)
  * [SPFP] - Single Precision Forces, 64-bit Fixed Point Accumulation. (Default)

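The difference between the modes comes down to how per-step forces are accumulated. A minimal sketch of why 64-bit fixed-point accumulation (as in SPFP/DPFP) gives exact, order-independent sums — the sample force values and the 1e8 scale factor are illustrative only, not Amber's actual internal Q-format:

```shell
#!/bin/bash
# Sketch (not from the page): fixed-point accumulation.
# Each value is scaled to a 64-bit integer, so the running sum is exact
# and independent of summation order, unlike a floating-point reduction.
SCALE=100000000   # illustrative 1e8 scale factor
forces="0.12345678 0.00000001 0.87654321"
acc=0
for f in $forces; do
  # round each scaled force to an integer, then accumulate exactly
  i=$(awk -v x="$f" -v s="$SCALE" 'BEGIN { printf "%.0f", x * s }')
  acc=$(( acc + i ))
done
awk -v a="$acc" -v s="$SCALE" 'BEGIN { printf "sum = %.8f\n", a / s }'
```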
^ ns/day ^  ^  ^
| DPFP |  5.21| 18.35|
| SPXP | 11.82| |
| SPFP | 11.91| |

As in the last round of testing, in SPFP precision mode it is best to run four individual jobs, one per GPU (mpi=1, gpu=1). Best performance is the P100 at 47.64 ns/day per node vs the RTX at 39.69. The T4 runs about a third as fast and really falters in DPFP precision mode, but in the experimental SPXP mode the T4 makes up ground.
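The one-job-per-GPU pattern described above can be sketched by pinning each independent job to a device with CUDA_VISIBLE_DEVICES. The pmemd path and input names below are placeholders, and the commands are only echoed, not executed:

```shell
#!/bin/bash
# Sketch: four independent single-GPU Amber jobs (mpi=1, gpu=1 each).
# Binary path and input file names are hypothetical placeholders.
for gpu in 0 1 2 3; do
  # CUDA_VISIBLE_DEVICES hides every device except one from each job
  echo "CUDA_VISIBLE_DEVICES=$gpu mpirun -np 1 pmemd.cuda.MPI -O -i mdin.$gpu -o mdout.$gpu"
done
```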
+ | |||
Can't complain about utilization rates.\\
Amber mpi=4 gpu=4\\
+ | |||
  [heme@login1 amber16]$ ssh node7 ./
  id,
  0, Tesla P100-PCIE-16GB,
  1, Tesla P100-PCIE-16GB,
  2, Tesla P100-PCIE-16GB,
  3, Tesla P100-PCIE-16GB,
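A utilization check like the listing above can be scripted against CSV output from `nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv,noheader`. Since the page's listing is truncated, the sample lines below are stand-ins for real query output:

```shell
#!/bin/bash
# Sketch: averaging per-GPU utilization from nvidia-smi CSV output.
# The sample values are illustrative; real data would come from:
#   nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv,noheader
sample="0, Tesla P100-PCIE-16GB, 100 %
1, Tesla P100-PCIE-16GB, 100 %
2, Tesla P100-PCIE-16GB, 99 %
3, Tesla P100-PCIE-16GB, 97 %"
# field 3 ("100 %") coerces to its leading number in awk arithmetic
echo "$sample" | awk -F', ' '{ sum += $3; n++ } END { printf "avg util: %.1f %%\n", sum/n }'
```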
+ | |||
==== Lammps ====

^ ns/day ^  ^  ^  ^
| DPFP | | | | |
| SPXP | | | | |
| SPFP | | | | |
==== Scripts ====

  * Amber

<code>

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --nodelist=node7
#SBATCH --job-name="
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:
#SBATCH --exclusive

# NSTEP = 40000
rm -f restrt.1K10
mpirun --oversubscribe -x LD_LIBRARY_PATH -np 1 \
       -H localhost \
       ~/
       -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

</code>
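After a run, the ns/day figures tabulated above can be pulled out of the mdinfo file the script writes (mdinfo.1K10). The sample line below mimics pmemd's mdinfo timing block, reusing the P100 figure from the table; a real check would grep the actual file:

```shell
#!/bin/bash
# Sketch: extracting ns/day from a pmemd mdinfo timing line.
# The sample text is illustrative; a real run would read mdinfo.1K10.
sample="|     ns/day =      47.64   seconds/ns =    1813.60"
# find the "ns/day" token and print the value two fields later
echo "$sample" | awk '{ for (i = 1; i <= NF; i++) if ($i == "ns/day") print $(i+2) }'
```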
\\
**[[cluster: