cluster:182 [2019/08/12 14:41] hmeij07 [Amber]\\
cluster:182 [2019/08/12 16:45] hmeij07 [Lammps]
==== Amber ====
The RTX compute node had only one GPU; the other nodes had four GPUs each. In each run the number of MPI threads requested equaled the number of GPUs involved. A sample script is at the bottom of the page.
  * [DPFP] - Double Precision Forces, 64-bit Fixed Point Accumulation.
^ Precision ^ ns/day ^ ^
| DPFP | 5.21| 18.35|
| SXFP | 11.82| |
| SFFP | 11.91| |
Like the last testing outcome, in the SFFP precision mode it is best to run four individual jobs, one per GPU (mpi=1, gpu=1). Best performance is the P100 at 47.64 vs the RTX at 39.69 ns/day per node. The T4 runs about 1/3 as fast and really falters in DPFP precision mode, but in the (experimental) SXFP precision mode the T4 makes up some of that performance.
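As a quick consistency check (assuming the SFFP table entry is the per-GPU rate; that reading is an inference from this page, not stated outright), the quoted per-node P100 figure matches four concurrent single-GPU jobs:

```shell
# Hypothetical check: 4 single-GPU SFFP jobs at 11.91 ns/day each
# should aggregate to the quoted 47.64 ns/day per P100 node.
awk 'BEGIN { printf "%.2f ns/day per node\n", 4 * 11.91 }'
# prints: 47.64 ns/day per node
```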
Can't complain about utilization rates.\\
Amber mpi=4 gpu=4\\

  [heme@login1 amber16]$ ssh node7 ./
  id,
  0, Tesla P100-PCIE-16GB,
  1, Tesla P100-PCIE-16GB,
  2, Tesla P100-PCIE-16GB,
  3, Tesla P100-PCIE-16GB,
+ | |||
==== Lammps ====
+ | |||
+ | Precision for GPU calculations | ||
+ | |||
  * DD -D_DOUBLE_DOUBLE
  * SD -D_SINGLE_DOUBLE
  * SS -D_SINGLE_SINGLE
+ | |||
+ | |||
+ | |||
^ tau/day ^ ^ ^
| DD | 856669.660| |
| SD | 981897.313| |
| SS | 1050796.986| |
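Relative to full double precision, the mixed- and single-precision gains on this benchmark work out as follows (a quick check computed from the table above):

```shell
# Speedup of SD and SS over DD on the colloid benchmark (tau/day from table).
awk 'BEGIN {
    dd = 856669.660; sd = 981897.313; ss = 1050796.986
    printf "SD/DD = %.2f\n", sd / dd   # prints: SD/DD = 1.15
    printf "SS/DD = %.2f\n", ss / dd   # prints: SS/DD = 1.23
}'
```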
==== Scripts ====

All three software applications were compiled within the default environment with CUDA 10.1.

Currently Loaded Modules:\\
1) GCCcore/\\
2) zlib/\\
3) binutils/\\

Follow\\
https://

  * Amber

<code>

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --nodelist=node7
#SBATCH --job-name="
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:
#SBATCH --exclusive

# NSTEP = 40000
rm -f restrt.1K10
mpirun --oversubscribe -x LD_LIBRARY_PATH -np 1 \
  -H localhost \
  ~/
  -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd
</code>

  * Lammps

<code>

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --nodelist=node5
#SBATCH --job-name="
#SBATCH --gres=gpu:
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive

# RTX
mpirun --oversubscribe -x LD_LIBRARY_PATH -np 1 \
  -H localhost \
  ~/
  -in in.colloid > rtx-1:1

[heme@login1 lammps-5Jun19]$ squeue
  JOBID PARTITION
   2239    normal

[heme@login1 lammps-5Jun19]$ ssh node5 ./gpu-info
id,
0, Quadro RTX 6000, 50, 186 MiB, 24004 MiB, 51 %, 0 %

</code>
\\
**[[cluster: