cluster:182 [2019/08/12 14:37] hmeij07 [the DPP]
cluster:182 [2019/08/12 16:26] hmeij07 [Lammps]
</code>
==== Amber ====

The RTX compute node had only one GPU; the other nodes had four GPUs each. In each run the number of MPI ranks requested equaled the number of GPUs involved. A sample script is at the bottom of the page.

  * [DPFP] - Double Precision Forces, 64-bit Fixed Point Accumulation.
  * [SPXP] - Single Precision Forces, Mixed Precision [integer] Accumulation. (Experimental)
  * [SPFP] - Single Precision Forces, 64-bit Fixed Point Accumulation. (Default)

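The difference between the modes comes down to how per-step forces are accumulated. A minimal sketch of why 64-bit fixed-point accumulation (as in SPFP/DPFP) gives exact, order-independent sums — the sample force values and the 1e8 scale factor are illustrative only, not Amber's actual internal Q-format:

```shell
#!/bin/bash
# Sketch (not from the page): fixed-point accumulation.
# Each value is scaled to a 64-bit integer, so the running sum is exact
# and independent of summation order, unlike a floating-point reduction.
SCALE=100000000   # illustrative 1e8 scale factor
forces="0.12345678 0.00000001 0.87654321"
acc=0
for f in $forces; do
  # round each scaled force to an integer, then accumulate exactly
  i=$(awk -v x="$f" -v s="$SCALE" 'BEGIN { printf "%.0f", x * s }')
  acc=$(( acc + i ))
done
awk -v a="$acc" -v s="$SCALE" 'BEGIN { printf "sum = %.8f\n", a / s }'
```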
^ ns/day ^  ^  ^
| DPFP |  5.21| 18.35|
| SPXP | 11.82| |
| SPFP | 11.91| |

As in the last round of testing, in SPFP precision mode it is best to run four individual jobs, one per GPU (mpi=1, gpu=1). Best performance is the P100 at 47.64 ns/day per node vs the RTX at 39.69. The T4 runs about a third as fast and really falters in DPFP precision mode, but in the experimental SPXP mode the T4 makes up ground.
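The one-job-per-GPU pattern described above can be sketched by pinning each independent job to a device with CUDA_VISIBLE_DEVICES. The pmemd path and input names below are placeholders, and the commands are only echoed, not executed:

```shell
#!/bin/bash
# Sketch: four independent single-GPU Amber jobs (mpi=1, gpu=1 each).
# Binary path and input file names are hypothetical placeholders.
for gpu in 0 1 2 3; do
  # CUDA_VISIBLE_DEVICES hides every device except one from each job
  echo "CUDA_VISIBLE_DEVICES=$gpu mpirun -np 1 pmemd.cuda.MPI -O -i mdin.$gpu -o mdout.$gpu"
done
```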
+ | |||
Can't complain about utilization rates.\\
Amber mpi=4 gpu=4\\
+ | |||
  [heme@login1 amber16]$ ssh node7 ./
  id,
  0, Tesla P100-PCIE-16GB,
  1, Tesla P100-PCIE-16GB,
  2, Tesla P100-PCIE-16GB,
  3, Tesla P100-PCIE-16GB,
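A utilization check like the listing above can be scripted against CSV output from `nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv,noheader`. Since the page's listing is truncated, the sample lines below are stand-ins for real query output:

```shell
#!/bin/bash
# Sketch: averaging per-GPU utilization from nvidia-smi CSV output.
# The sample values are illustrative; real data would come from:
#   nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv,noheader
sample="0, Tesla P100-PCIE-16GB, 100 %
1, Tesla P100-PCIE-16GB, 100 %
2, Tesla P100-PCIE-16GB, 99 %
3, Tesla P100-PCIE-16GB, 97 %"
# field 3 ("100 %") coerces to its leading number in awk arithmetic
echo "$sample" | awk -F', ' '{ sum += $3; n++ } END { printf "avg util: %.1f %%\n", sum/n }'
```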
+ | |||
==== Lammps ====

^ ns/day ^  ^  ^  ^
| DPFP | | | | |
| SPXP | | | | |
| SPFP | | | | |
==== Scripts ====

  * Amber

<code>

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --nodelist=node7
#SBATCH --job-name="
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:
#SBATCH --exclusive

# NSTEP = 40000
rm -f restrt.1K10
mpirun --oversubscribe -x LD_LIBRARY_PATH -np 1 \
       -H localhost \
       ~/
       -inf mdinfo.1K10 -x mdcrd.1K10 -r restrt.1K10 -ref inpcrd

</code>
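After a run, the ns/day figures tabulated above can be pulled out of the mdinfo file the script writes (mdinfo.1K10). The sample line below mimics pmemd's mdinfo timing block, reusing the P100 figure from the table; a real check would grep the actual file:

```shell
#!/bin/bash
# Sketch: extracting ns/day from a pmemd mdinfo timing line.
# The sample text is illustrative; a real run would read mdinfo.1K10.
sample="|     ns/day =      47.64   seconds/ns =    1813.60"
# find the "ns/day" token and print the value two fields later
echo "$sample" | awk '{ for (i = 1; i <= NF; i++) if ($i == "ns/day") print $(i+2) }'
```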
\\
**[[cluster: