Differences

This shows you the differences between two versions of the page.

--- cluster:182 [2019/08/12 15:00]
hmeij07 [Amber]
+++ cluster:182 [2019/08/12 16:28]
hmeij07 [Scripts]
@@ Line 52: / Line 52: @@
 ^  ns/day  ^  P100[1]  ^  P100[4]  ^  RTX[1]  ^  T4[1]  ^  T4[4]  ^  Notes  ^
-|  DPFP  |  5.21|
+|  DPFP  |  5.21|  18.35|  0.75|  0.35|  1.29|
-|  SXFP  |  11.82|
+|  SXFP  |  11.82|  37.44|  17.05|  7.01|  18.91|
-|  SFFP  |  11.91|
+|  SFFP  |  11.91|  40.98|  9.92|  4.35|  16.22|
+As in the previous round of testing, in SFFP precision mode it is best to run four individual jobs, one per GPU (mpi=1, gpu=1). Per node, the P100 performs best at 47.64 ns/day (4 x 11.91) versus 39.69 ns/day for the RTX. The T4 runs about one third as fast as the P100 (4.35 vs 11.91 ns/day per GPU) and falters badly in DPFP precision mode (0.35 ns/day on one GPU). In SXFP (experimental) precision mode, however, the T4 recovers some ground.
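The per-node figures above are simple arithmetic on the table: four independent single-GPU jobs deliver four times the one-GPU rate, which can then be compared with one job spanning all four GPUs. A minimal sketch, assuming four GPUs per node and that each independent job runs at the single-GPU rate; the helper name `per_node` is hypothetical and not part of any script on this page.

```shell
#!/bin/bash
# Hypothetical helper: aggregate ns/day per node when running one
# independent job per GPU. Assumes every job runs at the single-GPU rate.
# usage: per_node <ns_day_on_one_gpu> <gpus_per_node>
per_node() {
    awk -v rate="$1" -v gpus="$2" 'BEGIN { printf "%.2f\n", rate * gpus }'
}

# SFFP values taken from the table above.
echo "P100, 4 x 1-GPU jobs: $(per_node 11.91 4) ns/day (one 4-GPU job: 40.98)"
echo "T4,   4 x 1-GPU jobs: $(per_node 4.35 4) ns/day (one 4-GPU job: 16.22)"
```

In both cases the four independent jobs come out ahead of the single 4-GPU job, which is the basis for the mpi=1, gpu=1 recommendation.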
+Can't complain about utilization rates.\\
+Amber mpi=4 gpu=4\\
+[heme@login1 amber16]$ ssh node7 ./gpu-info\\
+id,name,temp.gpu,mem.used,mem.free,util.gpu,util.mem\\
+, Tesla P100-PCIE-16GB, 79, 1052 MiB, 15228 MiB, 87 %, 1 %\\
+, Tesla P100-PCIE-16GB, 79, 1052 MiB, 15228 MiB, 95 %, 0 %\\
+, Tesla P100-PCIE-16GB, 79, 1052 MiB, 15228 MiB, 87 %, 0 %\\
+, Tesla P100-PCIE-16GB, 78, 1052 MiB, 15228 MiB, 94 %, 0 %\\
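To put a number on the utilization, the `util.gpu` column can be averaged. A minimal sketch, assuming the comma-plus-space separated layout shown above with one header line; the function name `mean_gpu_util` is hypothetical, and `gpu-info` itself is a local script whose contents are not shown on this page.

```shell
#!/bin/bash
# Hypothetical helper: average the util.gpu column (6th field) of
# gpu-info style output read from stdin. Assumes fields are separated
# by ", " and that the first line is the header.
mean_gpu_util() {
    awk -F', ' 'NR > 1 && NF >= 6 { sum += $6 + 0; n++ }
                END { if (n) printf "%.2f\n", sum / n }'
}

# Sample copied from the P100 output above.
mean_gpu_util <<'EOF'
id,name,temp.gpu,mem.used,mem.free,util.gpu,util.mem
, Tesla P100-PCIE-16GB, 79, 1052 MiB, 15228 MiB, 87 %, 1 %
, Tesla P100-PCIE-16GB, 79, 1052 MiB, 15228 MiB, 95 %, 0 %
, Tesla P100-PCIE-16GB, 79, 1052 MiB, 15228 MiB, 87 %, 0 %
, Tesla P100-PCIE-16GB, 78, 1052 MiB, 15228 MiB, 94 %, 0 %
EOF
```

For the sample this prints 90.75. On the cluster the same function could be fed live with `ssh node7 ./gpu-info | mean_gpu_util`.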
+==== Lammps ====
+^  ns/day  ^  P100[1]  ^  P100[4]  ^  RTX[1]  ^  T4[1]  ^  T4[4]  ^  Notes  ^
+|  DPFP  |  |  |  |  |  |
+|  SXFP  |  |  |  |  |  |
+|  SFFP  |  |  |  |  |  |
 ==== Scripts ====
+All three software applications were compiled in the default environment with CUDA 10.1.
+Currently Loaded Modules:\\
+1) GCCcore/8.2.0     4) GCC/8.2.0-2.31.1   7) XZ/5.2.4           10) hwloc/1.11.11   13) FFTW/3.3.8\\
+2) zlib/1.2.11       5) CUDA/10.1.105      8) libxml2/2.9.8      11) OpenMPI/3.1.3   14) ScaLAPACK/2.0.2-OpenBLAS-0.3.5\\
+3) binutils/2.31.1   6) numactl/2.0.12     9) libpciaccess/0.14  12) OpenBLAS/0.3.5  15) fosscuda/2019a\\
+Follow\\
+https://dokuwiki.wesleyan.edu/doku.php?id=cluster:161\\
   * Amber
@@ Line 81: / Line 111: @@
 </code>
+  * Lammps
+<code>
+#!/bin/bash
+#SBATCH --nodes=1
+#SBATCH --nodelist=node5
+#SBATCH --job-name="RTX dd"
+#SBATCH --gres=gpu:1
+#SBATCH --ntasks-per-node=1
+#SBATCH --exclusive
+# RTX
+mpirun --oversubscribe -x LD_LIBRARY_PATH -np 1 \
+-H localhost \
+~/lammps-5Jun19/lmp_mpi_double_double -suffix gpu -pk gpu 1 \
+-in in.colloid > rtx-1:1
+[heme@login1 lammps-5Jun19]$ squeue
+             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
+    normal   RTX dd     heme  R       3:17      1 node5
+[heme@login1 lammps-5Jun19]$ ssh node5 ./gpu-info
+id,name,temp.gpu,mem.used,mem.free,util.gpu,util.mem
+, Quadro RTX 6000, 50, 186 MiB, 24004 MiB, 51 %, 0 %
+</code>
 \\
 **[[cluster:0|Back]]**
