cluster:182 [2019/08/12 15:00] hmeij07 [Amber]
cluster:182 [2019/08/12 16:26] hmeij07 [Scripts]
^ ns/day ^^^
| DPFP | 5.21 | 18.35 |
| SXFP | 11.82 | 37.44 |
| SFFP | 11.91 | 40.98 |
As in the last round of testing, in SFFP precision mode it is best to run four individual jobs, one per GPU (mpi=1, gpu=1). Best performance is the P100 at 47.64 ns/day per node vs the RTX at 39.69. The T4 runs about one third as fast and really falters in DPFP precision mode, but in the experimental SXFP precision mode the T4 makes up the performance.
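The one-job-per-GPU pattern described above can be sketched as a small launch loop. This is a hedged sketch, not the script used on this cluster: the executable name `pmemd.cuda`, the input file `mdin`, and the output names are illustrative placeholders, and the loop only prints the commands (dry-run) so they can be inspected before anything is actually launched.

```shell
#!/bin/sh
# Sketch: four single-rank Amber jobs, one per GPU (mpi=1, gpu=1).
# Executable and file names below are placeholders, not from this page.
# Dry-run: echo each command instead of executing it.
for gpu in 0 1 2 3
do
    # CUDA_VISIBLE_DEVICES pins each job to its own GPU
    echo "CUDA_VISIBLE_DEVICES=$gpu pmemd.cuda -O -i mdin -o md_gpu$gpu.out &"
done
```

Removing the `echo` (and keeping the trailing `&`) would background all four jobs at once on the node.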
Can't complain about utilization rates.\\
Amber mpi=4 gpu=4\\

  [heme@login1 amber16]$ ssh node7 ./
  id,
  0, Tesla P100-PCIE-16GB,
  1, Tesla P100-PCIE-16GB,
  2, Tesla P100-PCIE-16GB,
  3, Tesla P100-PCIE-16GB,
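The comma-separated listing above has the shape of `nvidia-smi --query-gpu=... --format=csv` output. A hedged sketch for summarizing such rows with a quick average (the utilization figures below are made up for illustration, since the listing on this page is truncated):

```shell
#!/bin/sh
# Sample rows in the same "id, name, utilization" shape as the node7
# listing above; the percentages are illustrative, not real measurements.
sample="0, Tesla P100-PCIE-16GB, 98 %
1, Tesla P100-PCIE-16GB, 97 %
2, Tesla P100-PCIE-16GB, 99 %
3, Tesla P100-PCIE-16GB, 96 %"

# Strip the '%' from the third CSV field and print the mean utilization
echo "$sample" | awk -F', ' '{ gsub(/ %/, "", $3); sum += $3; n++ }
                             END { printf "%.1f\n", sum / n }'
```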
==== Lammps ====

^ ns/day ^^^^^
| DPFP | | | | |
| SXFP | | | | |
| SFFP | | | | |
==== Scripts ====
+ | |||
+ | All 3 software applications were compiled within default environment and Cuda 10.1 | ||
+ | |||
+ | Currently Loaded Modules:\\ | ||
+ | 1) GCCcore/ | ||
+ | 2) zlib/ | ||
+ | 3) binutils/ | ||
+ | |||
+ | Follow\\ | ||
+ | https:// | ||
  * Amber