Differences

This shows you the differences between two versions of the page.

--- cluster:109 [2013/01/17 11:15]
hmeij created
+++ cluster:109 [2013/02/02 15:30]
hmeij [Lammps GPU Testing (EC)]
@@ Line 2: / Line 2: @@
 **[[cluster:0|Back]]**
-===== GPU Testing at MW =====
+===== Lammps GPU Testing (EC) =====
-Vendor: "There are currently two systems available, each
+  * 32 cores E2660
-with two 8-core Xeon E5-2670 processors, 32GB memory, 120GB SSD and two
+  * 4 K20 GPU
-Tesla K20 GPUs. The hostnames are master and node2.
+  * workstation
-You will see that a GPU-accelerated version of LAMMPS with MPI support is
+  * MPICH2 flavor
-installed in /usr/local/LAMMPS."
+Same tests (12 cpu cores) using lj/cut, eam, lj/expand, and morse: **AU.reduced**
+  * CPU only 6 mins 1 sec
+  * GPU 1 min 1 sec (a 5-6 times speed up)
+  * GPUs 1 min 0 secs (never saw 2nd GPU used, problem set too small?)
+Same tests (12 cpu cores) using a restart file and using gayberne: **GB**
+  * CPU only 1 hour 5 mins
+  * GPU 5 mins and 15 secs (roughly a 12 times speed up, from 3900s to 315s)
+  * GPUs 2 mins
+The above results seem a bit slower overall than at the other vendor, but show the same pattern.
+Francis's Melt problem set
+^3d Lennard-Jones melt: for 10,000 steps with 32,000 atoms^^^^^^
+|CPU only|  -np 1  |  -np 6 | -np 12  |  -np 24  |  -np 36  |
+|loop times|  329s  |  63s  |  39s  |    29s  |  45s  |
+|GPU only|  1xK20  |  2xK20 |  3xK20  |  4xK20  |  (-np 1-4)  |
+|loop times|  28s  |  16s |  11s  |  10s  |    |
+^3d Lennard-Jones melt: for 100,000 steps with 32,000 atoms^^^^^^
+|GPU only|  1xK20  |  2xK20 |  3xK20  |  4xK20  |  (-np 1-4)  |
+|loop times|  274s  |  162s |  120s  |  98s  |    |
+  * The serial time of 329s drops to 29s with MPI (-np 24), an 11x speed up
+  * A single GPU (28s) matches MPI -np 24 (29s), and four GPUs reduce this further to 10s, roughly a 3x speed up over one GPU
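The speed-up figures quoted in the two bullets above can be re-derived from the loop times in the 10,000-step melt table. The following is only a minimal sketch: the helper name `speedup` and the variable names are made up for illustration, and the numbers are copied from the table.

```shell
#!/usr/bin/env bash
# Speed-up = baseline time / accelerated time, printed with one decimal.
# "speedup" is a hypothetical helper name used only in this sketch.
speedup() {
    awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.1f", base / fast }'
}

# Loop times in seconds, copied from the 10,000-step melt table.
serial=329      # CPU only, -np 1
mpi_best=29     # CPU only, -np 24
gpu_one=28      # 1xK20
gpu_four=10     # 4xK20

echo "MPI -np 24 vs serial: $(speedup "$serial" "$mpi_best")x"
echo "4xK20 vs 1xK20:       $(speedup "$gpu_one" "$gpu_four")x"
echo "4xK20 vs serial:      $(speedup "$serial" "$gpu_four")x"
```

The first two lines correspond to the 11x and 3x figures in the bullets. The third compares four GPUs against the serial CPU run, a ratio the page does not quote directly.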
+===== Lammps GPU Testing (MW) =====
+Vendor: "There are currently two systems available, each with two 8-core Xeon E5-2670 processors, 32GB memory, 120GB SSD and two Tesla K20 GPUs. The hostnames are master and node2.
+You will see that a GPU-accelerated version of LAMMPS with MPI support is installed in /usr/local/LAMMPS."
 Actually, it turns out there are 32 cores on the node, so I suspect four CPUs.
+First, we expose the GPUs to Lammps in our input file. Running with GPUIDX set to -1 ignores the GPUs.
+<code>
+# Enable GPUs if GPUIDX is set to a non-negative value.
+if "(${GPUIDX} >= 0)" then &
+        "suffix gpu" &
+        "newton off" &
+        "package gpu force 0 ${GPUIDX} 1.0"
+</code>
+Then we invoke the Lammps executable with MPI.
+<code>
+NODES=1      # number of nodes [>=1]
+GPUIDX=0     # upper bound of the GPU index range [0,1]:
+             # set GPUIDX=0 for 1 GPU/node or GPUIDX=1 for 2 GPUs/node
+CORES=12     # cores per node (e.g. 2 CPUs with 6 cores each = 12 cores per node)
+which mpirun
+echo "*** GPU run with one MPI process per core ***"
+date
+mpirun -np $((NODES*CORES)) -bycore ./lmp_ex1 -c off -var GPUIDX $GPUIDX \
+       -in film.inp -l film_1_gpu_1_node.log
+date
+</code>
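To compare CPU-only, one-GPU, and two-GPU runs, the same launch line can be generated for each GPUIDX value. The sketch below is a dry run that only prints the commands. It reuses `lmp_ex1`, `film.inp`, and the mpirun flags from the script above, while the function name `build_cmd` and the log file naming scheme are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Dry run: print the mpirun command line for a given GPUIDX without executing it.
# The executable, input file, and flags are taken from the script above;
# build_cmd and the film_gpuidx_*.log naming are hypothetical.
build_cmd() {
    local nodes=$1 cores=$2 gpuidx=$3
    echo "mpirun -np $((nodes * cores)) -bycore ./lmp_ex1 -c off" \
         "-var GPUIDX $gpuidx -in film.inp -l film_gpuidx_${gpuidx}.log"
}

# -1 = CPU only, 0 = one GPU per node, 1 = two GPUs per node
for idx in -1 0 1; do
    build_cmd 1 12 "$idx"
done
```

Once the printed lines look right, they can be pasted into a job script or piped to a shell to actually launch the runs.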
+Some tests using **lj/cut**, **eam**, **lj/expand**, and **morse**:
+  * CPU only 4 mins 30 secs
+  * 1 GPU 0 mins 47 secs (a 5-6 times speed up)
+  * 2 GPUs 0 mins 46 secs (never saw 2nd GPU used, problem set too small?)
+Some tests using a restart file and using **gayberne**:
+  * CPU only 1 hour 5 mins
+  * 1 GPU 3 mins and 33 secs (an 18-19 times speed up)
+  * 2 GPUs 2 mins (see below)
+<code>
+node2$ gpu-info
+====================================================
+Device  Model           Temperature     Utilization
+====================================================
+       Tesla K20m      36 C            96 %
+       Tesla K20m      34 C            92 %
+====================================================
+</code>
 \\
 **[[cluster:0|Back]]**
