**[[cluster:0|Back]]**
  
===== Lammps GPU Testing (EC) =====
  
  * 32 cores E5-2660
  * 4 K20 GPUs
  * workstation
  * MPICH2 flavor (a launch sketch follows this list)

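A minimal sketch of how a 12-core MPICH2 run might be launched on this box; the binary name ''lmp_gpu'' and input file ''au.reduced.inp'' are placeholders, and the GPUIDX toggle is the one shown in the MW section below.

<code>
# CPU-only run: GPUIDX=-1 makes the input-file toggle skip the GPU package
mpiexec -np 12 ./lmp_gpu -var GPUIDX -1 -in au.reduced.inp -l au.reduced_cpu.log

# GPU run: GPUIDX=0 enables the first K20
mpiexec -np 12 ./lmp_gpu -var GPUIDX 0 -in au.reduced.inp -l au.reduced_gpu.log
</code>
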
Same tests (12 CPU cores) using lj/cut, eam, lj/expand, and morse: **AU.reduced**

    CPU only  6 mins 1 sec
    1 GPU     1 min 1 sec (a 5-6 times speed up)
    2 GPUs    1 min 0 secs (never saw the 2nd GPU used; problem set too small?)

Same tests (12 CPU cores) using a restart file and gayberne: **GB**

    CPU only  1 hour 5 mins
    1 GPU     5 mins 15 secs (an 18-19 times speed up)
    2 GPUs    2 mins (see below)

Francis's Melt problem set uses lj (a launch sketch follows the table):

^3d Lennard-Jones melt: for 10,000 steps with 32,000 atoms^
|CPU only|  -np 1  |  -np 6  |  -np 12  |  -np 24  |  -np 36  |
|loop times|  329s  |  63s  |  39s  |  29s  |  45s  |
|GPU only|  1xK20  |  2xK20  |  3xK20  |  4xK20  |  -np 1-4  |
|loop times|  28s  |  16s  |  11s  |  10s  |    |
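A minimal sketch of such a sweep, reusing the mpirun invocation and GPUIDX toggle from the MW section below; the binary ''lmp_ex1'' and input file ''melt.inp'' are placeholders, and mapping one MPI rank per GPU is an assumption (the table only says -np 1-4).

<code>
# CPU-only sweep over MPI rank counts (GPUIDX=-1 skips the GPU package)
for NP in 1 6 12 24 36; do
    mpirun -np $NP ./lmp_ex1 -c off -var GPUIDX -1 -in melt.inp -l melt_cpu_${NP}.log
done

# GPU sweep: GPUIDX is the highest GPU index to use (0 = 1 GPU ... 3 = 4 GPUs);
# one MPI rank per GPU is assumed here
for GPUIDX in 0 1 2 3; do
    mpirun -np $((GPUIDX+1)) ./lmp_ex1 -c off -var GPUIDX $GPUIDX \
           -in melt.inp -l melt_gpu_$((GPUIDX+1)).log
done
</code>
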
===== Lammps GPU Testing (MW) =====

Vendor: "There are currently two systems available, each with two 8-core Xeon E5-2670 processors, 32GB memory, 120GB SSD and two Tesla K20 GPUs. The hostnames are master and node2. You will see that a GPU-accelerated version of LAMMPS with MPI support is installed in /usr/local/LAMMPS."
  
Actually, it turns out there are 32 cores per node, so I suspect four CPUs.
  
First, we expose the GPUs to Lammps in our input file; running with a GPUIDX value of -1 ignores the GPUs.

<code>
# Enable GPUs (IDs 0 through GPUIDX) when GPUIDX is set to 0 or higher.
if "(${GPUIDX} >= 0)" then &
        "suffix gpu" &
        "newton off" &
        "package gpu force 0 ${GPUIDX} 1.0"
</code>

Then we invoke the Lammps executable with MPI.

<code>
NODES=1      # number of nodes [=>1]
GPUIDX=0     # GPU indices range over [0,1]; this is the upper bound:
             # set GPUIDX=0 for 1 GPU/node or GPUIDX=1 for 2 GPUs/node
CORES=12     # cores per node (e.g. 2 CPUs with 6 cores each = 12 cores per node)

which mpirun

echo "*** GPU run with one MPI process per core ***"
date
mpirun -np $((NODES*CORES)) -bycore ./lmp_ex1 -c off -var GPUIDX $GPUIDX \
       -in film.inp -l film_1_gpu_1_node.log
date
</code>

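The CPU-only timings below come from the same input with the GPUs ignored; a sketch of that counterpart run (the log file name is a placeholder):

<code>
echo "*** CPU-only run: GPUIDX=-1 so the input-file toggle skips the GPU package ***"
date
mpirun -np $((NODES*CORES)) -bycore ./lmp_ex1 -c off -var GPUIDX -1 \
       -in film.inp -l film_cpu_only.log
date
</code>
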
Some tests using **lj/cut**, **eam**, **lj/expand**, and **morse**:

  * CPU only 4 mins 30 secs
  * 1 GPU 0 mins 47 secs (a 5-6 times speed up)
  * 2 GPUs 0 mins 46 secs (never saw the 2nd GPU used; problem set too small?)

Some tests using a restart file and **gayberne**:

  * CPU only 1 hour 5 mins
  * 1 GPU 3 mins 33 secs (an 18-19 times speed up)
  * 2 GPUs 2 mins (see below)
  
<code>
node2$ gpu-info
====================================================
Device  Model           Temperature     Utilization
====================================================
0       Tesla K20m      36 C            96 %
1       Tesla K20m      34 C            92 %
====================================================
</code>
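As a generic alternative when the ''gpu-info'' wrapper is not available, plain ''nvidia-smi'' shows similar temperature and utilization figures (a sketch, not output captured from these nodes):

<code>
# one-off snapshot of GPU status
nvidia-smi

# refresh every 5 seconds while the Lammps job runs
watch -n 5 nvidia-smi
</code>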
  
\\
**[[cluster:0|Back]]**