cluster:109 — hmeij — created 2013/01/17, last revised 2013/02/02
**[[cluster:
===== Lammps GPU Testing (EC) =====

  * 32 cores E5-2660
  * 4 K20 GPUs
  * workstation
  * MPICH2 flavor
+ | |||
+ | Same tests (12 cpu cores) using lj/cut, eam, lj/expand, and morse: **AU.reduced** | ||
+ | |||
+ | CPU only 6 mins 1 secs | ||
+ | 1 GPU 1 mins 1 secs (a 5-6 times speed up) | ||
+ | 2 GPUs 1 mins 0 secs (never saw 2nd GPU used, problem set too small?) | ||
+ | |||
+ | Same tests (12 cpu cores) using a restart file and using gayberne: **GB** | ||
+ | |||
+ | CPU only 1 hour 5 mins | ||
+ | 1 GPU 5 mins and 15 secs (a 18-19 times peed up) | ||
+ | 2 GPUs 2 mins (see below) | ||
+ | |||

Francis'

^ 3d Lennard-Jones melt: for 10,000 steps with 32,000 atoms ^^^^^^
| CPU only | -np 1 | -np 6 | -np 12 | -np 24 | -np 36 |
| time |  |  |  |  |  |
| GPU only | 1xK20 | 2xK20 | 3xK20 | 4xK20 | -np 1-4 |
| time |  |  |  |  |  |

===== Lammps GPU Testing (MW) =====

Vendor: "There are currently two systems available, each with two 8-core Xeon E5-2670 processors, 32GB memory, 120GB SSD and two Tesla K20 GPUs. The hostnames are master and node2. You will see that a GPU-accelerated version of LAMMPS with MPI support is installed in /

Actually, it turns out there are 32 cores on the node, so I suspect four CPUs.
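Note that two 8-core E5-2670s with hyperthreading enabled also present as 32 logical CPUs, so the core count alone does not prove four sockets. On a Linux node, both figures can be read from /proc/cpuinfo:

<code>
# Logical CPUs seen by the kernel (hyperthreads included):
grep -c ^processor /proc/cpuinfo
# Distinct physical sockets:
grep 'physical id' /proc/cpuinfo | sort -u | wc -l
</code>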
First, we expose the GPUs to Lammps in our input file (so running with a value of -1 ignores the GPUs).
+ | |||
<code>
# Enable GPU's if variable is set.
if "${GPUIDX} >= 0" then &
  "suffix gpu" &
  "newton off" &
  "package gpu force/neigh 0 ${GPUIDX} -1"
</code>
+ | |||
Then we invoke the Lammps executable with MPI.
+ | |||
<code>
NODES=1
GPUIDX=0
# set GPUIDX=0 for 1 GPU/node or GPUIDX=1 for 2 GPU/node
CORES=12

which mpirun

echo "*** GPU run with one MPI process per core ***"
date
mpirun -np $((NODES*CORES)) -bycore ./lmp_ex1 -c off -var GPUIDX $GPUIDX \
  -in film.inp -l film_1_gpu_1_node.log
date
</code>
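The same launcher gives a CPU-only baseline when GPUIDX is set to -1, since the if-test in the input file then skips the GPU package. A sketch under the same assumptions (the log-file name here is illustrative, not from the original runs):

<code>
NODES=1
GPUIDX=-1
# negative value: the if-test in the input file skips the GPU package
CORES=12

echo "*** CPU-only run with one MPI process per core ***"
date
mpirun -np $((NODES*CORES)) -bycore ./lmp_ex1 -c off -var GPUIDX $GPUIDX \
  -in film.inp -l film_cpu_only.log
date
</code>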
+ | |||
Some tests using **lj/cut**, **eam**, **lj/expand**, and **morse**: **AU.reduced**

  * CPU only 4 mins 30 secs
  * 1 GPU 0 mins 47 secs (a 5-6 times speed up)
  * 2 GPUs 0 mins 46 secs (never saw the 2nd GPU used; problem set too small?)

Some tests using a restart file and **gayberne**: **GB**

  * CPU only 1 hour 5 mins
  * 1 GPU 3 mins and 33 secs (an 18-19 times speed up)
  * 2 GPUs 2 mins (see below)
<code>
node2$ gpu-info
====================================================
Device  Model       Temperature  Utilization
====================================================
0       Tesla K20m  36 C         96 %
1       Tesla K20m  34 C         92 %
====================================================
</code>
\\
**[[cluster: