**[[cluster:
===== Lammps GPU Testing (EC) =====
| + | |||
| + | * 32 cores E2660 | ||
| + | * 4 K20 GPU | ||
| + | * workstation | ||
| + | * MPICH2 flavor | ||
| + | |||
| + | |||
| + | Same tests (12 cpu cores) using lj/cut, eam, lj/expand, and morse: **AU.reduced** | ||
| + | |||
| + | CPU only 6 mins 1 secs | ||
| + | 1 GPU 1 mins 1 secs (a 5-6 times speed up) | ||
| + | 2 GPUs 1 mins 0 secs (never saw 2nd GPU used, problem set too small?) | ||
| + | |||
| + | Same tests (12 cpu cores) using a restart file and using gayberne: **GB** | ||
| + | |||
| + | CPU only 1 hour 5 mins | ||
| + | 1 GPU 5 mins and 15 secs (a 18-19 times peed up) | ||
| + | 2 GPUs 2 mins | ||
| + | |||
| + | Above results seems overall a bit slower that at other vendor, but same pattern. | ||
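
The speedups quoted above come straight from the ''Loop time'' line that LAMMPS prints at the end of each run. A minimal sketch of that comparison, assuming the CPU-only and 1-GPU logs were saved under the hypothetical names ''au_cpu.log'' and ''au_gpu.log'':

<code>
#!/bin/bash
# compare LAMMPS wall times from two saved logs (hypothetical file names)
# LAMMPS ends each run with a line like:
#   Loop time of 345.936 on 1 procs for 100000 steps with 32000 atoms
cpu=$(grep 'Loop time' au_cpu.log | tail -1 | awk '{print $4}')
gpu=$(grep 'Loop time' au_gpu.log | tail -1 | awk '{print $4}')
echo "CPU only loop time : ${cpu}s"
echo "1 GPU    loop time : ${gpu}s"
echo "Speedup            : $(awk -v c=$cpu -v g=$gpu 'BEGIN {printf "%.1fx\n", c/g}')"
</code>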
| + | |||
| + | Francis' | ||
| + | |||
| + | ^3d Lennard-Jones melt: for 10,000 steps with 32,000 atoms^^^^^^ | ||
| + | |CPU only| -np 1 | -np 6 | -np 12 | -np 24 | -np 36 | | ||
| + | |loop times| | ||
| + | |GPU only| 1xK20 | 2xK20 | 3xK20 | 4xK20 | (-np 1-4) | | ||
| + | |loop times| | ||
| + | ^3d Lennard-Jones melt: for 100,000 steps with 32,000 atoms^^^^^^ | ||
| + | |GPU only| 1xK20 | 2xK20 | 3xK20 | 4xK20 | (-np 1-4) | | ||
| + | |loop times| | ||
| + | |||
| + | * Serial' | ||
| + | * GPU's serial time matches MPI -np 24 and can be further reduced to 10s, a 3x speed up | ||
| + | |||
| + | ==== Redoing Above ==== | ||
| + | |||
| + | **10/ | ||
| + | |||
| + | Redoing the melt problem now on our own K20 hardware I get the following (observing with gpu-info that utilization runs about 20-25% on the GPU allocated) | ||
| + | |||
| + | Loop time of 345.936 on 1 procs for 100000 steps with 32000 atoms | ||
| + | |||
| + | < | ||
| + | |||
| + | # | ||
| + | # submit via 'bsub < run.gpu' | ||
| + | rm -f log.lammps melt.log | ||
| + | #BSUB -e err | ||
| + | #BSUB -o out | ||
| + | #BSUB -q mwgpu | ||
| + | #BSUB -J test | ||
| + | |||
| + | ## leave sufficient time between job submissions (30-60 secs) | ||
| + | ## the number of GPUs allocated matches -n value automatically | ||
| + | ## always reserve GPU (gpu=1), setting this to 0 is a cpu job only | ||
| + | ## reserve 6144 MB (5 GB + 20%) memory per GPU | ||
| + | ## run all processes (1< | ||
| + | |||
| + | #BSUB -n 1 | ||
| + | #BSUB -R " | ||
| + | |||
| + | # from greentail we need to recreate module env | ||
| + | export PATH=/ | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | export PATH=/ | ||
| + | export LD_LIBRARY_PATH=/ | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | |||
| + | # unique job scratch dirs | ||
| + | MYSANSCRATCH=/ | ||
| + | MYLOCALSCRATCH=/ | ||
| + | export MYSANSCRATCH MYLOCALSCRATCH | ||
| + | cd $MYSANSCRATCH | ||
| + | |||
| + | # LAMMPS | ||
| + | # GPUIDX=1 use allocated GPU(s), GPUIDX=0 cpu run only (view header au.inp) | ||
| + | export GPUIDX=1 | ||
| + | # stage the data | ||
| + | cp ~/ | ||
| + | # feed the wrapper | ||
| + | lava.mvapich2.wrapper lmp_nVidia \ | ||
| + | -c off -var GPUIDX $GPUIDX -in in.melt | ||
| + | # save results | ||
| + | cp log.lammps melt.log | ||
| + | |||
| + | |||
| + | </ | ||
| + | |||
| + | ===== Lammps GPU Testing (MW) ===== | ||
| Vendor: "There are currently two systems available, each with two 8-core Xeon E5-2670 processors, 32GB memory, 120GB SSD and two Tesla K20 GPUs. The hostnames are master and node2. | Vendor: "There are currently two systems available, each with two 8-core Xeon E5-2670 processors, 32GB memory, 120GB SSD and two Tesla K20 GPUs. The hostnames are master and node2. | ||