CPU only 1 hour 5 mins
1 GPU 5 mins and 15 secs (an 18-19 times speed up)
2 GPUs 2 mins

The results above seem overall a bit slower than at the other vendor, but they follow the same pattern.
Francis'

|CPU only| -np 1 | -np 6 | -np 12 | -np 24 | -np 36 |
|loop times| | | | | |
|GPU only| 1xK20 | 2xK20 | 3xK20 | 4xK20 |
|loop times| | | | |

^3d Lennard-Jones melt: for 100,000 steps with 32,000 atoms^^^^^^
|GPU only| 1xK20 | 2xK20 | 3xK20 | 4xK20 |
|loop times| | | | |
  * Serial'
  * GPU's serial time matches MPI -np 24 and can be further reduced to 10s, a 3x speed up

==== Redoing Above ====

**10/

Redoing the melt problem now on our own K20 hardware, I get the following (observing with gpu-info that utilization runs at about 20-25% on the allocated GPU):

Loop time of 345.936 on 1 procs for 100000 steps with 32000 atoms
<code>
#!/bin/bash
# submit via 'bsub < run.gpu'
rm -f log.lammps melt.log
#BSUB -e err
#BSUB -o out
#BSUB -q mwgpu
#BSUB -J test

## leave sufficient time between job submissions (30-60 secs)
## the number of GPUs allocated matches -n value automatically
## always reserve GPU (gpu=1), setting this to 0 is a cpu job only
## reserve 6144 MB (5 GB + 20%) memory per GPU
## run all processes (1<

#BSUB -n 1
#BSUB -R "

# from greentail we need to recreate module env
export PATH=/
/
/
/
/
/
/
/
export PATH=/
export LD_LIBRARY_PATH=/
/
/
/
/
/
/

# unique job scratch dirs
MYSANSCRATCH=/
MYLOCALSCRATCH=/
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYSANSCRATCH

# LAMMPS
# GPUIDX=1 use allocated GPU(s), GPUIDX=0 cpu run only (view header au.inp)
export GPUIDX=1
# stage the data
cp ~/
# feed the wrapper
lava.mvapich2.wrapper lmp_nVidia \
-c off -var GPUIDX $GPUIDX -in in.melt
# save results
cp log.lammps melt.log
</code>
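As a usage sketch, assuming the script above is saved as run.gpu (per the comment in its header), the usual LSF round trip to submit it and inspect the result would be:

<code>
bsub < run.gpu               # submit to the mwgpu queue
bjobs                        # confirm the job is pending or running
bpeek <jobid>                # peek at stdout of a running job
grep "Loop time" melt.log    # after completion, check the timing line (location depends on the final cp above)
</code>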
===== Lammps GPU Testing (MW) =====