  * GPU's serial time matches MPI -np 24 and can be further reduced to 10s, a 3x speed up
  
==== Redoing Above ====

**10/16/2013**

Redoing the melt problem, now on our own K20 hardware, I get the following (observing with gpu-info that utilization runs at about 20-25% on the allocated GPU):

Loop time of 345.936 on 1 procs for 100000 steps with 32000 atoms
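
The 20-25% utilization figure was read with the local gpu-info tool. If nvidia-smi is available on the node running the job, a similar spot check can be made with the sketch below; the 5-second polling interval is an arbitrary choice.

<code>
# poll the GPU(s) on the execution host every 5 seconds:
# per-GPU index, name, utilization and memory in use
nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
           --format=csv -l 5
</code>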

<code>
#!/bin/bash
# submit via 'bsub < run.gpu'
rm -f log.lammps melt.log
#BSUB -e err
#BSUB -o out
#BSUB -q mwgpu
#BSUB -J test

## leave sufficient time between job submissions (30-60 secs)
## the number of GPUs allocated automatically matches the -n value
## always reserve a GPU (gpu=1); setting this to 0 makes it a CPU-only job
## reserve 6144 MB (5 GB + 20%) of memory per GPU
## run all processes (1<=n<=4) on the same node (hosts=1)

#BSUB -n 1
#BSUB -R "rusage[gpu=1:mem=6144],span[hosts=1]"

# from greentail we need to recreate the module environment
export PATH=/home/apps/bin:/cm/local/apps/cuda50/libs/304.54/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/lammps/cuda/2013-01-27/:\
/cm/shared/apps/amber/amber12/bin:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:\
/usr/sbin:/cm/shared/apps/cuda50/toolkit/5.0.35/bin:\
/cm/shared/apps/cuda50/sdk/5.0.35/bin/linux/release:/cm/shared/apps/cuda50/libs/current/bin:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/bin:/cm/shared/apps/mvapich2/gcc/64/1.6/bin:\
/cm/shared/apps/mvapich2/gcc/64/1.6/sbin
export PATH=/share/apps/bin:$PATH
export LD_LIBRARY_PATH=/cm/local/apps/cuda50/libs/304.54/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/amber/amber12/lib:\
/cm/shared/apps/amber/amber12/lib64:/cm/shared/apps/namd/ibverbs-smp-cuda/2013-06-02/:\
/cm/shared/apps/cuda50/toolkit/5.0.35/lib64:/cm/shared/apps/cuda50/libs/current/lib64:\
/cm/shared/apps/cuda50/toolkit/5.0.35/open64/lib:\
/cm/shared/apps/cuda50/toolkit/5.0.35/extras/CUPTI/lib:\
/cm/shared/apps/mvapich2/gcc/64/1.6/lib

# unique per-job scratch dirs
MYSANSCRATCH=/sanscratch/$LSB_JOBID
MYLOCALSCRATCH=/localscratch/$LSB_JOBID
export MYSANSCRATCH MYLOCALSCRATCH
cd $MYSANSCRATCH

# LAMMPS
# GPUIDX=1: use allocated GPU(s); GPUIDX=0: CPU-only run (view header of au.inp)
export GPUIDX=1
# stage the data
cp ~/gpu_testing/fstarr/lj/* .
# feed the wrapper
lava.mvapich2.wrapper lmp_nVidia \
-c off -var GPUIDX $GPUIDX -in in.melt
# save results
cp log.lammps melt.log ~/gpu_testing/fstarr/lj/
</code>
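
For reference, a minimal submit-and-watch cycle for this script might look like the sketch below, assuming it is saved as run.gpu (as the comment at its top suggests); bjobs and bpeek are standard LSF-family commands, and the job ID shown is hypothetical.

<code>
# submit the job script to the scheduler
bsub < run.gpu

# list your jobs (wide format) to find the job ID and its state
bjobs -w

# peek at the job's stdout while it runs (123456 is a hypothetical job ID)
bpeek 123456
</code>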
  
===== Lammps GPU Testing (MW) =====