Amber GPU Testing (EC)

We are interested in benchmarking the serial, MPI, CUDA, and CUDA+MPI versions of pmemd (pmemd, pmemd.MPI, pmemd.cuda and pmemd.cuda.MPI).
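
Assuming $AMBERHOME points at the Amber install being tested, a quick check that all four builds are present:

# list the pmemd variants available in this Amber install
ls $AMBERHOME/bin/pmemd*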

Results

PMEMD implementation of SANDER, Release 12
Minimizing the system with 25 kcal/mol restraints on the protein, 500 steps of
steepest descent and 500 of conjugate gradient - Surjit Dixit problem set

CPU Jobs (1,000 steps)   Serial   -np 2   -np 4   -np 8   -np 16   -np 24   -np 32
Wall Time (secs)            211     120      64      35       29       26       33

GPU Jobs                 Serial   -np 2   -np 4   -np 8   -np 16   -np 24   -np 32
Wall Time (secs)             12
AMBER BENCHMARK EXAMPLES

JAC_PRODUCTION_NVE - 23,558 atoms PME
  16 cpu cores      1xK20      2xK20      3xK20      4xK20   measure
         12.87      80.50      88.76     103.09     122.45   ns/day
       6713.99    1073.23     973.45     838.09     705.61   seconds/ns

FACTOR_IX_PRODUCTION_NVE - 90,906 atoms PME
  16 cpu cores      1xK20      2xK20      3xK20      4xK20   measure
          3.95      22.25      27.47      32.56      39.52   ns/day
      21865.59    3883.38    3145.32    2653.65    2186.28   seconds/ns

CELLULOSE_PRODUCTION_NVE - 408,609 atoms PME
  16 cpu cores      1xK20      2xK20      3xK20      4xK20   measure
          0.91       5.40       6.44       7.51       8.85   ns/day
      95235.87   15986.42   13406.15   11509.28    9768.23   seconds/ns

NUCLEOSOME_PRODUCTION - 25,095 atoms GB
  16 cpu cores      1xK20      2xK20      3xK20      4xK20   measure
          0.06       2.79       3.65       3.98        ???   ns/day
    1478614.67   31007.58   23694.29   21724.33        ???   seconds/ns
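
The two rows of each table report the same measurement in different units: seconds/ns = 86,400 / (ns/day). For example, the 12.87 ns/day CPU figure for JAC works out to 86400 / 12.87 ≈ 6713 seconds/ns, which matches the tabulated 6713.99 to within rounding of the ns/day value. A quick shell check:

# convert ns/day into seconds/ns (86,400 seconds in a day)
awk 'BEGIN { printf "%.2f seconds/ns\n", 86400 / 12.87 }'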

Setup

First we collect some CPU-based data.

# serial run of pmemd
nohup $AMBERHOME/bin/pmemd -O -i mdin -o mdout -p prmtop \
-c inpcrd -r restrt -x mdcrd </dev/null &

# parallel run, note that you will need to create the machinefile;
# if -np is 4 it should contain 4 lines, one hostname per line
# ('localhost' entries do not work here, use the real hostname)
mpirun --machinefile=nodefile -np 4 $AMBERHOME/bin/pmemd.MPI \
-O -i mdin -o mdout -p prmtop \
-c inpcrd -r restrt -x mdcrd </dev/null &
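
For example, a 4-entry machinefile can be built by repeating the node's real hostname, one line per MPI rank (the benchmark script further below uses the same idea with files named nodefile2, nodefile4, and so on):

# one hostname line per MPI rank; 'localhost' entries do not work here
for i in $(seq 4); do hostname; done > nodefile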

The gpu-info script used below should be in your path; it is located in ~/bin.

You need to allocate one or more GPUs for your cuda runs.

node2$ gpu-info
====================================================
Device  Model           Temperature     Utilization
====================================================
0       Tesla K20       27 C             0 %
1       Tesla K20       28 C             0 %
2       Tesla K20       27 C             0 %
3       Tesla K20       30 C             0 %
====================================================
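
gpu-info is a local convenience script; much the same information can be pulled directly from nvidia-smi if the script is not available (a rough equivalent, not the actual ~/bin script):

# per-device model, temperature and utilization, similar to gpu-info
nvidia-smi --query-gpu=index,name,temperature.gpu,utilization.gpu \
           --format=csv,noheader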

Next we need to expose these GPUs to pmemd …

# expose one
export CUDA_VISIBLE_DEVICES="0"

# serial run of pmemd.cuda
nohup $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout -p prmtop \
-c inpcrd -r restrt -x mdcrd </dev/null &

# parallel run, note that you will need to create the machinefile;
# if -np is 4 it could contain 4 lines with the string 'localhost'
mpirun --machinefile=nodefile -np 4 $AMBERHOME/bin/pmemd.cuda.MPI \
-O -i mdin -o mdout -p prmtop \
-c inpcrd -r restrt -x mdcrd </dev/null &

You may want to run your pmemd problem across multiple GPUs if the problem set is large enough.

# expose multiple (for serial or parallel runs)
export CUDA_VISIBLE_DEVICES="0,2"
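
To confirm that the intended devices are actually being used, watch utilization while a job runs; pmemd.cuda also records the device(s) it selected near the top of mdout (the exact header text may vary between Amber versions):

# exposed GPUs should show non-zero utilization during the run
watch -n 5 gpu-info

# device block written by pmemd.cuda
grep -A 12 "GPU DEVICE INFO" mdout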

Script

[TestDriveUser0@K20-WS]$ cat run
#!/bin/bash
rm -rf err out logfile mdout restrt mdinfo

echo CPU serial
pmemd -O -i inp/mini.in -p 1g6r.cd.parm \
 -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout 1core.serial.log

echo CPU parallel 2,4,8,16,24,32 /usr/local/mpich2-1.4.1p1/bin/mpirun
for i in 2 4 8 16 24 32
do
echo $i
mpirun --machinefile=nodefile$i -np $i pmemd.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
 -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout ${i}core.parallel.log 
done

echo GPU serial
export CUDA_VISIBLE_DEVICES="2"
pmemd.cuda -O -i inp/mini.in -p 1g6r.cd.parm \
 -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout 1gpu.serial.log

echo GPU parallel 2 /usr/local/mpich2-1.4.1p1/bin/mpirun
export CUDA_VISIBLE_DEVICES="2"
for i in 2
do
echo $i
mpirun --machinefile=nodefile$i -np $i pmemd.cuda.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
 -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout ${i}gpu.parallel.log 
done
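
Once the script has finished, the wall-clock numbers quoted in the Results section can be pulled from the saved logs; pmemd prints its timing summary near the end of mdout, so a loose pattern like the one below picks it up for both the serial and MPI runs:

# summarize wall-clock timings across all saved runs
grep -iH "wall time" *.log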

