We are interested in benchmarking the serial, MPI, cuda and cuda.MPI versions of pmemd.
| PMEMD implementation of SANDER, Release 12 |
|---|
| Minimizing the system with 25 kcal/mol restraints on the protein: 500 steps of steepest descent and 500 of conjugate gradient |
| CPU Jobs | Serial | -np 2 | -np 4 | -np 8 | -np 16 | -np 24 | -np 32 |
|---|---|---|---|---|---|---|---|
| Wall Time (secs) | 211 | 120 | 64 | 35 | 29 | 26 | 33 |

| GPU Jobs | Serial | -np 2 | -np 4 | -np 8 | -np 16 | -np 24 | -np 32 |
|---|---|---|---|---|---|---|---|
| Wall Time (secs) | 12 | | | | | | |
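From the tables above, the best CPU scaling for this problem is around -np 24 (about 8x over serial; -np 32 is actually slower), while a single GPU already beats the serial CPU run by roughly 17x. A quick back-of-the-envelope check using the wall times above:

```bash
# speedup = serial wall time / parallel wall time (numbers from the tables above)
echo "scale=1; 211/26" | bc   # -np 24: ~8.1x
echo "scale=1; 211/33" | bc   # -np 32: ~6.3x (slower than -np 24)
echo "scale=1; 211/12" | bc   # 1 GPU:  ~17.5x over the serial CPU run
```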
First we get some CPU-based data.
```bash
# serial run of pmemd
nohup $AMBERHOME/bin/pmemd -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &

# parallel run; note that you will need to create the machinefile
# if -np 4, it would contain 4 lines with the string 'localhost'
# (note: 'localhost' does not work here, use the node's hostname instead)
mpirun --machinefile=nodefile -np 4 $AMBERHOME/bin/pmemd.MPI \
  -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &
```
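The machinefile referenced above is just a plain text file listing one node name per MPI task. A minimal sketch for building it on the current node (using the output of `hostname` rather than 'localhost', per the note in the comments):

```bash
# build a machinefile with 4 entries, one per MPI task, using this node's hostname
rm -f nodefile
for i in 1 2 3 4; do hostname >> nodefile; done
cat nodefile
```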
The following script, gpu-info, should be in your path; it is located in ~/bin.
You need to allocate one or more GPUs for your CUDA runs.
```
node2$ gpu-info
====================================================
Device  Model      Temperature  Utilization
====================================================
0       Tesla K20  27 C         0 %
1       Tesla K20  28 C         0 %
2       Tesla K20  27 C         0 %
3       Tesla K20  30 C         0 %
====================================================
```
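gpu-info is a local wrapper script; if it is not in your path, the stock nvidia-smi tool (installed with the NVIDIA driver) reports similar per-device temperature and utilization figures:

```bash
# stock alternative to the local gpu-info wrapper
nvidia-smi
```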
Next we need to expose these GPUs to pmemd …
```bash
# expose one
export CUDA_VISIBLE_DEVICES="0"

# serial run of pmemd.cuda
nohup $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &

# parallel run; note that you will need to create the machinefile
# if -np 4, it would contain 4 lines with the string 'localhost'
# (but see the hostname note above)
mpirun --machinefile=nodefile -np 4 $AMBERHOME/bin/pmemd.cuda.MPI \
  -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &
```
You may want to try running your pmemd problem across multiple GPUs if the problem set is large enough.
```bash
# expose multiple (for serial or parallel runs)
export CUDA_VISIBLE_DEVICES="0,2"
```
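Keep in mind that the devices listed in CUDA_VISIBLE_DEVICES are renumbered from 0 inside the application, so with "0,2" exposed, pmemd sees two GPUs numbered 0 and 1. A sketch of a matching two-task run (the same pmemd.cuda.MPI command as above, with one MPI task per exposed GPU):

```bash
export CUDA_VISIBLE_DEVICES="0,2"
# two MPI tasks, one per exposed GPU
mpirun --machinefile=nodefile -np 2 $AMBERHOME/bin/pmemd.cuda.MPI \
  -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &
```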
```
  echo $i
  mpirun --machinefile=nodefile$i -np $i pmemd.cuda.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
    -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
  cp mdout ${i}gpu.parallel.log
done

[TestDriveUser0@K20-WS ec]$ w
 06:50:20 up 2 days, 12:51,  3 users,  load average: 1.73, 0.96, 1.31
USER     TTY      FROM              LOGIN@   IDLE    JCPU   PCPU  WHAT
TestDriv pts/0    engineering-pc.e  Thu09    20:01m  0.08s  0.52s sshd: TestDriveUser [priv]
TestDriv pts/1    greentail.wesley  05:31    4:50    2:38   0.00s /bin/bash ./run
TestDriv pts/2    hmeij.its.wesley  05:31    0.00s   0.19s  0.01s w

[TestDriveUser0@K20-WS ec]$ uptime
 06:50:43 up 2 days, 12:51,  3 users,  load average: 1.82, 1.05, 1.33

[TestDriveUser0@K20-WS ec]$ cat run
#!/bin/bash

rm -rf err out logfile mdout restrt mdinfo

echo CPU serial
pmemd -O -i inp/mini.in -p 1g6r.cd.parm \
  -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout 1core.serial.log

echo CPU parallel 2,4,8,16 /usr/local/mpich2-1.4.1p1/bin/mpirun
for i in 2 4 8 16 24 32
do
  echo $i
  mpirun --machinefile=nodefile$i -np $i pmemd.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
    -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
  cp mdout ${i}core.parallel.log
done

echo GPU serial
export CUDA_VISIBLE_DEVICES="2"
pmemd.cuda -O -i inp/mini.in -p 1g6r.cd.parm \
  -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout 1gpu.serial.log

echo GPU parallel 2,4,8,16 /usr/local/mpich2-1.4.1p1/bin/mpirun
export CUDA_VISIBLE_DEVICES="2"
for i in 2
do
  echo $i
  mpirun --machinefile=nodefile$i -np $i pmemd.cuda.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
    -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
  cp mdout ${i}gpu.parallel.log
done
```
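The wall-time figures in the tables at the top come from the mdout copies the script saves (1core.serial.log, ${i}core.parallel.log, and so on). A minimal sketch for pulling them out afterwards, assuming the timing line in your mdout files contains the phrase 'wall time' (the exact label can vary between AMBER builds):

```bash
# grep the saved mdout copies for the reported wall time;
# adjust the pattern to match the timing line in your mdout files
grep -i "wall time" *.log
```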