Amber GPU Testing (EC)

We are interested in benchmarking the serial (pmemd), MPI (pmemd.MPI), CUDA (pmemd.cuda) and CUDA+MPI (pmemd.cuda.MPI) versions of pmemd.

Results

  • Verified the MPI thread counts and the GPU invocations (a spot-check sketch follows the results below)
  • Verified the output data
  • pmemd.cuda.MPI produced errors
  • The script used is listed at the end of this page

Benchmark case: PMEMD implementation of SANDER, Release 12. Minimizing the system with 25 kcal/mol restraints on the protein, 500 steps of steepest descent and 500 of conjugate gradient.

CPU Jobs          Serial  -np 2  -np 4  -np 8  -np 16  -np 24  -np 32
Wall Time (secs)  211     120    64     35     29      26      33

  • MPI speedup peaks near -np 24 at roughly 8x (211/26)

GPU Jobs          Serial  -np 2  -np 4  -np 8  -np 16  -np 24  -np 32
Wall Time (secs)  12      -      -      -      -       -       -

  • GPU serial is roughly 17.5x faster than CPU serial (211/12) and outperforms the best MPI run by at least 2x
  • GPU parallel could not be measured because of the pmemd.cuda.MPI errors
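
As a rough illustration (not part of the original run), the spot checks mentioned above can be done with standard tools; gpu-info is the local wrapper shown later on this page, and nvidia-smi is the stock NVIDIA utility.

# count the pmemd.MPI ranks actually running (should match -np)
ps -ef | grep '[p]memd.MPI' | wc -l

# confirm pmemd.cuda picked up a GPU by checking the mdout header
grep -i cuda mdout | head

# watch GPU utilization while a job runs
gpu-info          # local wrapper, output shown below
nvidia-smi -l 5   # stock NVIDIA tool, refresh every 5 seconds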

Setup

First, we gather some CPU-based data.

# serial run of pmemd
nohup $AMBERHOME/bin/pmemd -O -i mdin -o mdout -p prmtop \
-c inpcrd -r restrt -x mdcrd </dev/null &

# parallel run; note that you will need to create the machinefile
# with -np 4 it would contain 4 lines; the string 'localhost' does not work here, use the hostname instead
mpirun --machinefile=nodefile -np 4 $AMBERHOME/bin/pmemd.MPI \
-O -i mdin -o mdout -p prmtop \
-c inpcrd -r restrt -x mdcrd </dev/null &
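
A minimal way to build such a machinefile (a hypothetical example for a 4-rank run on the local host, using the hostname as noted above):

# one hostname per MPI rank; 'localhost' did not work here, so use the real hostname
for i in 1 2 3 4; do hostname; done > nodefile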

The following gpu-info script should be in your path; it is located in ~/bin.

You need to allocate one or more GPUs for your cuda runs.

node2$ gpu-info
====================================================
Device  Model           Temperature     Utilization
====================================================
0       Tesla K20       27 C             0 %
1       Tesla K20       28 C             0 %
2       Tesla K20       27 C             0 %
3       Tesla K20       30 C             0 %
====================================================
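
For reference, a table like the one gpu-info prints can be approximated with nvidia-smi's query interface (a sketch, not the actual ~/bin/gpu-info script; assumes a driver recent enough to support --query-gpu):

#!/bin/bash
# list index, model, temperature and utilization for every GPU
nvidia-smi --query-gpu=index,name,temperature.gpu,utilization.gpu \
           --format=csv,noheader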

Next we need to expose these GPUs to pmemd …

# expose one
export CUDA_VISIBLE_DEVICES="0"

# serial run of pmemd.cuda
nohup $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout -p prmtop \
-c inpcrd -r restrt -x mdcrd </dev/null &

# parallel run; note that you will need to create the machinefile
# with -np 4 it would contain 4 lines; as above, use the hostname rather than 'localhost'
mpirun --machinefile=nodefile -np 4 $AMBERHOME/bin/pmemd.cuda.MPI \
-O -i mdin -o mdout -p prmtop \
-c inpcrd -r restrt -x mdcrd </dev/null &

You may want to run your pmemd problem across multiple GPUs if the problem set is large enough.

# expose multiple (for serial or parallel runs)
export CUDA_VISIBLE_DEVICES="0,2"
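
Putting the pieces together, a two-rank pmemd.cuda.MPI run on the two exposed GPUs might look like this (a sketch based on the commands above; inside the job the devices are renumbered 0 and 1, mapping to physical GPUs 0 and 2):

export CUDA_VISIBLE_DEVICES="0,2"
mpirun --machinefile=nodefile -np 2 $AMBERHOME/bin/pmemd.cuda.MPI \
-O -i mdin -o mdout -p prmtop \
-c inpcrd -r restrt -x mdcrd </dev/null &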

Script

The session below shows the benchmark in progress (w and uptime while the run script executes), followed by the contents of the run script itself.

[TestDriveUser0@K20-WS ec]$ 
[TestDriveUser0@K20-WS ec]$ w
 06:50:20 up 2 days, 12:51,  3 users,  load average: 1.73, 0.96, 1.31
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
TestDriv pts/0    engineering-pc.e Thu09   20:01m  0.08s  0.52s sshd: TestDriveUser [priv]
TestDriv pts/1    greentail.wesley 05:31    4:50   2:38   0.00s /bin/bash ./run
TestDriv pts/2    hmeij.its.wesley 05:31    0.00s  0.19s  0.01s w
[TestDriveUser0@K20-WS ec]$ uptime
 06:50:43 up 2 days, 12:51,  3 users,  load average: 1.82, 1.05, 1.33
[TestDriveUser0@K20-WS ec]$ cat run
#!/bin/bash
rm -rf err out logfile mdout restrt mdinfo

echo CPU serial
pmemd -O -i inp/mini.in -p 1g6r.cd.parm \
 -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout 1core.serial.log

echo CPU parallel 2,4,8,16,24,32 /usr/local/mpich2-1.4.1p1/bin/mpirun
for i in 2 4 8 16 24 32
do
echo $i
mpirun --machinefile=nodefile$i -np $i pmemd.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
 -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout ${i}core.parallel.log 
done

echo GPU serial
export CUDA_VISIBLE_DEVICES="2"
pmemd.cuda -O -i inp/mini.in -p 1g6r.cd.parm \
 -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout 1gpu.serial.log

echo GPU parallel 2 /usr/local/mpich2-1.4.1p1/bin/mpirun
export CUDA_VISIBLE_DEVICES="2"
for i in 2
do
echo $i
mpirun --machinefile=nodefile$i -np $i pmemd.cuda.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
 -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout ${i}gpu.parallel.log 
done
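
The wall-time numbers in the Results section can then be pulled from the copied logs; a sketch, assuming the usual "Total wall time" line in pmemd's closing timing summary:

# print the wall time reported at the end of each benchmark log
grep -H "Total wall time" *.log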

