Amber GPU Testing (EC)
We are interested in benchmarking the serial, MPI, CUDA, and CUDA+MPI versions of pmemd (pmemd, pmemd.MPI, pmemd.cuda, pmemd.cuda.MPI).
Results
- Verified the MPI thread counts and GPU invocations (see the verification sketch below the tables)
- Verified the output data
- pmemd.cuda.MPI runs produced errors
- The script used is listed at the end of this page
PMEMD implementation of SANDER, Release 12
Minimizing the system with 25 kcal/mol restraints on the protein, 500 steps of steepest descent and 500 of conjugate gradient
| CPU Jobs | Serial | -np 2 | -np 4 | -np 8 | -np 16 | -np 24 | -np 32 |
|---|---|---|---|---|---|---|---|
| Wall Time (secs) | 211 | 120 | 64 | 35 | 29 | 26 | 33 |
- MPI speedup peaks near -np 24 at roughly 8x over serial (211/26 secs)
| GPU Jobs | Serial | -np 2 | -np 4 | -np 8 | -np 16 | -np 24 | -np 32 |
|---|---|---|---|---|---|---|---|
| Wall Time (secs) | 12 | n/a | n/a | n/a | n/a | n/a | n/a |
- GPU serial is roughly 17.5x faster than CPU serial (211/12 secs) and outperforms the best MPI run by at least 2x
- GPU parallel timings could not be measured because the pmemd.cuda.MPI runs errored out
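The wall times above are taken from the mdout copies saved by the benchmark script at the end of this page (*.serial.log, *.parallel.log). A minimal sketch of pulling them back out, assuming the usual Amber "wall time" line near the end of mdout (the exact wording varies by version), plus a way to spot-check the MPI rank count while a job runs:

# pull the reported wall time out of each saved mdout copy
# (adjust the pattern if your Amber version labels the timing line differently)
for f in *serial.log *parallel.log; do
  [ -f "$f" ] || continue
  secs=$(awk '/wall time/ { for (i = 2; i <= NF; i++) if ($i ~ /^sec/) print $(i - 1) }' "$f" | tail -1)
  echo "$f: $secs secs"
done

# spot-check the number of pmemd processes while a parallel job is active
ps -ef | grep '[p]memd' | wc -l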
Setup
First we collect some CPU-based data.
# serial run of pmemd
nohup $AMBERHOME/bin/pmemd -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &

# parallel run; note that you will need to create the machinefile first
# with -np 4 it would contain 4 lines with the string 'localhost'...this does not work, use the hostname instead
mpirun --machinefile=nodefile -np 4 $AMBERHOME/bin/pmemd.MPI \
  -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &
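The machinefile is simply one hostname per MPI rank. A minimal sketch of generating it, using the file name nodefile and a rank count of 4 as examples:

# build a machinefile with one hostname line per MPI rank
np=4                      # example; match your -np value
rm -f nodefile
for i in $(seq 1 $np); do
  hostname >> nodefile
done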
The following script should be in your path (it lives in ~/bin). You need to allocate one or more GPUs for your CUDA runs.
node2$ gpu-info
====================================================
Device  Model          Temperature  Utilization
====================================================
0       Tesla K20      27 C         0 %
1       Tesla K20      28 C         0 %
2       Tesla K20      27 C         0 %
3       Tesla K20      30 C         0 %
====================================================
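If gpu-info is not in your path, roughly the same information can be queried straight from nvidia-smi (the actual ~/bin/gpu-info script may be implemented differently):

# rough gpu-info equivalent using nvidia-smi's query interface
nvidia-smi --query-gpu=index,name,temperature.gpu,utilization.gpu \
           --format=csv,noheader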
Next we need to expose these GPUs to pmemd …
# expose one GPU
export CUDA_VISIBLE_DEVICES="0"

# serial run of pmemd.cuda
nohup $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &

# parallel run; note that you will need to create the machinefile first
# with -np 4 it would contain 4 lines with the hostname (not 'localhost', see the note above)
mpirun --machinefile=nodefile -np 4 $AMBERHOME/bin/pmemd.cuda.MPI \
  -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &
You may want to run your pmemd problem across multiple GPUs if the problem set is large enough.
# expose multiple GPUs (for serial or parallel runs)
export CUDA_VISIBLE_DEVICES="0,2"
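While a pmemd.cuda or pmemd.cuda.MPI job is running, the exposed device(s) should show non-zero utilization in gpu-info. One way to keep an eye on that (the 5-second interval is just an example):

# watch utilization on the exposed devices while the job runs
watch -n 5 gpu-info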
Script
[TestDriveUser0@K20-WS ec]$
[TestDriveUser0@K20-WS ec]$ w
06:50:20 up 2 days, 12:51, 3 users, load average: 1.73, 0.96, 1.31
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
TestDriv pts/0 engineering-pc.e Thu09 20:01m 0.08s 0.52s sshd: TestDriveUser [priv]
TestDriv pts/1 greentail.wesley 05:31 4:50 2:38 0.00s /bin/bash ./run
TestDriv pts/2 hmeij.its.wesley 05:31 0.00s 0.19s 0.01s w
[TestDriveUser0@K20-WS ec]$ uptime
06:50:43 up 2 days, 12:51, 3 users, load average: 1.82, 1.05, 1.33
[TestDriveUser0@K20-WS ec]$ cat run
#!/bin/bash
rm -rf err out logfile mdout restrt mdinfo
echo CPU serial
pmemd -O -i inp/mini.in -p 1g6r.cd.parm \
-c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout 1core.serial.log
echo CPU parallel 2,4,8,16,24,32 /usr/local/mpich2-1.4.1p1/bin/mpirun
for i in 2 4 8 16 24 32
do
echo $i
mpirun --machinefile=nodefile$i -np $i pmemd.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
-c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout ${i}core.parallel.log
done
echo GPU serial
export CUDA_VISIBLE_DEVICES="2"
pmemd.cuda -O -i inp/mini.in -p 1g6r.cd.parm \
-c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout 1gpu.serial.log
echo GPU parallel 2 /usr/local/mpich2-1.4.1p1/bin/mpirun
export CUDA_VISIBLE_DEVICES="2"
for i in 2
do
echo $i
mpirun --machinefile=nodefile$i -np $i pmemd.cuda.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
-c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout ${i}gpu.parallel.log
done
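As the w output above shows, the benchmark was driven by ./run from an interactive shell. To keep it running after logout, reuse the nohup pattern from the Setup section (the output file name is just an example):

# run the full benchmark detached from the terminal
nohup ./run > run.out 2>&1 </dev/null &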
