We are interested in benchmarking the serial, MPI, cuda and cuda.MPI versions of pmemd.
| PMEMD implementation of SANDER, Release 12 |
|---|
| Minimizing the system with 25 kcal/mol restraints on the protein: 500 steps of steepest descent and 500 of conjugate gradient |
| CPU Jobs | Serial | -np 2 | -np 4 | -np 8 | -np 16 | -np 24 | -np 32 |
|---|---|---|---|---|---|---|---|
| Wall Time (secs) | 211 | 120 | 64 | 35 | 29 | 26 | 33 |

| GPU Jobs | Serial | -np 2 | -np 4 | -np 8 | -np 16 | -np 24 | -np 32 |
|---|---|---|---|---|---|---|---|
| Wall Time (secs) | 12 | | | | | | |
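From the tables above, the best CPU scaling for this problem is around -np 24 (about 8x over serial; -np 32 is actually slower), while a single GPU already beats the serial CPU run by roughly 17x. A quick back-of-the-envelope check using the wall times above:

```bash
# speedup = serial wall time / parallel wall time (numbers from the tables above)
echo "scale=1; 211/26" | bc   # -np 24: ~8.1x
echo "scale=1; 211/33" | bc   # -np 32: ~6.3x (slower than -np 24)
echo "scale=1; 211/12" | bc   # 1 GPU:  ~17.5x over the serial CPU run
```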
First we get some CPU-based data.
```bash
# serial run of pmemd
nohup $AMBERHOME/bin/pmemd -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &

# parallel run; note that you will need to create the machinefile
# if -np 4, it would contain 4 lines with the string 'localhost'
# (note: 'localhost' does not work here, use the node's hostname instead)
mpirun --machinefile=nodefile -np 4 $AMBERHOME/bin/pmemd.MPI \
  -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &
```
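The machinefile referenced above is just a plain text file listing one node name per MPI task. A minimal sketch for building it on the current node (using the output of `hostname` rather than 'localhost', per the note in the comments):

```bash
# build a machinefile with 4 entries, one per MPI task, using this node's hostname
rm -f nodefile
for i in 1 2 3 4; do hostname >> nodefile; done
cat nodefile
```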
The following script, gpu-info, should be in your path; it is located in ~/bin.
You need to allocate one or more GPUs for your CUDA runs.
```
node2$ gpu-info
====================================================
Device  Model      Temperature  Utilization
====================================================
0       Tesla K20  27 C         0 %
1       Tesla K20  28 C         0 %
2       Tesla K20  27 C         0 %
3       Tesla K20  30 C         0 %
====================================================
```
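gpu-info is a local wrapper script; if it is not in your path, the stock nvidia-smi tool (installed with the NVIDIA driver) reports similar per-device temperature and utilization figures:

```bash
# stock alternative to the local gpu-info wrapper
nvidia-smi
```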
Next we need to expose these GPUs to pmemd …
```bash
# expose one
export CUDA_VISIBLE_DEVICES="0"

# serial run of pmemd.cuda
nohup $AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &

# parallel run; note that you will need to create the machinefile
# if -np 4, it would contain 4 lines with the string 'localhost'
# (but see the hostname note above)
mpirun --machinefile=nodefile -np 4 $AMBERHOME/bin/pmemd.cuda.MPI \
  -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &
```
You may want to try running your pmemd problem across multiple GPUs if the problem set is large enough.
```bash
# expose multiple (for serial or parallel runs)
export CUDA_VISIBLE_DEVICES="0,2"
```
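Keep in mind that the devices listed in CUDA_VISIBLE_DEVICES are renumbered from 0 inside the application, so with "0,2" exposed, pmemd sees two GPUs numbered 0 and 1. A sketch of a matching two-task run (the same pmemd.cuda.MPI command as above, with one MPI task per exposed GPU):

```bash
export CUDA_VISIBLE_DEVICES="0,2"
# two MPI tasks, one per exposed GPU
mpirun --machinefile=nodefile -np 2 $AMBERHOME/bin/pmemd.cuda.MPI \
  -O -i mdin -o mdout -p prmtop \
  -c inpcrd -r restrt -x mdcrd </dev/null &
```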
```
  echo $i
  mpirun --machinefile=nodefile$i -np $i pmemd.cuda.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
    -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
  cp mdout ${i}gpu.parallel.log
done

[TestDriveUser0@K20-WS ec]$ w
 06:50:20 up 2 days, 12:51,  3 users,  load average: 1.73, 0.96, 1.31
USER     TTY      FROM              LOGIN@   IDLE    JCPU   PCPU  WHAT
TestDriv pts/0    engineering-pc.e  Thu09    20:01m  0.08s  0.52s sshd: TestDriveUser [priv]
TestDriv pts/1    greentail.wesley  05:31    4:50    2:38   0.00s /bin/bash ./run
TestDriv pts/2    hmeij.its.wesley  05:31    0.00s   0.19s  0.01s w

[TestDriveUser0@K20-WS ec]$ uptime
 06:50:43 up 2 days, 12:51,  3 users,  load average: 1.82, 1.05, 1.33

[TestDriveUser0@K20-WS ec]$ cat run
#!/bin/bash

rm -rf err out logfile mdout restrt mdinfo

echo CPU serial
pmemd -O -i inp/mini.in -p 1g6r.cd.parm \
  -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout 1core.serial.log

echo CPU parallel 2,4,8,16 /usr/local/mpich2-1.4.1p1/bin/mpirun
for i in 2 4 8 16 24 32
do
  echo $i
  mpirun --machinefile=nodefile$i -np $i pmemd.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
    -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
  cp mdout ${i}core.parallel.log
done

echo GPU serial
export CUDA_VISIBLE_DEVICES="2"
pmemd.cuda -O -i inp/mini.in -p 1g6r.cd.parm \
  -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
cp mdout 1gpu.serial.log

echo GPU parallel 2,4,8,16 /usr/local/mpich2-1.4.1p1/bin/mpirun
export CUDA_VISIBLE_DEVICES="2"
for i in 2
do
  echo $i
  mpirun --machinefile=nodefile$i -np $i pmemd.cuda.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
    -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>&1
  cp mdout ${i}gpu.parallel.log
done
```
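The wall-time figures in the tables at the top come from the mdout copies the script saves (1core.serial.log, ${i}core.parallel.log, and so on). A minimal sketch for pulling them out afterwards, assuming the timing line in your mdout files contains the phrase 'wall time' (the exact label can vary between AMBER builds):

```bash
# grep the saved mdout copies for the reported wall time;
# adjust the pattern to match the timing line in your mdout files
grep -i "wall time" *.log
```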