This shows you the differences between two versions of the page.
cluster:111 [2013/02/02 15:05] hmeij |
cluster:111 [2013/02/04 19:28] (current) hmeij [Results] |
||
---|---|---|---|
Line 16: | Line 16: | ||
^ PMEMD implementation of SANDER, Release 12 ^ | ^ PMEMD implementation of SANDER, Release 12 ^ | ||
- | |Minimizing the system with 25 kcal/mol restraints on protein, 500 steps of steepest descent and 500 of conjugate gradient| | + | |Minimizing the system with 25 kcal/mol restraints on protein, 500 steps of steepest descent and 500 of conjugate gradient |
- | |Job Type| Serial | + | |
- | |Wall Time (secs| | + | ^CPU Jobs (1,000 steps)^ Serial ^ -np 2 ^ -np 4 ^ -np 8 ^ -np 16 ^ -np 24 ^ -np 32 ^ |
+ | |Wall Time (secs)| | ||
+ | |||
+ | * MPI speedup near -np 24 is 8x serial | ||
+ | |||
+ | ^GPU Jobs^ Serial ^ -np 2 ^ -np 4 ^ -np 8 ^ -np 16 ^ -np 24 ^ -np 32 ^ | ||
+ | |Wall Time (secs)| | ||
+ | |||
+ | * GPU serial speedup is 17.5x CPU serial performance and outperforms MPI by at least 2x | ||
+ | * GPU parallel performance could not be measured | ||
+ | |||
+ | ^AMBER BENCHMARK EXAMPLES^^^^^^ | ||
+ | |JAC_PRODUCTION_NVE - 23,558 atoms PME|||||| | ||
+ | | 16 cpu cores | 1xK20 | 2xK20 | ||
+ | | 12.87 | 80.50 | 88.76 | 103.09 | ||
+ | | 6713.99 | ||
+ | |FACTOR_IX_PRODUCTION_NVE - 90,906 atoms PME|||||| | ||
+ | | 16 cpu cores | 1xK20 | 2xK20 | ||
+ | | 3.95 | 22.25 | 27.47 | 32.56 | ||
+ | | 21865.59 | ||
+ | |CELLULOSE_PRODUCTION_NVE - 408,609 atoms PME|||||| | ||
+ | | 16 cpu cores | 1xK20 | 2xK20 | ||
+ | | 0.91 | ||
+ | | 95235.87 | ||
+ | |NUCLEOSOME_PRODUCTION - 25,095 atoms GB|||||| | ||
+ | | 16 cpu cores | 1xK20 | 2xK20 | ||
+ | | 0.06 | ||
+ | | 1478614.67 | ||
+ | |||
+ | |||
+ | * 5-6x performance speedups using one GPU versus 16 CPU cores | ||
+ | * 9-10x performance speedups using four GPUs versus 16 CPU cores | ||
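The single-GPU figure can be checked against the benchmark rows with a short calculation. The sketch below is not part of the wiki page. It assumes that the first two values in the JAC and FACTOR_IX rows are throughput numbers for 16 CPU cores and 1xK20 respectively, which is what the surviving column headers suggest. The helper name `speedup` is hypothetical.

```shell
#!/bin/bash
# speedup BASE NEW: print the ratio NEW/BASE rounded to two decimals.
# Hypothetical helper for illustration; not taken from the wiki page.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

# Values copied from the benchmark rows; the column mapping
# (16 CPU cores first, 1xK20 second) is an assumption.
echo "JAC_PRODUCTION_NVE:       $(speedup 12.87 80.50)x"
echo "FACTOR_IX_PRODUCTION_NVE: $(speedup 3.95 22.25)x"
```

Under that assumption both ratios fall at or slightly above the quoted 5-6x range. The multi-GPU columns are truncated in this diff, so the four-GPU claim is not checked here.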
Line 90: | Line 121: | ||
< | < | ||
- | echo $i | + | [TestDriveUser0@K20-WS]$ cat run |
- | mpirun --machinefile=nodefile$i -np $i pmemd.cuda.MPI -O -i inp/mini.in -p 1g6r.cd.parm \ | + | |
- | -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 2>& | + | |
- | cp mdout ${i}gpu.parallel.log | + | |
- | done | + | |
- | + | ||
- | [TestDriveUser0@K20-WS ec]$ | + | |
- | [TestDriveUser0@K20-WS ec]$ w | + | |
- | | + | |
- | USER | + | |
- | TestDriv pts/0 engineering-pc.e Thu09 | + | |
- | TestDriv pts/1 greentail.wesley 05:31 4:50 | + | |
- | TestDriv pts/2 hmeij.its.wesley 05:31 0.00s 0.19s 0.01s w | + | |
- | [TestDriveUser0@K20-WS ec]$ uptime | + | |
- | | + | |
- | [TestDriveUser0@K20-WS ec]$ | + | |
- | [TestDriveUser0@K20-WS ec]$ | + | |
- | [TestDriveUser0@K20-WS ec]$ | + | |
- | [TestDriveUser0@K20-WS ec]$ | + | |
- | [TestDriveUser0@K20-WS | + | |
#!/bin/bash | #!/bin/bash | ||
rm -rf err out logfile mdout restrt mdinfo | rm -rf err out logfile mdout restrt mdinfo |