  
^ PMEMD implementation of SANDER, Release 12  ^
|Minimizing the system with 25 kcal/mol restraints on protein, 500 steps of steepest descent and 500 of conjugate gradient - Surjit Dixit problem set|

^CPU Jobs (1,000 steps)^ Serial ^ -np 2 ^ -np 4 ^ -np 8 ^ -np 16 ^ -np 24 ^ -np 32 ^
|Wall Time (secs)|  211  |  120  |  64  |  35  |  29  |  26  |  33  |
  
^GPU Jobs^ Serial ^ -np 2 ^ -np 4 ^ -np 8 ^ -np 16 ^ -np 24 ^ -np 32 ^
|Wall Time (secs)|  12  |    |    |    |    |    |    |
  
  * GPU serial speedup is 17.5x over CPU serial (12 vs 211 secs) and outperforms MPI by at least 2x (a timing sketch follows this list)
  * GPU parallel unable to measure
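
A minimal sketch of how these wall times could be collected, assuming AMBER 12's standard pmemd/pmemd.MPI/pmemd.cuda binaries and the input files named in the run script at the bottom of this page:

<code>
#!/bin/bash
# hypothetical timing loop; the -np counts mirror the table above
for i in 2 4 8 16 24 32; do
    ( time mpirun -np $i pmemd.MPI -O -i inp/mini.in -p 1g6r.cd.parm \
        -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 ) 2> ${i}cpu.time
done
# serial CPU and serial GPU baselines
( time pmemd -O -i inp/mini.in -p 1g6r.cd.parm \
    -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 ) 2> serial.cpu.time
( time pmemd.cuda -O -i inp/mini.in -p 1g6r.cd.parm \
    -c 1g6r.cd.randions.crd.1 -ref 1g6r.cd.randions.crd.1 ) 2> serial.gpu.time
</code>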
  
^AMBER BENCHMARK EXAMPLES^^^^^^
|JAC_PRODUCTION_NVE - 23,558 atoms PME||||||
|  16 cpu cores  |  1xK20  |  2xK20  |  3xK20  |  4xK20  |  measure  |
|  12.87  |  80.50  |  88.76  |  103.09  |  122.45  |  ns/day  |
|  6713.99  |  1073.23  |  973.45  |  838.09  |  705.61  |  seconds/ns  |
|FACTOR_IX_PRODUCTION_NVE - 90,906 atoms PME||||||
|  16 cpu cores  |  1xK20  |  2xK20  |  3xK20  |  4xK20  |  measure  |
|  3.95  |  22.25  |  27.47  |  32.56  |  39.52  |  ns/day  |
|  21865.59  |  3883.38  |  3145.32  |  2653.65  |  2186.28  |  seconds/ns  |
|CELLULOSE_PRODUCTION_NVE - 408,609 atoms PME||||||
|  16 cpu cores  |  1xK20  |  2xK20  |  3xK20  |  4xK20  |  measure  |
|  0.91  |  5.40  |  6.44  |  7.51  |  8.85  |  ns/day  |
|  95235.87  |  15986.42  |  13406.15  |  11509.28  |  9768.23  |  seconds/ns  |
|NUCLEOSOME_PRODUCTION - 25,095 atoms GB||||||
|  16 cpu cores  |  1xK20  |  2xK20  |  3xK20  |  4xK20  |  measure  |
|  0.06  |  2.79  |  3.65  |  3.98  |  ???  |  ns/day  |
|  1478614.67  |  31007.58  |  23694.29  |  21724.33  |  ???  |  seconds/ns  |

  * 5-6x performance speedups using one GPU versus 16 CPU cores
  * 9-10x performance speedups using four GPUs versus 16 CPU cores (a run sketch follows this list)
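
A sketch of how the multi-GPU rows could be reproduced, assuming the AMBER benchmark suite's directory layout; the mdin/prmtop/inpcrd file names are placeholders, and CUDA_VISIBLE_DEVICES is the usual mechanism for picking which K20s a run sees:

<code>
#!/bin/bash
# hypothetical loop over GPU counts for one benchmark (JAC_PRODUCTION_NVE)
cd JAC_PRODUCTION_NVE
for n in 1 2 3 4; do
    # expose only the first $n K20 cards to this run
    export CUDA_VISIBLE_DEVICES=$(seq -s, 0 $((n - 1)))
    mpirun -np $n pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd -o mdout.${n}gpu
    grep 'ns/day' mdout.${n}gpu    # pmemd reports ns/day in its final timings
done
</code>

The seconds/ns rows follow directly from ns/day: seconds/ns = 86400 / (ns/day), e.g. 86400 / 80.50 ≈ 1073 for JAC on one K20.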
  
  
<code>
  
[TestDriveUser0@K20-WS]$ cat run
#!/bin/bash
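# clear output files from any previous run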
rm -rf err out logfile mdout restrt mdinfo