Differences

This shows you the differences between two versions of the page.

--- cluster:91 [2010/12/08 14:00]
hmeij created
+++ cluster:91 [2010/12/28 14:42]
hmeij
@@ Line 6: / Line 6: @@
 Grabbed the Linpack source and compiled against /opt/openmpi/1.4.2 ... using the Make.Linux_PII_CBLAS makefile.  Had to grab the ''atlas'' libraries from another host.  We changed $HOME and pointed to libmpi.so ($MPdir and $MPlib) and repointed $LAdir.  Then it compiled fine.
-===== Runs =====
+More about [[http://en.wikipedia.org/wiki/LINPACK|Linpack on wikipedia]]
-So based on what we did with the Dell burn in, follow this [[cluster:26|HPLinpack Runs]] link
+===== HP =====
+So based on what we did with the Dell burn in, follow this [[cluster:26|previous Linpack Runs]] link, some calculations:
+  * N calculation: 32 nodes, 12 gb each is 384 gb total which yields 48 gb double precision (8 byte) elements … 48 gb is 48*1024*1024*1024 = 51,539,607,552 … take the square root of that and round 227,032 … 80% of that is 181,600
+  * NB: start with 64, then 128, try 192 ...
+  * PxQ: perfect square of 16x16=256, the number of cores we have.
+Next create the machines files that list the hostname for each core.
+<code>
+for i in `seq 1 32`
+do
+for j in `seq 1 8`
+do
+echo n${i}-ib0 >> machines
+done
+done
+</code>
+Note that we're running via the hostname-ib0 port, that is the infiniband port.  Probably does not matter but that way we'll stay off the provisioning switch and should see the Voltaire switch light up.
+Simple script for invocation.
+<code>
+#!/bin/bash
+export PATH=/opt/openmpi/1.4.2/bin:$PATH
+export LD_LIBRARY_PATH=/opt/openmpi/1.42./lib:/home/hptest/test/lib64/atlas_GenuineIntel_x86_64
+mpirun -n 256 --hostfile machines ./xhpl > hpl.log 2>&1 &
+</code>
+===== Results =====
+And about the best results (1.5 teraflops) we found, was with
+  * N = 191,600
+  * NB of 128
+  * PxQ = 16 x 16
+<code>
+T/V                N    NB     P     Q               Time             Gflops
+----------------------------------------------------------------------------
+WR00R2C4      181600   128    16    16            2642.49          1.511e+03
+----------------------------------------------------------------------------
+||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0241238 ...... PASSED
+||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0097160 ...... PASSED
+||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0016592 ...... PASSED
+============================================================================
+T/V                N    NB     P     Q               Time             Gflops
+----------------------------------------------------------------------------
+WR00R2R2      181600   128    16    16            2649.93          1.507e+03
+----------------------------------------------------------------------------
+||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0246131 ...... PASSED
+||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0099131 ...... PASSED
+||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0016929 ...... PASSED
+============================================================================
+T/V                N    NB     P     Q               Time             Gflops
+----------------------------------------------------------------------------
+WR00R2R4      181600   128    16    16            2644.63          1.510e+03
+----------------------------------------------------------------------------
+||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0231181 ...... PASSED
+||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0093110 ...... PASSED
+||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0015901 ...... PASSED
+============================================================================
+</code>
+===== Image =====
+Looks like so.
+{{:cluster:linpack_greentail.png|}}
 \\
 **[[cluster:0|Back]]**
+===== Hmm =====
+And that revealed a host with 10 gb memory instead of 12gb.
+<code>
+[root@greentail Linux_PII_CBLAS]# pdsh grep MemTotal /proc/meminfo
+n10: MemTotal:     12290464 kB
+n26: MemTotal:     12290464 kB
+n13: MemTotal:     12290464 kB
+n3: MemTotal:     12290464 kB
+n2: MemTotal:     12290464 kB
+n9: MemTotal:     12290464 kB
+n23: MemTotal:     12290464 kB
+n30: MemTotal:     12290464 kB
+n28: MemTotal:     12290464 kB
+n1: MemTotal:     12290464 kB
+n31: MemTotal:     12290464 kB
+n20: MemTotal:     12290464 kB
+n27: MemTotal:     12290464 kB
+n25: MemTotal:     12290464 kB
+n15: MemTotal:     12290464 kB
+n16: MemTotal:     12290464 kB
+n18: MemTotal:     12290464 kB
+n29: MemTotal:     12290464 kB
+n6: MemTotal:     12290464 kB
+n7: MemTotal:     12290464 kB
+n5: MemTotal:     12290464 kB
+n24: MemTotal:     12290464 kB
+n32: MemTotal:     12290464 kB
+n19: MemTotal:     12290464 kB
+n12: MemTotal:     12290464 kB
+n22: MemTotal:     12290464 kB
+n8: MemTotal:     12290464 kB
+n11: MemTotal:     12290464 kB
+n4: MemTotal:     12290464 kB
+n14: MemTotal:     12290464 kB
+n17: MemTotal:     12290464 kB
+n21: MemTotal:     10221992 kB  <--- hmm
+</code>
+===== Dell =====
+Since the cluster will be shut down December 28th we have an opportunity to run Linpack on the Dell cluster.
+  * ETHERNET
+  * N calculation: 20 nodes, 4/8/16 gb mix for a total of 192 gb which yields 24 gb double precision (8 byte) elements … 24 gb is 24*1024*1024*1024 = 25,769,803,776 … take the square root of that and round 160529 … 80% of that is 128,423
+  * NB: start with 64, then 128, try 192 ...
+  * PxQ: perfect square of 10x16=160, the number of cores we have
+<code>
+============================================================================
+T/V                N    NB     P     Q               Time             Gflops
+----------------------------------------------------------------------------
+WR00L2L2       40800   128    10    16             184.42          2.455e+02
+----------------------------------------------------------------------------
+||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0069974 ...... PASSED
+||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0105682 ...... PASSED
+||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0020883 ...... PASSED
+============================================================================
+</code>
+  * INFINIBAND
+  * N calculation: 16 nodes, 8 gb per node for a total of 128 gb which yields 16 gb double precision (8 byte) elements … 16 gb is 16*1024*1024*1024 = 17,179,869,184 … take the square root of that and round 131,072 … 80% of that is 104,850
+  * NB: start with 64, then 128, try 192 ...
+  * PxQ: perfect square of 10x16=160, the number of cores we have
+<code>
+============================================================================
+T/V                N    NB     P     Q               Time             Gflops
+----------------------------------------------------------------------------
+WR00L2L2       52425    64    11    11             294.28          3.264e+02
+----------------------------------------------------------------------------
+||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0059082 ...... PASSED
+||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0133907 ...... PASSED
+||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0024480 ...... PASSED
+============================================================================
+</code>
+So a total of 2.455e+02 + 3.264e+0 or about 572 Gflops, 0.5 teraflops.
+===== BSS =====
+Since the cluster will be shut down December 28th we have an opportunity to run Linpack on the sharptail cluster.
+  * N calculation: 46 nodes, 24 gb per node for a total of 1,104 gb which yields 138 gb double precision (8 byte) elements … 138 gb is 138*1024*1024*1024 = 148,176,371,712 … take the square root of that and round 384936 … 80% of that is 307,950
+  * NB: start with 64, then 128, try 192 ...
+  * PxQ: perfect square of 9x10=92, close to the number of cores we have.
+\\
+**[[cluster:0|Back]]**

DokuWiki

User Tools

Site Tools

Differences

Page Tools