Differences

This shows you the differences between two versions of the page.

@@ Line 1: / Line 1: @@
+\\
+**[[cluster:23|Back]]**
+The production copy of OpenMPI is in ''/share/apps/openmpi-1.2''.\\
+ --- //[[hmeij@wesleyan.edu|Henk Meij]] 2007/04/19 15:27//
+====== HPLinpack Runs ======
+The purpose here is to rerun the HPLinpack benchmarks Amol ran while configuring the cluster.
+^Before^
+|{{:cluster:hplburn_before.gif|Idle!}}|
+^During^
+|{{:cluster:hplburn_during.gif|Heat!}}|
+^Ooops^
+|{{:cluster:hplburn.gif|Burn!}}|
+FAQ [[http://www.netlib.org/benchmark/hpl/faqs.html|External Link]]
+====== Problem Sizes ======
+N calculation, for example: \\
+nodes, 4 gb each is 16 gb total which yields 2 gb double precision (8 byte) elements ... 2gb is 2*1024*1024*1024 = 2,147,395,600 ... take the square root of that and round 46,340 ... 80% of that is 37072\\
+N calculation 16 nodes (infiniband or ethernet):\\
+nodes, 4 gb each is 64 gb total which yields 8 gb double precision (8 byte) elements ... 8gb is 8*1024*1024*1024 = 8,589,934,592 ... take the square root of that and round 92,681 ... 80% of that is 74145
+N calculation 4 heavy weight nodes:\\
+nodes, 16 gb each is 64 gb total which yields 8 gb double precision (8 byte) elements ... 8gb is 8*1024*1024*1024 = 8,589,934,592 ... take the square root of that and round 92,681 ... 80% of that is 74145
+NB calculations: \\
+range of 32...256\\
+ood starting values are 88 132
+PxQ Grid:\\
+max value PxQ should equal nr of cores \\
+P<Q ... close for infiniband but P much smaller than Q for ethernet
+LWNi (np=128): P=8, Q=16\\
+LWNe (np=128): P=4, Q=32 or P=2, Q=64\\
+HWN (np=32): P=4, Q=8 or P=2, Q=16
+===== Infiniband (16 nodes) =====
+  * nodes: compute-1-1 thru compute-1-16
+  * each dual quad 2.6 ghz PE1950 (2x4x16 totals 128 cores)
+  * each with 4 gb ram (4x16=64 gb total memory)
+===== HPL.dat ======
+<code>
+HPLinpack benchmark input file
+Innovative Computing Laboratory, University of Tennessee
+HPL.out      output file name (if any)
+            device out (6=stdout,7=stderr,file)
+            # of problems sizes (N)
+5000 10000 15000 20000 25000 30000 35000 Ns
+            # of NBs
+300 400 500 600 700     NBs
+            PMAP process mapping (0=Row-,1=Column-major)
+            # of process grids (P x Q)
+            Ps
+           Qs
+.0         threshold
+            # of panel fact
+1 2        PFACTs (0=left, 1=Crout, 2=Right)
+            # of recursive stopping criterium
+4          NBMINs (>= 1)
+            # of panels in recursion
+            NDIVs
+            # of recursive panel fact.
+1 2        RFACTs (0=left, 1=Crout, 2=Right)
+            # of broadcast
+            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
+            # of lookahead depth
+            DEPTHs (>=0)
+            SWAP (0=bin-exch,1=long,2=mix)
+           swapping threshold
+            L1 in (0=transposed,1=no-transposed) form
+            U  in (0=transposed,1=no-transposed) form
+            Equilibration (0=no,1=yes)
+            memory alignment in double (> 0)
+</code>
+===== run script ======
+**=> su delltest**
+<code>
+#!/bin/bash
+echo "setting P4_GLOBMEMSIZE=10000000"
+P4_GLOBMEMSIZE=10000000
+export P4_GLOBMEMSIZE
+echo "invoking..."
+echo "/usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 128 -hostfile ./machines ./xhpl"
+date > HPL.start
+(/usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 128 -hostfile ./machines ./xhpl > HPL.out 2>&1)
+</code>
+=> with above HPL.dat file, this configuration runs for 8 hours ...
+===== Ethernet (16 nodes) =====
+  * nodes: compute-1-17 thru compute-2-32
+  * each dual quad 2.6 ghz PE1950 (2x4x16 totals 128 cores)
+  * each with 4 gb ram (4x16=64 gb total memory)
+===== HPL.dat ======
+<code>
+Innovative Computing Laboratory, University of Tennessee
+HPL.out      output file name (if any)
+            device out (6=stdout,7=stderr,file)
+             # of problems sizes (N)
+Ns
+             # of NBs
+NBs
+            PMAP process mapping (0=Row-,1=Column-major)
+            # of process grids (P x Q)
+            Ps
+           Qs
+.0         threshold
+            # of panel fact
+1 2        PFACTs (0=left, 1=Crout, 2=Right)
+            # of recursive stopping criterium
+4          NBMINs (>= 1)
+            # of panels in recursion
+            NDIVs
+            # of recursive panel fact.
+1 2        RFACTs (0=left, 1=Crout, 2=Right)
+            # of broadcast
+            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
+            # of lookahead depth
+            DEPTHs (>=0)
+            SWAP (0=bin-exch,1=long,2=mix)
+           swapping threshold
+            L1 in (0=transposed,1=no-transposed) form
+            U  in (0=transposed,1=no-transposed) form
+            Equilibration (0=no,1=yes)
+            memory alignment in double (> 0)
+</code>
+===== run script ======
+**=> su delltest2**
+<code>
+#!/bin/bash
+echo "setting P4_GLOBMEMSIZE=10000000"
+P4_GLOBMEMSIZE=10000000
+export P4_GLOBMEMSIZE
+date > HPL.start
+echo "invoking..."
+echo "/home/delltest2/openmpi-1.2/bin/mpirun -np 128 -machinefile
+/home/delltest2/machines /home/delltest2/xhpl"
+(/home/delltest2/openmpi-1.2/bin/mpirun -np 128 -machinefile
+/home/delltest2/machines home/delltest2/xhpl > /home/delltest2/HPL.out 2>&1)&
+</code>
+=> runs for 4 hours ... change these lines below and it'll run for 14 hours
+<code>
+             # of problems sizes (N)
+74145 Ns
+             # of NBs
+132 NBs
+</code>
+===== Ethernet (4 nodes) =====
+  * nodes: nfs-2-1 thru nfs-2-4
+  * each dual quad 2.6 ghz PE1950 (2x4x4 totals 32 cores)
+  * each with 16 gb ram (16x4=64 gb total memory)
+===== HPL.dat ======
+<code>
+Innovative Computing Laboratory, University of Tennessee
+HPL.out      output file name (if any)
+            device out (6=stdout,7=stderr,file)
+            # of problems sizes (N)
+Ns
+            # of NBs
+NBs
+            PMAP process mapping (0=Row-,1=Column-major)
+            # of process grids (P x Q)
+            Ps
+            Qs
+.0         threshold
+            # of panel fact
+1 2        PFACTs (0=left, 1=Crout, 2=Right)
+            # of recursive stopping criterium
+4          NBMINs (>= 1)
+            # of panels in recursion
+            NDIVs
+            # of recursive panel fact.
+1 2        RFACTs (0=left, 1=Crout, 2=Right)
+            # of broadcast
+            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
+            # of lookahead depth
+            DEPTHs (>=0)
+            SWAP (0=bin-exch,1=long,2=mix)
+           swapping threshold
+            L1 in (0=transposed,1=no-transposed) form
+            U  in (0=transposed,1=no-transposed) form
+            Equilibration (0=no,1=yes)
+            memory alignment in double (> 0)
+</code>
+===== run script ======
+**=> su delltest3**
+<code>
+#!/bin/bash
+echo "setting P4_GLOBMEMSIZE=10000000"
+P4_GLOBMEMSIZE=10000000
+export P4_GLOBMEMSIZE
+date > HPL.start
+echo "invoking..."
+echo "/home/delltest3/openmpi-1.2/bin/mpirun -np 32 -machinefile
+ /home/delltest3/machines /home/delltest3/xhpl"
+(/home/delltest3/openmpi-1.2/bin/mpirun -np 32 -machinefile /home/delltest3/machines
+ /home/delltest3/xhpl > /home/delltest3/HPL.out 2>&1)&
+</code>
+=> runs for 14 1/2 hours ... change these lines and it'll run for 2 days
+<code>
+             # of NBs
+132 NBs
+</code>
+===== MPIRUN-1.2 =====
+From Sili at Platform ...
+<code>
+Actrually, MPICH 1 always has problems in the shared memory control. It really takes time to debug on the buggy shared memory stuff. I would rather suggest using openmpi instead of MPICH 1 to launch Ethernet linpack testings as openmpi is a newer and better MPI implementation than MPICH 1 and it is MPI-2 compatible plus it supports for both ethernet and infiniband devices.
+The precedure I just tested is as follows
+. Compile Openmpi
+Here is the procedure I used to recompile openmpi :
+# ./configure --prefix=/home/shuang/openmpi-1.2 --disable-mpi-f90
+# make
+# make install
+To test the installation, create a host file. I generated a hostfile :
+# cat /etc/hosts | grep compute | awk '{print $3} ' > machines
+Then I recompiled the hello example (the hello_c.c file can be found at the examples directory on the untar'd source directory):
+# /home/shuang/openmpi-1.2/bin/mpicc -o hello ./hello_c.c
+And tested it :
+# /home/shuang/openmpi-1.2/bin/mpirun -np 4 -machinefile machines --prefix /home/shuang/openmpi-1.2 ./hello
+Please note that I used the complete path to the executables because by default, lam will be picked up. This is also why I used the --prefix option. You may want to use modules to load / unload these environment settings. Please let me know if you would like to have more information about this (open-source) software.
+. Compile Linpack with Openmpi
+# wget http://www.netlib.org/benchmark/hpl/hpl.tgz
+# tar zxf hpl.tgz
+# cd hpl
+# cp setup/Make.Linux_PII_CBLAS .
+edit Make.Linux_PII_CBLAS, change "MPdir" to "/home/shuang/openmpi-1.2", change "MPlib" to "$(MPdir)/lib/libmpi.so", change "LAdir" to "/usr/lib64", change "CC" to "/home/shuang/openmpi-1.2/bin/mpicc", and change "LINKER" to "/home/shuang/openmpi-1.2/bin/mpicc".
+Then you can make linpack by
+# make arch=Linux_PII_CBLAS
+To test it, edit the HPL.dat accordingly and run by:
+# /home/shuang/openmpi-1.2/bin/mpirun -np 8 -machinefile machines --prefix /home/shuang/openmpi-1.2 ./xhpl
+</code>
+===== MPIRUN-1.2 (fixes) =====
+My experience ...
+<code>
+source is in /mnt/src/hmeij-tmp/foodir/src
+su delltest3
+export LD_LIBRARY_PATH="/home/delletst3/openmpi-1.2/lib:$LD_LIBRARY_PATH"
+add this to ~/.bashrc
+cd /mnt/src/hmeij-tmp/foodir/src/openmpi-1.2
+./configure --prefix /home/delltest3/openmpi-1.2 --disable-mpi-f90
+make
+make install
+cd ~\\
+/home/delltest3/openmpi-1.2/bin/mpicc -o hello \
+/mnt/src/hmeij-tmp/foodir/src/openmpi-1.2/examples/hello_c.c
+the machines file setup does not like 'nfs-2-1:8',
+so instead addd 8 lines for each node like this 'nfs-2-1'
+ldd hello
+ldd openmpi-1.2/bin/mpirun
+test on a  single machine
+/home/delltest3/openmpi-1.2/bin/mpirun -np 8 -machinefile \
+/home/delltest3/machines  /home/delltest3/hello
+cd ~
+(for some reason you need to do this for compilation to be successful)
+ln -s /mnt/src/hmeij-tmp/foodir/src/hpl
+cd hpl
+cp ~/Make.Linux_PII_CBLAS .
+make arch=Linux_PII_CBLAS
+cp bin/Linux_PII_CBLAS/xhpl ~
+cp bin/Linux_PII_CBLAS/HPL.dat ~
+cd ~
+/home/delltest3/openmpi-1.2/bin/mpirun -np 8 -machinefile \
+/home/delltest3/machines /home/delltest3/xhpl > HPL.out
+</code>
+\\
+**[[cluster:23|Back]]**

DokuWiki

User Tools

Site Tools

Differences

Page Tools