 +\\
 +**[[cluster:​23|Back]]**
  
 +The production copy of OpenMPI is in ''/​share/​apps/​openmpi-1.2''​.\\
 + --- //​[[hmeij@wesleyan.edu|Henk Meij]] 2007/04/19 15:27//
 +
 +====== HPLinpack Runs ======
 +
 +The purpose here is to rerun the HPLinpack benchmarks Amol ran while configuring the cluster.  ​
 +
 +^Before^
 +|{{:​cluster:​hplburn_before.gif|Idle!}}|
 +^During^
 +|{{:​cluster:​hplburn_during.gif|Heat!}}|
 +^Ooops^
 +|{{:​cluster:​hplburn.gif|Burn!}}|
 +
 +FAQ [[http://​www.netlib.org/​benchmark/​hpl/​faqs.html|External Link]]
 +
 +
 +
 +
 +====== Problem Sizes ======
 +
 +N calculation, for example: \\
 +4 nodes with 4 gb each is 16 gb total; at 8 bytes per double precision element that holds 2g elements ... 2g is 2*1024*1024*1024 = 2,147,483,648 ... take the square root of that, roughly 46,340 ... 80% of that is 37072\\
 +
 +N calculation 16 nodes (infiniband or ethernet):\\
 +16 nodes with 4 gb each is 64 gb total; at 8 bytes per double precision element that holds 8g elements ... 8g is 8*1024*1024*1024 = 8,589,934,592 ... take the square root of that, roughly 92,681 ... 80% of that is 74145
 +
 +N calculation 4 heavy weight nodes:\\
 +4 nodes with 16 gb each is 64 gb total; at 8 bytes per double precision element that holds 8g elements ... 8g is 8*1024*1024*1024 = 8,589,934,592 ... take the square root of that, roughly 92,681 ... 80% of that is 74145
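 +A quick way to reproduce this arithmetic (a sketch; plug in the node count and memory per node for the set of nodes being tested, here the 16-node case):
 +
 +<code>
 +# suggested N = 80% of sqrt( total memory in bytes / 8 bytes per element )
 +awk -v nodes=16 -v gb=4 'BEGIN {
 +  bytes   = nodes * gb * 1024 * 1024 * 1024   # total memory in bytes
 +  doubles = bytes / 8                         # double precision (8 byte) elements
 +  printf "suggested N: %d\n", 0.8 * sqrt(doubles)
 +}'
 +</code>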
 +
 +NB calculations:​ \\
 +range of 32...256\\
 +good starting values are 88 and 132
 +
 +PxQ Grid:\\
 +P times Q should at most equal the number of cores \\
 +P<=Q ... P and Q close together for infiniband, but P much smaller than Q for ethernet
 +
 +LWNi (np=128): P=8, Q=16\\
 +LWNe (np=128): P=4, Q=32 or P=2, Q=64\\
 +HWN (np=32): P=4, Q=8 or P=2, Q=16
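 +To enumerate the candidate grids for a given core count, a small sketch (cores=128 matches the two 16-node sets above, use cores=32 for the heavy weight nodes):
 +
 +<code>
 +# list every P x Q factorization of the core count with P <= Q
 +cores=128
 +for ((p=1; p*p<=cores; p++)); do
 +  if (( cores % p == 0 )); then
 +    echo "P=$p  Q=$((cores/p))"
 +  fi
 +done
 +</code>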
 +
 +===== Infiniband (16 nodes) =====
 +
 +  * nodes: compute-1-1 thru compute-1-16
 +  * each dual quad 2.6 ghz PE1950 (2x4x16 totals 128 cores)
 +  * each with 4 gb ram (4x16=64 gb total memory)
 +
 +===== HPL.dat =====
 +
 +<​code>​
 +HPLinpack benchmark input file
 +Innovative Computing Laboratory, University of Tennessee
 +HPL.out ​     output file name (if any)
 +7            device out (6=stdout,​7=stderr,​file)
 +8            # of problems sizes (N)
 +1000 5000 10000 15000 20000 25000 30000 35000 Ns
 +6            # of NBs
 +200 300 400 500 600 700     NBs
 +0            PMAP process mapping (0=Row-,​1=Column-major)
 +1            # of process grids (P x Q)
 +8            Ps
 +16           Qs
 +16.0         ​threshold
 +3            # of panel fact
 +0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
 +2            # of recursive stopping criterium
 +2 4          NBMINs (>= 1)
 +1            # of panels in recursion
 +2            NDIVs
 +3            # of recursive panel fact.
 +0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
 +1            # of broadcast
 +0            BCASTs (0=1rg,​1=1rM,​2=2rg,​3=2rM,​4=Lng,​5=LnM)
 +1            # of lookahead depth
 +0            DEPTHs (>=0)
 +2            SWAP (0=bin-exch,​1=long,​2=mix)
 +64           ​swapping threshold
 +0            L1 in (0=transposed,​1=no-transposed) form
 +0            U  in (0=transposed,​1=no-transposed) form
 +1            Equilibration (0=no,​1=yes)
 +8            memory alignment in double (> 0)
 +</​code>​
 +
 +===== run script =====
 +
 +**=> su delltest**
 +
 +<​code>​
 +#!/bin/bash
 +
 +echo "​setting P4_GLOBMEMSIZE=10000000"​
 +
 +P4_GLOBMEMSIZE=10000000
 +export P4_GLOBMEMSIZE
 +
 +echo "​invoking..."​
 +echo "/​usr/​local/​topspin/​mpi/​mpich/​bin/​mpirun_ssh -np 128 -hostfile ./machines ./​xhpl"​
 +
 +date > HPL.start
 +(/​usr/​local/​topspin/​mpi/​mpich/​bin/​mpirun_ssh -np 128 -hostfile ./machines ./xhpl > HPL.out 2>&​1) ​
 +</​code>​
 +
 +=> With the above HPL.dat file (8 Ns x 6 NBs = 48 runs), this configuration runs for about 8 hours ...
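 +To pull the results out of HPL.out afterwards, something like the following should work (a sketch, assuming the usual HPL output layout where every result line starts with the T/V variant code, e.g. WR00L2L2, and every residual check prints PASSED or FAILED):
 +
 +<code>
 +grep '^W' HPL.out         # one line per N/NB combination: T/V, N, NB, P, Q, time, Gflops
 +grep -c PASSED HPL.out    # how many residual checks passed
 +</code>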
 +
 +
 +===== Ethernet (16 nodes) =====
 +
 +  * nodes: compute-1-17 thru compute-2-32
 +  * each dual quad 2.6 ghz PE1950 (2x4x16 totals 128 cores)
 +  * each with 4 gb ram (4x16=64 gb total memory)
 +
 +===== HPL.dat =====
 +
 +<​code>​
 +HPLinpack benchmark input file
 +Innovative Computing Laboratory, University of Tennessee
 +HPL.out ​     output file name (if any)
 +7            device out (6=stdout,​7=stderr,​file)
 +1             # of problems sizes (N)
 +74145 Ns
 +1             # of NBs
 +88 NBs
 +0            PMAP process mapping (0=Row-,​1=Column-major)
 +1            # of process grids (P x Q)
 +4            Ps
 +32           Qs
 +16.0         ​threshold
 +3            # of panel fact
 +0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
 +2            # of recursive stopping criterium
 +2 4          NBMINs (>= 1)
 +1            # of panels in recursion
 +2            NDIVs
 +3            # of recursive panel fact.
 +0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
 +1            # of broadcast
 +0            BCASTs (0=1rg,​1=1rM,​2=2rg,​3=2rM,​4=Lng,​5=LnM)
 +1            # of lookahead depth
 +0            DEPTHs (>=0)
 +2            SWAP (0=bin-exch,​1=long,​2=mix)
 +64           ​swapping threshold
 +0            L1 in (0=transposed,​1=no-transposed) form
 +0            U  in (0=transposed,​1=no-transposed) form
 +1            Equilibration (0=no,​1=yes)
 +8            memory alignment in double (> 0)
 +</​code>​
 +
 +
 +
 +
 +
 +===== run script =====
 +
 +**=> su delltest2**
 +
 +<​code>​
 +#!/bin/bash
 +
 +echo "​setting P4_GLOBMEMSIZE=10000000"​
 +
 +P4_GLOBMEMSIZE=10000000
 +export P4_GLOBMEMSIZE
 +
 +date > HPL.start
 +
 +
 +echo "​invoking..."​
 +echo "/​home/​delltest2/​openmpi-1.2/​bin/​mpirun -np 128 -machinefile ​
 +/​home/​delltest2/​machines /​home/​delltest2/​xhpl"​
 +
 +(/​home/​delltest2/​openmpi-1.2/​bin/​mpirun -np 128 -machinefile ​
 +/​home/​delltest2/​machines home/​delltest2/​xhpl > /​home/​delltest2/​HPL.out 2>&​1)&​
 +</​code>​
 +
 +=> runs for 4 hours ... change these lines below and it'll run for 14 hours
 +
 +<​code>​
 +2             # of problems sizes (N)
 +74145 74145 Ns
 +2             # of NBs
 +88 132 NBs
 +</​code>​
 +
 +
 +===== Ethernet (4 nodes) =====
 +
 +
 +  * nodes: nfs-2-1 thru nfs-2-4
 +  * each dual quad 2.6 ghz PE1950 (2x4x4 totals 32 cores)
 +  * each with 16 gb ram (16x4=64 gb total memory)
 +
 +===== HPL.dat =====
 +
 +<​code>​
 +HPLinpack benchmark input file
 +Innovative Computing Laboratory, University of Tennessee
 +HPL.out ​     output file name (if any)
 +7            device out (6=stdout,​7=stderr,​file)
 +1            # of problems sizes (N)
 +74145 Ns
 +1            # of NBs
 +88 NBs
 +0            PMAP process mapping (0=Row-,​1=Column-major)
 +1            # of process grids (P x Q)
 +4            Ps
 +8            Qs
 +16.0         ​threshold
 +3            # of panel fact
 +0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
 +2            # of recursive stopping criterium
 +2 4          NBMINs (>= 1)
 +1            # of panels in recursion
 +2            NDIVs
 +3            # of recursive panel fact.
 +0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
 +1            # of broadcast
 +0            BCASTs (0=1rg,​1=1rM,​2=2rg,​3=2rM,​4=Lng,​5=LnM)
 +1            # of lookahead depth
 +0            DEPTHs (>=0)
 +2            SWAP (0=bin-exch,​1=long,​2=mix)
 +64           ​swapping threshold
 +0            L1 in (0=transposed,​1=no-transposed) form
 +0            U  in (0=transposed,​1=no-transposed) form
 +1            Equilibration (0=no,​1=yes)
 +8            memory alignment in double (> 0)
 +</​code>​
 +
 +
 +
 +===== run script =====
 +
 +**=> su delltest3**
 +
 +<​code>​
 +#!/bin/bash
 +
 +echo "​setting P4_GLOBMEMSIZE=10000000"​
 +
 +P4_GLOBMEMSIZE=10000000
 +export P4_GLOBMEMSIZE
 +
 +date > HPL.start
 +
 +
 +echo "​invoking..."​
 +echo "/​home/​delltest3/​openmpi-1.2/​bin/​mpirun -np 32 -machinefile ​
 + /​home/​delltest3/​machines /​home/​delltest3/​xhpl"​
 +
 +(/​home/​delltest3/​openmpi-1.2/​bin/​mpirun -np 32 -machinefile /​home/​delltest3/​machines
 + /​home/​delltest3/​xhpl > /​home/​delltest3/​HPL.out 2>&​1)&​
 +</​code>​
 +
 +=> runs for 14 1/2 hours ... change these lines and it'll run for 2 days
 +
 +<​code>​
 +2             # of NBs
 +88 132 NBs
 +</​code>​
 +
 +
 +===== MPIRUN-1.2 =====
 +
 +From Sili at Platform ...
 +
 +<​code>​
 +Actually, MPICH 1 always has problems in its shared memory control, and it really takes time to debug the buggy shared memory stuff. I would rather suggest using openmpi instead of MPICH 1 to launch the Ethernet linpack tests, as openmpi is a newer and better MPI implementation than MPICH 1; it is MPI-2 compatible and it supports both ethernet and infiniband devices.
 + 
 +The procedure I just tested is as follows:
 +
 +1. Compile Openmpi
 +Here is the procedure I used to recompile openmpi :
 +# ./configure --prefix=/​home/​shuang/​openmpi-1.2 --disable-mpi-f90
 +# make
 +# make install
 + 
 +To test the installation,​ create a host file. I generated a hostfile :
 +# cat /etc/hosts | grep compute | awk '​{print $3} ' > machines
 + 
 +Then I recompiled the hello example (the hello_c.c file can be found at the examples directory on the untar'​d source directory):
 +# /​home/​shuang/​openmpi-1.2/​bin/​mpicc -o hello ./hello_c.c
 + 
 +And tested it :
 +# /​home/​shuang/​openmpi-1.2/​bin/​mpirun -np 4 -machinefile machines --prefix /​home/​shuang/​openmpi-1.2 ./hello
 + 
 +Please note that I used the complete path to the executables because by default, lam will be picked up. This is also why I used the --prefix option. You may want to use modules to load / unload these environment settings. Please let me know if you would like to have more information about this (open-source) software.
 + 
 +2. Compile Linpack with Openmpi
 + 
 +# wget http://​www.netlib.org/​benchmark/​hpl/​hpl.tgz
 +# tar zxf hpl.tgz
 +# cd hpl
 +# cp setup/​Make.Linux_PII_CBLAS .
 +edit Make.Linux_PII_CBLAS,​ change "​MPdir"​ to "/​home/​shuang/​openmpi-1.2",​ change "​MPlib"​ to "​$(MPdir)/​lib/​libmpi.so",​ change "​LAdir"​ to "/​usr/​lib64",​ change "​CC"​ to "/​home/​shuang/​openmpi-1.2/​bin/​mpicc",​ and change "​LINKER"​ to "/​home/​shuang/​openmpi-1.2/​bin/​mpicc"​.
 + 
 +Then you can make linpack by
 +# make arch=Linux_PII_CBLAS
 + 
 +To test it, edit the HPL.dat accordingly and run by:
 +# /​home/​shuang/​openmpi-1.2/​bin/​mpirun -np 8 -machinefile machines --prefix /​home/​shuang/​openmpi-1.2 ./xhpl
 +
 +</​code>​
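 +The Make.Linux_PII_CBLAS edits described above end up looking roughly like this (an excerpt only, using Sili's /home/shuang prefix; substitute your own OpenMPI install path):
 +
 +<code>
 +MPdir        = /home/shuang/openmpi-1.2
 +MPlib        = $(MPdir)/lib/libmpi.so
 +LAdir        = /usr/lib64
 +CC           = /home/shuang/openmpi-1.2/bin/mpicc
 +LINKER       = /home/shuang/openmpi-1.2/bin/mpicc
 +</code>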
 +
 +===== MPIRUN-1.2 (fixes) =====
 +
 +My experience ...
 +
 +<​code>​
 +
 +source is in /​mnt/​src/​hmeij-tmp/​foodir/​src
 +
 +su delltest3
 +
 +export LD_LIBRARY_PATH="/home/delltest3/openmpi-1.2/lib:$LD_LIBRARY_PATH"
 +add this to ~/.bashrc
 +
 +cd /​mnt/​src/​hmeij-tmp/​foodir/​src/​openmpi-1.2
 +./configure --prefix /​home/​delltest3/​openmpi-1.2 --disable-mpi-f90
 +make
 +make install
 +
 +cd ~
 +/home/delltest3/openmpi-1.2/bin/mpicc -o hello \
 +/mnt/src/hmeij-tmp/foodir/src/openmpi-1.2/examples/hello_c.c
 +
 +the machines file setup does not like 'nfs-2-1:8',
 +so instead add 8 lines for each node, like this: 'nfs-2-1'
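 +
 +# for example, generate that machines file with 8 entries per node
 +# (sketch for the 4 heavy weight nodes; adjust the node list for other node sets)
 +for node in nfs-2-1 nfs-2-2 nfs-2-3 nfs-2-4; do
 +  for i in $(seq 8); do echo $node; done
 +done > /home/delltest3/machines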
 +
 +ldd hello
 +ldd openmpi-1.2/​bin/​mpirun
 +
 +test on a single machine
 +/​home/​delltest3/​openmpi-1.2/​bin/​mpirun -np 8 -machinefile \
 +/​home/​delltest3/​machines ​ /​home/​delltest3/​hello
 +
 +cd ~
 +(for some reason you need to do this for compilation to be successful)
 +ln -s /​mnt/​src/​hmeij-tmp/​foodir/​src/​hpl
 +cd hpl
 +cp ~/​Make.Linux_PII_CBLAS .
 +make arch=Linux_PII_CBLAS  ​
 +cp bin/​Linux_PII_CBLAS/​xhpl ~
 +cp bin/​Linux_PII_CBLAS/​HPL.dat ~
 +
 +cd ~
 +/​home/​delltest3/​openmpi-1.2/​bin/​mpirun -np 8 -machinefile \
 +/​home/​delltest3/​machines /​home/​delltest3/​xhpl > HPL.out
 +
 +</​code>​
 +\\
 +**[[cluster:​23|Back]]**