The production copy of OpenMPI is in /share/apps/openmpi-1.2.
— Henk Meij 2007/04/19 15:27
The purpose here is to rerun the HPLinpack benchmarks Amol ran while configuring the cluster.
HPL FAQ (external link)
N calculation, for example:
4 nodes, 4 gb each, is 16 gb total, which holds 2G double-precision (8-byte) elements … 2G is 2*1024*1024*1024 = 2,147,483,648 … the square root of that is about 46,340 … 80% of that is 37,072
N calculation 16 nodes (infiniband or ethernet):
16 nodes, 4 gb each, is 64 gb total, which holds 8G double-precision (8-byte) elements … 8G is 8*1024*1024*1024 = 8,589,934,592 … the square root of that is about 92,681 … 80% of that is 74,145
N calculation 4 heavy weight nodes:
4 nodes, 16 gb each, is 64 gb total, which holds 8G double-precision (8-byte) elements … 8G is 8*1024*1024*1024 = 8,589,934,592 … the square root of that is about 92,681 … 80% of that is 74,145
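The same arithmetic as a minimal bash sketch (mem_gb is the total memory across all nodes; the usual 80% safety factor is assumed):

#!/bin/bash
# sketch: derive a Linpack problem size N from total cluster memory (mem_gb=64 is an example)
mem_gb=64                                        # total memory across all nodes, in GB
elements=$(( mem_gb * 1024 * 1024 * 1024 / 8 ))  # how many 8-byte doubles fit in that memory
n=$(echo "sqrt($elements)" | bc)                 # largest square matrix dimension (truncated)
echo "N = $(( (n * 80 + 50) / 100 ))"            # take ~80% of it, rounded; prints N = 74145 here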
NB calculations:
range of 32…256
good starting values are 88 and 132
PxQ Grid:
at its maximum, P x Q should equal the number of cores (np)
P should be no larger than Q … keep P and Q close to each other for infiniband, but make P much smaller than Q for ethernet (a loop for listing candidate grids is sketched after the examples below)
LWNi (np=128): P=8, Q=16
LWNe (np=128): P=4, Q=32 or P=2, Q=64
HWN (np=32): P=4, Q=8 or P=2, Q=16
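To list the candidate P x Q grids for a given process count, a small bash loop like this works (np=128 is just an example):

np=128
for (( p=1; p*p<=np; p++ )); do
  # only keep pairs where P divides np evenly; P <= Q by construction
  if (( np % p == 0 )); then
    echo "P=$p  Q=$((np / p))"
  fi
done

Pairs near the end of the list (P close to Q) suit infiniband; pairs near the top (small P) suit ethernet.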
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out          output file name (if any)
7                device out (6=stdout,7=stderr,file)
8                # of problems sizes (N)
1000 5000 10000 15000 20000 25000 30000 35000  Ns
6                # of NBs
200 300 400 500 600 700  NBs
0                PMAP process mapping (0=Row-,1=Column-major)
1                # of process grids (P x Q)
8                Ps
16               Qs
16.0             threshold
3                # of panel fact
0 1 2            PFACTs (0=left, 1=Crout, 2=Right)
2                # of recursive stopping criterium
2 4              NBMINs (>= 1)
1                # of panels in recursion
2                NDIVs
3                # of recursive panel fact.
0 1 2            RFACTs (0=left, 1=Crout, 2=Right)
1                # of broadcast
0                BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1                # of lookahead depth
0                DEPTHs (>=0)
2                SWAP (0=bin-exch,1=long,2=mix)
64               swapping threshold
0                L1 in (0=transposed,1=no-transposed) form
0                U in (0=transposed,1=no-transposed) form
1                Equilibration (0=no,1=yes)
8                memory alignment in double (> 0)
⇒ su delltest
#!/bin/bash
echo "setting P4_GLOBMEMSIZE=10000000"
P4_GLOBMEMSIZE=10000000
export P4_GLOBMEMSIZE
echo "invoking..."
echo "/usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 128 -hostfile ./machines ./xhpl"
date > HPL.start
(/usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 128 -hostfile ./machines ./xhpl > HPL.out 2>&1)
⇒ with the above HPL.dat file, this configuration runs for 8 hours …
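To pull the results out of HPL.out afterwards, something like the lines below works, assuming the standard HPL output layout where each result row starts with a T/V tag such as WR00L2L2 and Gflops is the last column:

grep "^T/V" HPL.out | head -1                   # column headers: T/V  N  NB  P  Q  Time  Gflops
grep "^WR"  HPL.out | awk '{print $2, $3, $7}'  # N, NB and Gflops for every completed run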
Innovative Computing Laboratory, University of Tennessee
HPL.out          output file name (if any)
7                device out (6=stdout,7=stderr,file)
1                # of problems sizes (N)
74145            Ns
1                # of NBs
88               NBs
0                PMAP process mapping (0=Row-,1=Column-major)
1                # of process grids (P x Q)
4                Ps
32               Qs
16.0             threshold
3                # of panel fact
0 1 2            PFACTs (0=left, 1=Crout, 2=Right)
2                # of recursive stopping criterium
2 4              NBMINs (>= 1)
1                # of panels in recursion
2                NDIVs
3                # of recursive panel fact.
0 1 2            RFACTs (0=left, 1=Crout, 2=Right)
1                # of broadcast
0                BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1                # of lookahead depth
0                DEPTHs (>=0)
2                SWAP (0=bin-exch,1=long,2=mix)
64               swapping threshold
0                L1 in (0=transposed,1=no-transposed) form
0                U in (0=transposed,1=no-transposed) form
1                Equilibration (0=no,1=yes)
8                memory alignment in double (> 0)
⇒ su delltest2
#!/bin/bash
echo "setting P4_GLOBMEMSIZE=10000000"
P4_GLOBMEMSIZE=10000000
export P4_GLOBMEMSIZE
date > HPL.start
echo "invoking..."
echo "/home/delltest2/openmpi-1.2/bin/mpirun -np 128 -machinefile /home/delltest2/machines /home/delltest2/xhpl"
(/home/delltest2/openmpi-1.2/bin/mpirun -np 128 -machinefile /home/delltest2/machines /home/delltest2/xhpl > /home/delltest2/HPL.out 2>&1) &
⇒ runs for 4 hours … change these lines below and it'll run for 14 hours
2                # of problems sizes (N)
74145 74145      Ns
2                # of NBs
88 132           NBs
Innovative Computing Laboratory, University of Tennessee
HPL.out          output file name (if any)
7                device out (6=stdout,7=stderr,file)
1                # of problems sizes (N)
74145            Ns
1                # of NBs
88               NBs
0                PMAP process mapping (0=Row-,1=Column-major)
1                # of process grids (P x Q)
4                Ps
8                Qs
16.0             threshold
3                # of panel fact
0 1 2            PFACTs (0=left, 1=Crout, 2=Right)
2                # of recursive stopping criterium
2 4              NBMINs (>= 1)
1                # of panels in recursion
2                NDIVs
3                # of recursive panel fact.
0 1 2            RFACTs (0=left, 1=Crout, 2=Right)
1                # of broadcast
0                BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1                # of lookahead depth
0                DEPTHs (>=0)
2                SWAP (0=bin-exch,1=long,2=mix)
64               swapping threshold
0                L1 in (0=transposed,1=no-transposed) form
0                U in (0=transposed,1=no-transposed) form
1                Equilibration (0=no,1=yes)
8                memory alignment in double (> 0)
⇒ su delltest3
#!/bin/bash
echo "setting P4_GLOBMEMSIZE=10000000"
P4_GLOBMEMSIZE=10000000
export P4_GLOBMEMSIZE
date > HPL.start
echo "invoking..."
echo "/home/delltest3/openmpi-1.2/bin/mpirun -np 32 -machinefile /home/delltest3/machines /home/delltest3/xhpl"
(/home/delltest3/openmpi-1.2/bin/mpirun -np 32 -machinefile /home/delltest3/machines /home/delltest3/xhpl > /home/delltest3/HPL.out 2>&1) &
⇒ runs for 14 1/2 hours … change these lines and it'll run for 2 days
2                # of NBs
88 132           NBs
From Sili at Platform …
Actually, MPICH 1 always has problems with its shared memory control, and it really takes time to debug the buggy shared memory stuff. I would rather suggest using OpenMPI instead of MPICH 1 for the Ethernet Linpack tests, as OpenMPI is a newer and better MPI implementation than MPICH 1, it is MPI-2 compatible, and it supports both ethernet and infiniband devices. The procedure I just tested is as follows.

1. Compile OpenMPI

Here is the procedure I used to recompile OpenMPI:

# ./configure --prefix=/home/shuang/openmpi-1.2 --disable-mpi-f90
# make
# make install

To test the installation, create a host file. I generated a hostfile:

# cat /etc/hosts | grep compute | awk '{print $3}' > machines

Then I recompiled the hello example (the hello_c.c file can be found in the examples directory of the untarred source directory):

# /home/shuang/openmpi-1.2/bin/mpicc -o hello ./hello_c.c

And tested it:

# /home/shuang/openmpi-1.2/bin/mpirun -np 4 -machinefile machines --prefix /home/shuang/openmpi-1.2 ./hello

Please note that I used the complete path to the executables because, by default, LAM will be picked up. This is also why I used the --prefix option. You may want to use modules to load/unload these environment settings. Please let me know if you would like more information about this (open-source) software.

2. Compile Linpack with OpenMPI

# wget http://www.netlib.org/benchmark/hpl/hpl.tgz
# tar zxf hpl.tgz
# cd hpl
# cp setup/Make.Linux_PII_CBLAS .

Edit Make.Linux_PII_CBLAS: change "MPdir" to "/home/shuang/openmpi-1.2", change "MPlib" to "$(MPdir)/lib/libmpi.so", change "LAdir" to "/usr/lib64", change "CC" to "/home/shuang/openmpi-1.2/bin/mpicc", and change "LINKER" to "/home/shuang/openmpi-1.2/bin/mpicc". Then you can make Linpack by:

# make arch=Linux_PII_CBLAS

To test it, edit the HPL.dat accordingly and run:

# /home/shuang/openmpi-1.2/bin/mpirun -np 8 -machinefile machines --prefix /home/shuang/openmpi-1.2 ./xhpl
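For reference, after those edits the relevant lines of Make.Linux_PII_CBLAS would look roughly like this (only the variables Sili mentions are shown, with his example paths; adjust LAdir to wherever your BLAS library actually lives):

MPdir        = /home/shuang/openmpi-1.2
MPlib        = $(MPdir)/lib/libmpi.so
LAdir        = /usr/lib64
CC           = /home/shuang/openmpi-1.2/bin/mpicc
LINKER       = /home/shuang/openmpi-1.2/bin/mpicc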
My experience …
The source is in /mnt/src/hmeij-tmp/foodir/src.

su delltest3
export LD_LIBRARY_PATH="/home/delltest3/openmpi-1.2/lib:$LD_LIBRARY_PATH"   (add this to ~/.bashrc)

cd /mnt/src/hmeij-tmp/foodir/src/openmpi-1.2
./configure --prefix /home/delltest3/openmpi-1.2 --disable-mpi-f90
make
make install

cd ~
/home/delltest3/openmpi-1.2/bin/mpicc -o hello \
  /mnt/src/hmeij-tmp/foodir/src/openmpi-1.2/examples/hello_c.c

The machines file setup does not like 'nfs-2-1:8'; instead, add 8 lines for each node, each reading just 'nfs-2-1' (a loop for generating this file is sketched below).

ldd hello
ldd openmpi-1.2/bin/mpirun

Test on a single machine:

/home/delltest3/openmpi-1.2/bin/mpirun -np 8 -machinefile \
  /home/delltest3/machines /home/delltest3/hello

cd ~   (for some reason you need to do this for the compilation to be successful)
ln -s /mnt/src/hmeij-tmp/foodir/src/hpl
cd hpl
cp ~/Make.Linux_PII_CBLAS .
make arch=Linux_PII_CBLAS
cp bin/Linux_PII_CBLAS/xhpl ~
cp bin/Linux_PII_CBLAS/HPL.dat ~
cd ~
/home/delltest3/openmpi-1.2/bin/mpirun -np 8 -machinefile \
  /home/delltest3/machines /home/delltest3/xhpl > HPL.out
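Since the machines file needs one line per process slot rather than the 'nfs-2-1:8' shorthand, a short loop can generate it; this is just a sketch, the node names are placeholders, and 8 slots per node is assumed:

# write 8 identical lines per node into the machines file
for node in nfs-2-1 nfs-2-2 nfs-2-3 nfs-2-4; do
  for slot in $(seq 1 8); do echo "$node"; done
done > /home/delltest3/machines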