\\
**[[cluster:23|Back]]**

The production copy of OpenMPI is in ''/share/apps/openmpi-1.2''.\\
 --- //[[hmeij@wesleyan.edu|Henk Meij]] 2007/04/19 15:27//

====== HPLinpack Runs ======

The purpose here is to rerun the HPLinpack benchmarks Amol ran while configuring the cluster.

^Before^
|{{:cluster:hplburn_before.gif|Idle!}}|
^During^
|{{:cluster:hplburn_during.gif|Heat!}}|
^Ooops^
|{{:cluster:hplburn.gif|Burn!}}|

FAQ [[http://www.netlib.org/benchmark/hpl/faqs.html|External Link]]

====== Problem Sizes ======

N calculation, for example:\\
4 nodes, 4 gb each, is 16 gb total, which holds 2G double precision (8 byte) elements ... 2G is 2*1024*1024*1024 = 2,147,483,648 ... take the square root of that and round down: 46,340 ... 80% of that is 37,072 (this arithmetic is scripted below)\\

N calculation, 16 nodes (infiniband or ethernet):\\
16 nodes, 4 gb each, is 64 gb total, which holds 8G double precision (8 byte) elements ... 8G is 8*1024*1024*1024 = 8,589,934,592 ... take the square root of that and round down: 92,681 ... 80% of that is 74,145

N calculation, 4 heavy weight nodes:\\
4 nodes, 16 gb each, is 64 gb total, which holds 8G double precision (8 byte) elements ... 8G is 8*1024*1024*1024 = 8,589,934,592 ... take the square root of that and round down: 92,681 ... 80% of that is 74,145

NB calculation:\\
range of 32...256\\
Good starting values are 88 and 132

PxQ Grid:\\
P times Q should equal the number of cores (MPI processes), with P less than or equal to Q, as in the grids used below
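The 80%-of-square-root rule above can be scripted. Below is a minimal sketch, not part of the original runs; the script name ''hpl_n.sh'' and its arguments are made up here, and it simply applies the rule described above.

<code>
#!/bin/bash
# Sketch: compute an HPL problem size N as 80% of sqrt(total_memory_bytes / 8).
# The script name and arguments are hypothetical; defaults match the 16-node case above.
NODES=${1:-16}        # number of nodes
GB_PER_NODE=${2:-4}   # memory per node in GB

awk -v nodes="$NODES" -v gb="$GB_PER_NODE" 'BEGIN {
    bytes    = nodes * gb * 1024 * 1024 * 1024   # total memory in bytes
    elements = bytes / 8                         # 8-byte double precision elements
    printf "N = %d\n", 0.80 * sqrt(elements)     # 80% of the full-memory N
}'
</code>

For example, ''sh hpl_n.sh 4 4'' prints N = 37072, and ''sh hpl_n.sh 16 4'' (or ''sh hpl_n.sh 4 16'' for the heavy weight nodes) prints N = 74145, the values used in the HPL.dat files below.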
===== HPL.dat =====

<code>
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
7            device out (6=stdout,7=stderr,file)
8            # of problems sizes (N)
1000 5000 10000 15000 20000 25000 30000 35000  Ns
6            # of NBs
200 300 400 500 600 700  NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
8            Ps
16           Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
</code>

===== run script =====

**=> su delltest**

<code>
#!/bin/bash

echo "setting P4_GLOBMEMSIZE=10000000"
P4_GLOBMEMSIZE=10000000
export P4_GLOBMEMSIZE

echo "invoking..."
echo "/usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 128 -hostfile ./machines ./xhpl"

date > HPL.start
(/usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 128 -hostfile ./machines ./xhpl > HPL.out 2>&1)
</code>

=> with the above HPL.dat file, this configuration runs for 8 hours ...

===== Ethernet (16 nodes) =====

  * nodes: compute-1-17 thru compute-2-32
  * each a dual quad core 2.6 ghz PE1950 (2x4x16 totals 128 cores)
  * each with 4 gb ram (4x16=64 gb total memory)

===== HPL.dat =====

<code>
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
7            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
74145        Ns
1            # of NBs
88           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4            Ps
32           Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
</code>

===== run script =====

**=> su delltest2**

<code>
#!/bin/bash

echo "setting P4_GLOBMEMSIZE=10000000"
P4_GLOBMEMSIZE=10000000
export P4_GLOBMEMSIZE

date > HPL.start
echo "invoking..."
echo "/home/delltest2/openmpi-1.2/bin/mpirun -np 128 -machinefile /home/delltest2/machines /home/delltest2/xhpl"

(/home/delltest2/openmpi-1.2/bin/mpirun -np 128 -machinefile /home/delltest2/machines /home/delltest2/xhpl > /home/delltest2/HPL.out 2>&1)&
</code>

=> runs for 4 hours ... change these lines below and it'll run for 14 hours

<code>
2            # of problems sizes (N)
74145 74145  Ns
2            # of NBs
88 132       NBs
</code>

===== Ethernet (4 nodes) =====

  * nodes: nfs-2-1 thru nfs-2-4
  * each a dual quad core 2.6 ghz PE1950 (2x4x4 totals 32 cores)
  * each with 16 gb ram (16x4=64 gb total memory)

===== HPL.dat =====

<code>
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
7            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
74145        Ns
1            # of NBs
88           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4            Ps
8            Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
</code>

===== run script =====

**=> su delltest3**

<code>
#!/bin/bash

echo "setting P4_GLOBMEMSIZE=10000000"
P4_GLOBMEMSIZE=10000000
export P4_GLOBMEMSIZE

date > HPL.start
echo "invoking..."
echo "/home/delltest3/openmpi-1.2/bin/mpirun -np 32 -machinefile /home/delltest3/machines /home/delltest3/xhpl"

(/home/delltest3/openmpi-1.2/bin/mpirun -np 32 -machinefile /home/delltest3/machines /home/delltest3/xhpl > /home/delltest3/HPL.out 2>&1)&
</code>

=> runs for 14 1/2 hours ... change these lines and it'll run for 2 days

<code>
2            # of NBs
88 132       NBs
</code>

===== MPIRUN-1.2 =====

From Sili at Platform ...

Actually, MPICH 1 always has problems with its shared memory control, and it really takes time to debug the buggy shared memory stuff. I would rather suggest using openmpi instead of MPICH 1 to run the Ethernet linpack tests: openmpi is a newer and better MPI implementation than MPICH 1, it is MPI-2 compatible, and it supports both ethernet and infiniband devices. The procedure I just tested is as follows.

**1. Compile Openmpi**

Here is the procedure I used to recompile openmpi:

<code>
# ./configure --prefix=/home/shuang/openmpi-1.2 --disable-mpi-f90
# make
# make install
</code>

To test the installation, I generated a hostfile:

<code>
# cat /etc/hosts | grep compute | awk '{print $3}' > machines
</code>

Then I compiled the hello example (the hello_c.c file can be found in the examples directory of the untarred source directory):

<code>
# /home/shuang/openmpi-1.2/bin/mpicc -o hello ./hello_c.c
</code>

And tested it:

<code>
# /home/shuang/openmpi-1.2/bin/mpirun -np 4 -machinefile machines --prefix /home/shuang/openmpi-1.2 ./hello
</code>

Please note that I used the complete path to the executables because, by default, lam will be picked up. This is also why I used the ''--prefix'' option. You may want to use modules to load/unload these environment settings. Please let me know if you would like to have more information about this (open-source) software.

**2. Compile Linpack with Openmpi**

<code>
# wget http://www.netlib.org/benchmark/hpl/hpl.tgz
# tar zxf hpl.tgz
# cd hpl
# cp setup/Make.Linux_PII_CBLAS .
</code>

Edit Make.Linux_PII_CBLAS: change "MPdir" to "/home/shuang/openmpi-1.2", "MPlib" to "$(MPdir)/lib/libmpi.so", "LAdir" to "/usr/lib64", "CC" to "/home/shuang/openmpi-1.2/bin/mpicc", and "LINKER" to "/home/shuang/openmpi-1.2/bin/mpicc" (a sketch of the edited file is shown below). Then you can make linpack by:

<code>
# make arch=Linux_PII_CBLAS
</code>

To test it, edit the HPL.dat accordingly and run:

<code>
# /home/shuang/openmpi-1.2/bin/mpirun -np 8 -machinefile machines --prefix /home/shuang/openmpi-1.2 ./xhpl
</code>
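For reference, the Make.Linux_PII_CBLAS edits described above would look roughly like the excerpt below. This is a sketch based on the standard variable names in HPL's Make templates (MPdir, MPinc, MPlib, LAdir, CC, LINKER); the ''MPinc'' line is an assumption, since it is not mentioned in the note above, and the remaining lines of the real template are not shown.

<code>
# Excerpt of Make.Linux_PII_CBLAS after the edits described above (sketch only)
ARCH         = Linux_PII_CBLAS
# ----- MPI: point at the OpenMPI build -----
MPdir        = /home/shuang/openmpi-1.2
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpi.so
# ----- linear algebra: use the system libraries in /usr/lib64 -----
LAdir        = /usr/lib64
# ----- compiler and linker: use the OpenMPI wrapper -----
CC           = /home/shuang/openmpi-1.2/bin/mpicc
LINKER       = /home/shuang/openmpi-1.2/bin/mpicc
</code>

Only the variables named above are shown; everything else in the template is left untouched in this sketch.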
===== MPIRUN-1.2 (fixes) =====

My experience ... the source is in ''/mnt/src/hmeij-tmp/foodir/src''.

<code>
su delltest3
</code>

Add this to ''~/.bashrc'':

<code>
export LD_LIBRARY_PATH="/home/delltest3/openmpi-1.2/lib:$LD_LIBRARY_PATH"
</code>

Build and install OpenMPI:

<code>
cd /mnt/src/hmeij-tmp/foodir/src/openmpi-1.2
./configure --prefix /home/delltest3/openmpi-1.2 --disable-mpi-f90
make
make install
</code>

Compile the hello example:

<code>
cd ~
/home/delltest3/openmpi-1.2/bin/mpicc -o hello \
  /mnt/src/hmeij-tmp/foodir/src/openmpi-1.2/examples/hello_c.c
</code>

The machines file setup does not like ''nfs-2-1:8'', so instead add 8 lines for each node, like this: ''nfs-2-1''

Check the linked libraries and test on a single machine:

<code>
ldd hello
ldd openmpi-1.2/bin/mpirun

/home/delltest3/openmpi-1.2/bin/mpirun -np 8 -machinefile \
  /home/delltest3/machines /home/delltest3/hello
</code>

Build linpack (for some reason you need to cd to your home directory for the compilation to be successful):

<code>
cd ~
ln -s /mnt/src/hmeij-tmp/foodir/src/hpl
cd hpl
cp ~/Make.Linux_PII_CBLAS .
make arch=Linux_PII_CBLAS
cp bin/Linux_PII_CBLAS/xhpl ~
cp bin/Linux_PII_CBLAS/HPL.dat ~
</code>

Run it:

<code>
cd ~
/home/delltest3/openmpi-1.2/bin/mpirun -np 8 -machinefile \
  /home/delltest3/machines /home/delltest3/xhpl > HPL.out
</code>

\\
**[[cluster:23|Back]]**