HP HPC
Notes for the cluster design conference with HP.
"do later" means we tackle it after the HP on-site visit.
S & H
Shipping Address: 5th floor data center
No 13'6" truck; a 12'6" truck or a box truck is OK
Delivery is to a standard raised dock; there is no way to lift the rack out of the truck if it cannot dock
Freight Elevator and pallet jack available
Network
Basically, the addressing scheme is:
x.y.z.255 is the broadcast address
x.y.z.254 is the head (login) node
x.y.z.0 is the gateway
x.y.z.1 through x.y.z.24 are for all switches and console ports
x.y.z.25 through x.y.z.253 are for all compute nodes
We are planning to ingest our Dell cluster (37 nodes) and our Blue Sky Studios cluster (130 nodes) into this setup, hence the approach.
The netmask is, finally, 255.255.0.0 (excluding the public 129.133 subnet).
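A minimal Python sketch of the scheme above, assuming only the last octet decides a host's role (the x.y.z prefix is whichever of our networks is in play):

def role_for_last_octet(last):
    """Map the last octet of an x.y.z.N address to its role in the scheme above."""
    if last == 255:
        return "broadcast"
    if last == 254:
        return "head/login node"
    if last == 0:
        return "gateway"
    if 1 <= last < 25:
        return "switches and console ports"
    if 25 <= last <= 253:
        return "compute nodes"
    raise ValueError("outside the 0-255 octet range or unassigned")

# Example: 192.168.102.254 is the head node on the provision network.
print(role_for_last_octet(254))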
DL380 G7
HP Link (head node)
hostname
greentail, another local “tail”, also in reference to HP being 18-24% more efficient in power/cooling
eth0, provision, 192.168.102.254/255.255.0.0 (greentail-eth0, should go to better switch ProCurve 2910)
eth1, data/private, 10.10.102.254/255.255.0.0 (greentail-eth1, should go to ProCurve 2610)
eth2, public, 129.133.1.226/255.255.255.0 (greentail.wesleyan.edu, we provide cable connection)
eth3 (over eth2), ipmi, 192.168.103.254/255.255.0.0, (greentail-ipmi, should go to better switch ProCurve 2910, do later)
ib0, ipoib, 10.10.103.254/255.255.0.0 (greentail-ib0)
ib1, ipoib, 10.10.104.254/255.255.0.0 (greentail-ib1, configure, might not have cables!, split traffic across ports?)
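A hedged sketch of how the interface plan above could be turned into Red Hat style ifcfg files; the RHEL/CentOS network-scripts layout is an assumption, and the eth3/ipmi interface is left out since it is a "do later" item:

# Sketch only: prints ifcfg-<dev> snippets for the greentail interfaces listed above.
# Assumes a RHEL/CentOS-style /etc/sysconfig/network-scripts layout.
HEAD_IFACES = {
    "eth0": "192.168.102.254",  # provision
    "eth1": "10.10.102.254",    # data/private
    "eth2": "129.133.1.226",    # public (netmask 255.255.255.0)
    "ib0":  "10.10.103.254",    # ipoib
    "ib1":  "10.10.104.254",    # ipoib
}

def ifcfg(dev, ip):
    netmask = "255.255.255.0" if dev == "eth2" else "255.255.0.0"
    return "\n".join([
        f"DEVICE={dev}",
        "BOOTPROTO=static",
        f"IPADDR={ip}",
        f"NETMASK={netmask}",
        "ONBOOT=yes",
    ])

for dev, ip in HEAD_IFACES.items():
    print(f"--- ifcfg-{dev} ---")
    print(ifcfg(dev, ip))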
RAID 1 mirrored disks (2x250 GB)
/home mount point for home directory volume, ~10 TB
/home/apps mount point for software volume, ~1 TB (contains /home/apps/src)
/home/sanscratch mount point for sanscratch volume, ~5 TB
logical volume LOCALSCRATCH: mount at /localscratch, ~100 GB (should match nodes at 160 GB, leave rest for OS)
logical volumes ROOT/VAR/BOOT/TMP: defaults
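A quick arithmetic check of the disk plan above, assuming the /home volumes live on the MSA60 and the local RAID 1 pair carries only LOCALSCRATCH plus the OS volumes (numbers are the planning figures, not measured sizes):

# Head node local disk: RAID 1 over 2x250 GB leaves roughly 250 GB usable.
RAID1_USABLE_GB = 250
LOCALSCRATCH_GB = 100
print("left for ROOT/VAR/BOOT/TMP:", RAID1_USABLE_GB - LOCALSCRATCH_GB, "GB")

# Volumes mounted on the head node (planning figures in GB, assumed on the MSA60).
msa_volumes_gb = {"/home": 10000, "/home/apps": 1000, "/home/sanscratch": 5000}
print("total requested:", sum(msa_volumes_gb.values()) / 1000.0, "TB")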
IPoIB configuration
SIM configuration
CMU configuration
SGE configuration
StorageWorks MSA60
SL2x170z G6
HP Link (compute nodes)
node names hp000, increment by 1
eth0, provision, 192.168.102.25(increment by 1)/255.255.0.0 (hp000-eth0, should go to better switch ProCurve 2910)
eth1, data/private, 10.10.102.25(increment by 1)/255.255.0.0 (hp000-eth1, should go to ProCurve 2610)
eth2 (over eth1), ipmi, 192.168.103.25(increment by 1)/255.255.0.0, (hp000-ipmi, should go to better switch ProCurve 2910, do later)
ib0, ipoib, 10.10.103.25(increment by 1)/255.255.0.0 (hp000-ib0)
ib1, ipoib, 10.10.104.25(increment by 1)/255.255.0.0 (hp000-ib1, configure, might not have cables!)
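A minimal sketch of the increment-by-one numbering above; node i (counting from 0) is named hp%03d and takes the last octet 25+i on every network, which leaves room for 229 compute nodes (.25 through .253):

def node_plan(i):
    """Addresses for compute node i under the increment-by-one scheme above."""
    last = 25 + i
    if not 25 <= last <= 253:
        raise ValueError("only .25 through .253 are reserved for compute nodes")
    name = f"hp{i:03d}"
    return {
        "hostname": name,
        "eth0 (provision)": f"192.168.102.{last}",
        "eth1 (data/private)": f"10.10.102.{last}",
        "ipmi": f"192.168.103.{last}",
        "ib0 (ipoib)": f"10.10.103.{last}",
        "ib1 (ipoib)": f"10.10.104.{last}",
    }

# Example: the third node is hp002 with last octet .27 on every network.
print(node_plan(2))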
/home mount point for home directory volume, ~10 TB
/home/apps mount point for software volume, ~1 TB (contains /home/apps/src)
/home/sanscratch mount point for sanscratch volume, ~5 TB
logical volume LOCALSCRATCH: mount at /localscratch, ~100 GB (60 GB left for OS)
logical volumes ROOT/VAR/BOOT/TMP: defaults
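A small sanity-check sketch that could be run on a node after provisioning, assuming the /home volumes end up mounted from the head node (how they are served is not decided here); the expected sizes are the planning figures above:

import os

# Rough post-provisioning check: are the expected mount points present and
# roughly the planned size? Sizes below are the planning numbers in GB.
EXPECTED_GB = {
    "/home": 10000,
    "/home/apps": 1000,
    "/home/sanscratch": 5000,
    "/localscratch": 100,
}

for path, want_gb in EXPECTED_GB.items():
    if not os.path.ismount(path):
        print(f"{path}: NOT mounted")
        continue
    st = os.statvfs(path)
    got_gb = st.f_blocks * st.f_frsize / 1e9
    print(f"{path}: ~{got_gb:.0f} GB (planned ~{want_gb} GB)")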
Misc
IPoIB
configuration, fine tune
monitor
Systems Insight Manager (SIM)
HP Link (Linux Install and Configure Guide, and User Guide)
Do we need a Windows box (virtual) to run the Central Management Server on?
SIM + Cluster Monitor (MSCS)?
install, configure
Requires an Oracle install? No, hpsmdb is installed with the automatic installation (PostgreSQL)
Linux deployment utilities and management agents installation
configure managed systems, automatic discovery
configure automatic event handling
Cluster Management Utility (CMU)
HP Link (Getting Started – Hardware Preparation, Setup and Install – Installation Guide v4.2, User Guides)
iLo/IPMI
HP iLo probably removes the need for IPMI; consult the External Link. Do the blades have a management card?
Well, maybe not: IPMI (External Link) can be scripted to power on/off; not sure about iLo (all web based). See the ipmitool sketch below.
Is the head node the Management server? Possibly; it needs access to the provision and public networks.
We may need an iLo eth? In the range … 192.168.104.x? Consult the Hardware Preparation Guide.
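A hedged sketch of the scripted power on/off mentioned above, using ipmitool over the LAN interface; the BMC address follows the 192.168.103.x ipmi range in the plan, and the user/password shown are placeholders only:

import subprocess

def ipmi_power(bmc_ip, action, user="ADMIN", password="CHANGEME"):
    """Run 'chassis power on|off|status|cycle' against a node's BMC via ipmitool.

    bmc_ip would be the node's 192.168.103.x address from the plan above;
    the credentials here are placeholders, not real ones.
    """
    if action not in ("on", "off", "status", "cycle"):
        raise ValueError("unsupported action")
    cmd = ["ipmitool", "-I", "lanplus", "-H", bmc_ip,
           "-U", user, "-P", password, "chassis", "power", action]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Example: power-cycle hp000 (BMC at 192.168.103.25).
# print(ipmi_power("192.168.103.25", "cycle"))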
CMU wants eth0 on NIC1 and PXEboot
install CMU management node
install X and CMU
GUI client node
start CMU, start client, scan for nodes, build golden image
clone nodes, deploy management agent on nodes
install monitoring
Other
ToDo
All of these are "do later" items, to be tackled after the HP cluster is up.