HP HPC
Notes for the cluster design conference with HP.
“do later” means we tackle it after the HP on-site visit.
S & H
Shipping Address: 5th floor data center
No 13'6" trucks; a 12'6" truck or a box truck is OK
Delivery is to a standard raised dock; there is no way to lift the rack out of the truck if it cannot dock
Freight elevator and pallet jack are available
Network
Basically, the addressing scheme is:
x.y.z.255 is the broadcast address
x.y.z.254 is the head (login) node
x.y.z.0 is the gateway
x.y.z.1 through x.y.z.9 is for all switches (prefer .1) and console/management ports
x.y.z.10 through x.y.z.253 is for all compute nodes
We plan to fold our Dell cluster (37 nodes) and our Blue Sky Studios cluster (130 nodes) into this setup, hence the approach.
The netmask, finally, is 255.255.0.0 (excluding the public 129.133 subnet).
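For reference, a minimal Python sketch of the convention above; 192.168.102 is purely an example prefix, and the role assignments come straight from the list, nothing more is implied.

# Spell out the last-octet roles described above for one example x.y.z prefix.
def address_plan(prefix: str) -> dict[str, str]:
    plan = {
        f"{prefix}.255": "broadcast",
        f"{prefix}.254": "head/login node",
        f"{prefix}.0": "gateway",
    }
    for last in range(1, 10):        # .1-.9: switches (prefer .1), console/mgmt ports
        plan[f"{prefix}.{last}"] = "switch or console/management port"
    for last in range(10, 254):      # .10-.253: compute nodes
        plan[f"{prefix}.{last}"] = "compute node"
    return plan

if __name__ == "__main__":
    plan = address_plan("192.168.102")   # example prefix only
    for ip in ("192.168.102.0", "192.168.102.1",
               "192.168.102.10", "192.168.102.254", "192.168.102.255"):
        print(ip, "->", plan[ip])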
Update with the following:
Hi Shanna, OK, I see that, so globally let's go with:
eth0 192.168.102.x/255.255.0.0
eth1 10.10.102.x/255.255.0.0 (data, needs to reach the NetApp filer at 10.10.0.y/255.255.0.0)
eth2 129.133.1.226 public (wesleyan.edu)
eth3 192.168.103.x/255.255.255.0 ipmi (or over eth0?)
eth4 192.168.104.x/255.255.255.0 ilo (or over eth0?)
ib0 10.11.103.x/255.255.255.0 ipoib (data)
ib1 10.11.104.x/255.255.255.0 ipoib (data, not used at the start)
where x=254 for the head node and x=10 (incrementing by 1) for nodes n1-n32
Does that work for you? I'm unsure how iLO/IPMI works, but it could go over eth0.
-Henk
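A quick sketch of how the “x=254 for head, x=10 and up” rule in the email expands into per-host addresses. The device-to-prefix table restates the email (eth2/eth3/eth4 are left out: the public IP is a single address and the iLO/IPMI question is still open), and the node count n1-n32 is the one mentioned there.

PREFIXES = {
    "eth0": ("192.168.102", "255.255.0.0"),    # provisioning
    "eth1": ("10.10.102",   "255.255.0.0"),    # data/private
    "ib0":  ("10.11.103",   "255.255.255.0"),  # IPoIB data
    "ib1":  ("10.11.104",   "255.255.255.0"),  # IPoIB data, not used at the start
}

def host_interfaces(x: int) -> dict[str, str]:
    return {dev: f"{prefix}.{x}/{mask}" for dev, (prefix, mask) in PREFIXES.items()}

hosts = {"head": host_interfaces(254)}
for i in range(1, 33):                         # n1-n32 get x = 10, 11, ..., 41
    hosts[f"n{i}"] = host_interfaces(10 + i - 1)

print(hosts["head"]["eth0"])   # 192.168.102.254/255.255.0.0
print(hosts["n1"]["eth1"])     # 10.10.102.10/255.255.0.0
print(hosts["n32"]["ib0"])     # 10.11.103.41/255.255.255.0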
InfiniBand
HP Link
Configuration, fine-tuning, identifying bottlenecks, monitoring, administration. Investigate Voltaire UFM?
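A small sketch of the kind of monitoring we could script ourselves while evaluating UFM, assuming the standard OFED ibstat utility is installed on the nodes (this is not HP- or Voltaire-specific tooling, just a quick check that HCA ports are Active).

import subprocess

def ib_port_states() -> list[str]:
    """Return the State/Rate lines reported by `ibstat` for all HCA ports."""
    out = subprocess.run(["ibstat"], capture_output=True, text=True, check=True)
    return [line.strip() for line in out.stdout.splitlines()
            if line.strip().startswith(("State:", "Rate:"))]

if __name__ == "__main__":
    for line in ib_port_states():
        print(line)    # e.g. "State: Active" / "Rate: 40"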
DL380 G7
HP Link (head node)
External link: video about the hardware
hostname
greentail, another local “tail” name; also a reference to HP being 18-24% more efficient in power/cooling
eth0, provision, 192.168.102.254/255.255.0.0 (greentail-eth0, should go to better switch ProCurve 2910)
eth1, data/private, 10.10.102.254/255.255.0.0 (greentail-eth1, should go to ProCurve 2610)
eth2, public, 129.133.1.226/255.255.255.0 (greentail.wesleyan.edu, we provide the cable connection)
eth3 (over eth0), ipmi, 192.168.103.254/255.255.0.0 (greentail-ipmi, should go to better switch ProCurve 2910, do later)
ib0, ipoib, 10.10.103.254/255.255.0.0 (greentail-ib0)
ib1, ipoib, 10.10.104.254/255.255.0.0 (greentail-ib1, configure; might not have cables! split traffic across ports?)
RAID 1 mirrored disks (2×250 GB)
/home mount point for the home directory volume, ~10 TB (contains /home/apps/src)
/snapshot mount point for the snapshot volume, ~10 TB
/sanscratch mount point for the sanscratch volume, ~5 TB
/localscratch … maybe just a directory
logical volumes ROOT/VAR/BOOT/TMP: defaults
IPoIB configuration
SIM configuration
CMU configuration
SGE configuration
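A sketch (our own tooling assumption, not anything HP ships) that emits /etc/hosts-style entries following the greentail-ethN / nN-ethN naming above. Subnets mirror the head-node and compute-node lists in these notes, and the count of 32 nodes is taken from the email; adjust if the ib0/ib1 ranges or node count change.

PREFIXES = {
    "eth0": "192.168.102",  # provision
    "eth1": "10.10.102",    # data/private
    "ib0":  "10.10.103",    # IPoIB
    "ib1":  "10.10.104",    # IPoIB, later
}

def hosts_lines() -> list[str]:
    lines = [f"{prefix}.254\tgreentail-{dev}" for dev, prefix in PREFIXES.items()]
    for i in range(32):                          # n0-n31 at .10 and up
        for dev, prefix in PREFIXES.items():
            lines.append(f"{prefix}.{10 + i}\tn{i}-{dev}")
    return lines

if __name__ == "__main__":
    print("\n".join(hosts_lines()[:8]))          # preview the first few entries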
StorageWorks MSA60
SL2x170z G6
HP Link (compute nodes)
node names: n0, incrementing by 1
eth0, provision, 192.168.102.10(increment by 1)/255.255.0.0 (n0-eth0, should go to better switch ProCurve 2910)
eth1, data/private, 10.10.102.10(increment by 1)/255.255.0.0 (n0-eth1, should go to ProCurve 2610)
eth2 (over eth0), ipmi, 192.168.103.10(increment by 1)/255.255.0.0 (n0-ipmi, should go to better switch ProCurve 2910, do later)
ib0, ipoib, 10.10.103.10(increment by 1)/255.255.0.0 (n0-ib0)
ib1, ipoib, 10.10.104.10(increment by 1)/255.255.0.0 (n0-ib1, configure; might not have cables!)
/home mount point for the home directory volume, ~10 TB (contains /home/apps/src)
/snapshot mount point for the snapshot volume, ~10 TB
/sanscratch mount point for the sanscratch volume, ~5 TB
(the following volumes must be at least 50% empty for cloning to work; see the check sketched below)
logical volume LOCALSCRATCH: mount at /localscratch, ~100 GB (60 GB left for the OS)
logical volumes ROOT/VAR/BOOT/TMP: defaults
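A rough way to verify the 50%-empty constraint before cloning, assuming a plain filesystem free-space check is good enough (the exact rule the cloning tool applies is not spelled out here).

import shutil

def half_empty(mount_point: str) -> bool:
    """True if at least 50% of the filesystem at mount_point is free."""
    usage = shutil.disk_usage(mount_point)
    return usage.free >= usage.total / 2

if __name__ == "__main__":
    for mp in ("/", "/localscratch"):       # paths from the layout above
        try:
            print(mp, "ok for cloning:", half_empty(mp))
        except FileNotFoundError:
            print(mp, "does not exist on this host")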
Misc
IPoIB
configuration, fine-tuning
monitoring
Other
ToDo
All of these are “do later” items, to be tackled after the HP cluster is up.
Location
remove 2 BSS racks (to pace.edu?), racks #3 & #4
add an L6-30 circuit if needed (we have 3? check)
fill the remaining 2 BSS racks with the good 24 GB servers, turned off