cluster:89 [2010/08/18 22:07] hmeij
  * Delivery on standard raised dock, no way to lift rack out of truck if not docked
  * Freight elevator and pallet jack available

===== Network =====

Basically ...

  * configure all console port switches with an IP
    * depending on switch, IP in 192.168.102.x or 10.10.102.x
    * voltaire console can be stuffed in either

  * head node will be connected to our private network via two link-aggregated ethernet cables in the 10.10.x.y range so current home directories can be mounted somewhere (these dirs will not be available on the back end nodes)

  * x.y.z.255 is broadcast
  * x.y.z.254 is head or log in node
  * x.y.z.0 is gateway
  * x.y.z.<
  * x.y.z.25 (up to 253) is for all compute nodes

We are planning to ingest our Dell cluster (37 nodes) and our Blue Sky Studios cluster (130 nodes) into this setup, hence the approach.

Netmask is, finally, 255.255.0.0 (excluding the public 129.133 subnet).
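The node-index-to-address convention above can be sketched as a small helper. This is only an illustration of the plan, not an installed tool; subnet bases are the planning values from this page, and the function name is made up here.

```shell
#!/bin/bash
# Map a compute-node index (hp000 -> 0, hp001 -> 1, ...) to its planned
# addresses, following the x.y.z.25-and-up convention above.
node_ips() {
    local idx=$1
    local host=$((25 + idx))    # compute nodes live at .25 up to .253
    if [ "$host" -gt 253 ]; then
        echo "node index $idx out of range" >&2
        return 1
    fi
    printf 'hp%03d provision=192.168.102.%d ipmi=192.168.103.%d ib0=10.10.103.%d ib1=10.10.104.%d\n' \
        "$idx" "$host" "$host" "$host" "$host"
}
```

For example, `node_ips 0` prints the addresses for hp000 (.25 in every range); the head node keeps .254 in each range.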
===== DL380 G7 =====
  * Dual power (one to UPS, one to utility, do later)
  * hostname [[http://
  * eth0, provision, 192.168.102.254/
    * do we need an iLO eth? in range 192.168.104.254?
  * eth1, data/
  * eth2, public, 129.133.1.226/
  * eth3 (over eth2), ipmi, 192.168.103.254/
    * see discussion iLO/IPMI under CMU
  * ib0, ipoib, 10.10.103.254/
  * ib1, ipoib, 10.10.104.254/
  * RAID 1 mirrored disks (2x250 GB)
  * /home mount point for home directory volume ~ 10 TB (contains /
  * /snapshot mount point for snapshot volume ~ 10 TB
  * /sanscratch mount point for sanscratch volume ~ 5 TB
  * logical volume LOCALSCRATCH:
  * logical volumes ROOT/

  * IPoIB configuration
  * SIM configuration
  * CMU configuration
  * SGE configuration
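For the IPoIB configuration item, a minimal sketch of RHEL-style `ifcfg` files for the head node's ib0/ib1, using the planning addresses above and the 255.255.0.0 netmask from the Network section. The generator script and its dry-run directory are illustrative only; on the real head node the files belong in /etc/sysconfig/network-scripts.

```shell
#!/bin/bash
# Sketch: write ifcfg files for the head node IPoIB interfaces.
# NETDIR defaults to a local dry-run directory; point it at
# /etc/sysconfig/network-scripts to install for real.
NETDIR=${NETDIR:-./network-scripts}
mkdir -p "$NETDIR"

write_ipoib() {
    local dev=$1 ip=$2
    cat > "$NETDIR/ifcfg-$dev" <<EOF
DEVICE=$dev
TYPE=InfiniBand
BOOTPROTO=static
IPADDR=$ip
NETMASK=255.255.0.0
ONBOOT=yes
EOF
}

# head node planning addresses from this page
write_ipoib ib0 10.10.103.254
write_ipoib ib1 10.10.104.254
```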
===== StorageWorks MSA60 =====
  * Three volumes to start with:
    * home (raid 6), 10 TB
    * snapshot
    * sanscratch (raid 1 or 0, no backup), 5 TB
  * SIM
===== SL2x170z G6 =====
  * node names hp000, increment by 1
  * eth0, provision, 192.168.102.25(increment by 1)/
    * do we need an iLO eth? in range 192.168.104.25(increment by 1)
    * CMU wants eth0 on NIC1 and PXEboot
  * eth1, data/
  * eth2 (over eth1), ipmi, 192.168.103.25(increment by 1)/
    * see discussion iLO/IPMI under CMU
  * ib0, ipoib, 10.10.103.25(increment by 1)/
  * ib1, ipoib, 10.10.104.25(increment by 1)/

  * /home mount point for home directory volume ~ 10 TB (contains /
  * /snapshot mount point for snapshot volume ~ 10 TB
  * /sanscratch mount point for sanscratch volume ~ 5 TB
  * (next ones must be 50% empty for cloning to work)
  * logical volume LOCALSCRATCH:
  * logical volumes ROOT/
  * SIM
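The "must be 50% empty for cloning to work" rule above is easy to check before a golden-image capture. A minimal sketch, assuming the rule means at most 50% used as reported by df; the function names are made up here.

```shell
#!/bin/bash
# check_pct holds the cloning rule (at most 50% used);
# half_empty applies it to a mounted filesystem via df.
check_pct() {            # usage: check_pct <used-percent>
    [ "$1" -le 50 ]
}

half_empty() {           # usage: half_empty <mount-point>
    local pct
    pct=$(df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    check_pct "$pct"
}
```

Run `half_empty /` (or the LOCALSCRATCH/ROOT mount points) on a node before capturing or cloning it.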
===== Misc =====
  * monitor
  * [[http://
  * Do we need a windows box (virtual) to run the Central Management Server on?
  * SIM + Cluster Monitor (MSCS)?
    * install, configure
    * requires an oracle install? no, hpsmdb is installed with automatic installation (postgresql)
    * linux deployment utilities, and management agents installation
    * configure managed systems, automatic discovery
    * configure automatic event handling

  * Cluster Management Utility (CMU)
    * [[http://
    * HP iLO probably removes the need for IPMI, consult [[http://
      * well maybe not, IPMI ([[http://
    * is head node the Management server? possibly, needs access to provision and public networks
    * we may need an iLO eth? in range ... 192.168.104.x?
    * CMU wants eth0 on NIC1 and PXEboot
    * install
    * install X and CMU GUI client node
    * start CMU, start client, scan for nodes, build golden image
      * install monitoring client when building
    * clone nodes, deploy
    * not sure we can implement CMU HA
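Once the IPMI addresses are live, node power state can be queried over the LAN with ipmitool. A sketch following the 192.168.103.25-plus-node-index convention above; the user name is a placeholder, and the helper only prints the command rather than running it.

```shell
#!/bin/bash
# Build the ipmitool invocation for a compute node's BMC.
# IPMI_USER is a placeholder credential, not a value from this page.
IPMI_USER=${IPMI_USER:-admin}

ipmi_cmd() {
    local idx=$1
    local action="${2:-chassis power status}"
    local host=$((25 + idx))
    echo "ipmitool -I lanplus -H 192.168.103.$host -U $IPMI_USER $action"
}

ipmi_cmd 5    # prints the command for hp005; pipe to sh on the head node to run it
```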
  * Sun Grid Engine (SGE)
    * install, configure
    * there will only be one queue (hp12)
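A minimal job script for the single hp12 queue; the queue name comes from this page, while the job name and the other `#$` directives are just illustrative standard SGE options.

```shell
#!/bin/bash
# Minimal SGE job script; the #$ lines are directives read by qsub,
# everything else is plain shell.
#$ -N hptest
#$ -q hp12
#$ -cwd
#$ -j y

echo "running on $(hostname)"
```

Submit with `qsub hptest.sh` and check placement with `qstat`.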

===== Other =====

  * KVM utility
  * where in data center (do later), based on environmental works

===== ToDo =====

All do later, after the HP cluster is up.

  * Backups
    * Use trickery with linux and rsync to provide snapshots? [[http://
    * Exclude very large files?
    * petaltail:/
    * or better [[http://

  * Lava. Install from source and evaluate.
\\
**[[cluster: