This shows you the differences between two versions of the page.
cluster:89 [2010/08/18 21:39] hmeij |
cluster:89 [2010/11/22 19:05] (current) hmeij |
||
---|---|---|---|
Line 28: | Line 28: | ||
* x.y.z.254 is head or login node | * x.y.z.254 is head or login node | ||
* x.y.z.0 is gateway | * x.y.z.0 is gateway | ||
- | * x.y.z.<25 is for all switches and console ports | + | * x.y.z.<10 is for all switches |
- | * x.y.z.25( up to 253) is for all compute nodes | + | * x.y.z.10(up to 253) is for all compute nodes |
We are planning to ingest our Dell cluster (37 nodes) and our Blue Sky Studios cluster (130 nodes) into this setup, hence the approach. | We are planning to ingest our Dell cluster (37 nodes) and our Blue Sky Studios cluster (130 nodes) into this setup, hence the approach. | ||
Netmask is, finally, 255.255.0.0 (excluding public 129.133 subnet). | Netmask is, finally, 255.255.0.0 (excluding public 129.133 subnet). | ||
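The last-octet convention above can be sketched as a small lookup. This is only an illustration of the revised plan (switches below 10, compute nodes from 10 up to 253); the function name `role_for_octet`, the treatment of 255 as unassigned, and the idea of fitting every node into a single x.y.z range are assumptions, not taken from the cluster configuration. The 32 new nodes are the n1-n32 range mentioned in the update on this page.

```python
def role_for_octet(x: int) -> str:
    """Map the last octet of x.y.z.<x> to its role in the revised plan."""
    if not 0 <= x <= 255:
        raise ValueError(f"not a valid octet: {x}")
    if x == 0:
        return "gateway"
    if x == 254:
        return "head/login node"
    if x < 10:
        return "switch"
    if x <= 253:
        return "compute node"
    return "unassigned"  # 255 is not covered by the plan (assumption)


# Compute-node slots in one x.y.z range: 10 through 253 inclusive.
compute_slots = sum(1 for x in range(256) if role_for_octet(x) == "compute node")

# Node counts: 32 new nodes (n1-n32), 37 Dell, 130 Blue Sky Studios.
planned_nodes = 32 + 37 + 130

print(compute_slots, planned_nodes, planned_nodes <= compute_slots)
```

Under these assumptions a single range offers 244 compute addresses, enough for the 199 nodes counted here.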
+ | |||
+ | ---- | ||
+ | |||
+ | Update with the following: | ||
+ | Hi Shanna, ok, I see that, so globally let's go with
+ | |||
+ | eth0 192.168.102.x/ | ||
+ | eth1 10.10.102.x/ | ||
+ | eth2 129.133.1.226 public (wesleyan.edu)\\ | ||
+ | eth3 192.168.103.x/ | ||
+ | eth4 192.168.104.x/ | ||
+ | ib0 10.11.103.x/ | ||
+ | ib1 10.11.104.x/ | ||
+ | |||
+ | where x=254 for head and x=10(increment by 1) for nodes n1-n32 | ||
+ | |||
+ | Does that work for you? I'm unsure how iLo/IPMI works, but it could use eth0.
+ | |||
+ | -Henk | ||
+ | |||
+ | |||
+ | ---- | ||
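The x=254 / x=10(increment by 1) rule from the update can be sketched as a small generator of per-interface addresses. The prefixes are copied from the update as written; the prefix lengths are left out because they are not given in full on this page, and eth2 is skipped because it is a single public address on the head node. The names `PREFIXES`, `last_octet` and `addresses` are hypothetical helpers for illustration only.

```python
# Address prefixes as listed in the update (prefix lengths omitted on purpose).
PREFIXES = {
    "eth0": "192.168.102",
    "eth1": "10.10.102",
    "eth3": "192.168.103",
    "eth4": "192.168.104",
    "ib0": "10.11.103",
    "ib1": "10.11.104",
}

HEAD_OCTET = 254       # x for the head node
FIRST_NODE_OCTET = 10  # x for node n1
NODE_COUNT = 32        # nodes n1-n32


def last_octet(node_index: int) -> int:
    """Return x for node n<node_index>: n1 maps to 10, n2 to 11, and so on."""
    if not 1 <= node_index <= NODE_COUNT:
        raise ValueError(f"node index out of range: {node_index}")
    return FIRST_NODE_OCTET + node_index - 1


def addresses(x: int) -> dict:
    """Build one address per interface for the given last octet."""
    return {iface: f"{prefix}.{x}" for iface, prefix in PREFIXES.items()}


head = addresses(HEAD_OCTET)
nodes = {f"n{i}": addresses(last_octet(i)) for i in range(1, NODE_COUNT + 1)}

print("head eth0:", head["eth0"])
print("n1 eth0:", nodes["n1"]["eth0"])
print("n32 ib1:", nodes["n32"]["ib1"])
```

With this rule n32 ends up at x=41, well inside the 10 to 253 range reserved for compute nodes.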
+ | |||
+ | |||
+ | |||
+ | ===== Infiniband ===== | ||
+ | |||
+ | [[http:// | ||
+ | |||
+ | * Voltaire 4036 | ||
+ | * 519571-B21 | ||
+ | * Voltaire InfiniBand 4X QDR 36-Port Managed Switch | ||
+ | |||
+ | |||
+ | Configuration, | ||
===== DM380G7 ===== | ===== DM380G7 ===== | ||
- | [[http:// | + | [[http:// |
+ | [[http:// | ||
* Dual power (one to UPS, one to utility, do later) | * Dual power (one to UPS, one to utility, do later) | ||
Line 45: | Line 81: | ||
* eth1, data/ | * eth1, data/ | ||
* eth2, public, 129.133.1.226/ | * eth2, public, 129.133.1.226/ | ||
- | * eth3 (over eth2), ipmi, 192.168.103.254/ | + | * eth3 (over eth0), ipmi, 192.168.103.254/ |
* see discussion iLo/IPMI under CMU | * see discussion iLo/IPMI under CMU | ||
* ib0, ipoib, 10.10.103.254/ | * ib0, ipoib, 10.10.103.254/ | ||
Line 54: | Line 90: | ||
* /snapshot mount point for snapshot volume ~10 TB | * /snapshot mount point for snapshot volume ~10 TB | ||
* /sanscratch mount point for sanscratch volume ~5 TB | * /sanscratch mount point for sanscratch volume ~5 TB | ||
- | * logical volume LOCALSCRATCH: | + | * / |
* logical volumes ROOT/ | * logical volumes ROOT/ | ||
Line 78: | Line 114: | ||
[[http:// | [[http:// | ||
- | * node names hp000, increment by 1 | + | * node names n0, increment by 1 |
- | * eth0, provision, 192.168.102.25(increment by 1)/ | + | * eth0, provision, 192.168.102.10(increment by 1)/ |
- | * do we need an iLo eth? in range 192.168.104.25(increment by 1) | + | * do we need an iLo eth? in range 192.168.104.10(increment by 1) |
* CMU wants eth0 on NIC1 and PXEboot | * CMU wants eth0 on NIC1 and PXEboot | ||
- | * eth1, data/ | + | * eth1, data/ |
- | * eth2 (over eth1), ipmi, 192.168.103.25(increment by 1)/ | + | * eth2 (over eth0), ipmi, 192.168.103.10(increment by 1)/ |
* see discussion iLo/IPMI under CMU | * see discussion iLo/IPMI under CMU | ||
- | * ib0, ipoib, 10.10.103.25(increment by 1)/ | + | * ib0, ipoib, 10.10.103.10(increment by 1)/ |
- | * ib1, ipoib, 10.10.104.25(increment by 1)/ | + | * ib1, ipoib, 10.10.104.10(increment by 1)/ |
* /home mount point for home directory volume ~10 TB (contains / | * /home mount point for home directory volume ~10 TB (contains / | ||
* /snapshot mount point for snapshot volume ~10 TB | * /snapshot mount point for snapshot volume ~10 TB | ||
* /sanscratch mount point for sanscratch volume ~5 TB | * /sanscratch mount point for sanscratch volume ~5 TB | ||
+ | * (next ones must be 50% empty for cloning to work) | ||
* logical volume LOCALSCRATCH: | * logical volume LOCALSCRATCH: | ||
* logical volumes ROOT/ | * logical volumes ROOT/ | ||
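The note that the local volumes must be 50% empty for cloning to work could be checked before building the golden image with something like the sketch below. It reads "50% empty" as "free space is at least half of the total", which is an assumption, and the mount point passed in is a placeholder because the actual paths are not spelled out here. `at_least_half_free` and `check_mount` are hypothetical names.

```python
import shutil


def at_least_half_free(total_bytes: int, free_bytes: int) -> bool:
    """True when free space is at least half of the filesystem size."""
    if total_bytes <= 0:
        raise ValueError("total size must be positive")
    return free_bytes * 2 >= total_bytes


def check_mount(path: str) -> bool:
    """Apply the 50% rule to the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return at_least_half_free(usage.total, usage.free)


# "/" is only a placeholder; substitute the local volumes' mount points.
for mount in ["/"]:
    status = "ok" if check_mount(mount) else "too full for cloning"
    print(mount, status)
```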
Line 115: | Line 152: | ||
* [[http:// | * [[http:// | ||
* HP iLo probably removes the need for IPMI, consult [[http:// | * HP iLo probably removes the need for IPMI, consult [[http:// | ||
- | | + | |
+ | * hmm, we can power nodes up/off via CMU, so perhaps neither IPMI nor the same ability via SIM and a web browser is needed
* is head node the Management server? possibly, needs access to provision and public networks | * is head node the Management server? possibly, needs access to provision and public networks | ||
* we may need an iLo eth? in range ... 192.168.104.x? | * we may need an iLo eth? in range ... 192.168.104.x? | ||
Line 124: | Line 162: | ||
* install monitoring client when building golden image node via CMU GUI | * install monitoring client when building golden image node via CMU GUI | ||
* clone nodes, deploy management agent on nodes | * clone nodes, deploy management agent on nodes | ||
+ | * PXEboot and wake-on-lan must be done manually in BIOS | ||
+ | * pre_reconf.sh (/ | ||
* not sure we can implement CMU HA | * not sure we can implement CMU HA | ||
+ | * collectl/ | ||
* Sun Grid Engine (SGE) | * Sun Grid Engine (SGE) | ||
Line 149: | Line 190: | ||
* Lava. Install from source and evaluate. | * Lava. Install from source and evaluate. | ||
+ | |||
+ | * Location | ||
+ | * remove 2 BSS racks (to pace.edu?), rack #3 & 4 | ||
+ | * add an L6-30 if needed (have 3? check) | ||
+ | * fill remaining 2 BSS racks with 24gb good servers, turn off | ||
\\ | \\ | ||
**[[cluster: | **[[cluster: |