cluster:89

Differences

This shows you the differences between two versions of the page.

cluster:89 [2010/08/19 11:20]
hmeij
cluster:89 [2010/11/22 14:05] (current)
hmeij
Line 28: Line 28:
   * x.y.z.254 is head or log in node   * x.y.z.254 is head or log in node
   * x.y.z.0 is gateway   * x.y.z.0 is gateway
-  * x.y.z.<25 is for all switches and console ports +  * x.y.z.<10 is for all switches (prefer 1) and console/management ports 
-  * x.y.z.25( up to 253) is for all compute nodes+  * x.y.z.10(up to 253) is for all compute nodes
  
 We are planning to ingest our Dell cluster (37 nodes) and our Blue Sky Studios cluster (130 nodes) into this setup, hence the approach. We are planning to ingest our Dell cluster (37 nodes) and our Blue Sky Studios cluster (130 nodes) into this setup, hence the approach.
  
 Netmask is, finally, 255.255.0.0 (excluding public 129.133 subnet). Netmask is, finally, 255.255.0.0 (excluding public 129.133 subnet).
 +
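The effect of the 255.255.0.0 netmask on the x.y.z scheme can be checked with a minimal sketch like the one below. It uses Python's standard `ipaddress` module; the helper name `same_network` and the two sample addresses are illustrative assumptions, not part of the page.

```python
import ipaddress


def same_network(a: str, b: str, netmask: str) -> bool:
    """Return True if addresses a and b fall in the same network under netmask."""
    net_a = ipaddress.ip_interface(f"{a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{b}/{netmask}").network
    return net_a == net_b


# Sample addresses (assumed for illustration): same x.y, different z octet.
a, b = "192.168.102.10", "192.168.103.10"

print(same_network(a, b, "255.255.0.0"))    # True: one /16 spans every z value
print(same_network(a, b, "255.255.255.0"))  # False: a /24 isolates each z value

# Last-octet values 10..253 are reserved for compute nodes in each x.y.z block.
compute_slots = len(range(10, 254))
print(compute_slots)  # 244
```

With the wider mask, hosts whose addresses differ only in the third octet can reach each other without routing, which appears to be the point of choosing 255.255.0.0 here.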
 +----
 +
 +Update with the following:
 +Hi Shanna, OK, I see that, so globally let's go with
 + 
 +eth0 192.168.102.x/255.255.0.0\\
 +eth1 10.10.102.x/255.255.0.0 (data, need to reach netapp filer at 10.10.0.y/255.255.0.0)\\
 +eth2 129.133.1.226 public (wesleyan.edu)\\
 +eth3 192.168.103.x/255.255.255.0 ipmi (or over eth0?)\\
 +eth4 192.168.104.x/255.255.255.0 ilo (or over eth0?)\\
 +ib0 10.11.103.x/255.255.255.0 ipoib (data)\\
 +ib1 10.11.104.x/255.255.255.0 ipoib (data, not used at the start)\\
 + 
 +where x=254 for the head node and x starts at 10 (incrementing by 1) for nodes n1-n32
 + 
 +Does that work for you? I'm unsure how iLO/IPMI works, but it could use eth0.
 + 
 +-Henk
 +
 +
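The numbering rule in the email above can be sketched as a small address generator. This is only an illustration under stated assumptions: it reads "x=10 (increment by 1) for nodes n1-n32" as n1 getting 10 and n32 getting 41, it covers only three of the listed interfaces, and the names `PLAN`, `node_x`, and `address` are hypothetical.

```python
import ipaddress

# Interface -> (base network address, netmask), copied from the email.
# Only a subset of the interfaces is shown here.
PLAN = {
    "eth0": ("192.168.102.0", "255.255.0.0"),
    "eth1": ("10.10.102.0", "255.255.0.0"),
    "ib0": ("10.11.103.0", "255.255.255.0"),
}

HEAD_X = 254        # x for the head node
FIRST_NODE_X = 10   # x for node n1 (assumed reading of the email)
NODE_COUNT = 32     # nodes n1-n32


def node_x(n: int) -> int:
    """Map node number n (1..32) to its last octet x."""
    if not 1 <= n <= NODE_COUNT:
        raise ValueError(f"node number out of range: {n}")
    return FIRST_NODE_X + n - 1


def address(iface: str, x: int) -> str:
    """Return 'ip/netmask' for the given interface and last octet x."""
    base, mask = PLAN[iface]
    ip = ipaddress.ip_address(base) + x  # integer offset from the base address
    return f"{ip}/{mask}"


print("head", address("eth0", HEAD_X))
for n in (1, 2, NODE_COUNT):
    print(f"n{n}", address("eth0", node_x(n)), address("ib0", node_x(n)))
```

Under this reading the last node, n32, ends at x=41, well inside the 10 to 253 range reserved for compute nodes.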
 +----
 +
 +
 +
 +===== Infiniband =====
 +
 +[[http://h20000.www2.hp.com/bizsupport/TechSupport/Home.jsp?lang=en&cc=vn&prodTypeId=12883&prodSeriesId=3758753&lang=en&cc=vn|HP Link]] 
 +
 +  * Voltaire 4036
 +  * 519571-B21
 +  * Voltaire InfiniBand 4X QDR 36-Port Managed Switch
 +
 +
 +Configuration, fine tuning, identify bottlenecks, monitor, administer.  Investigate [[http://www.voltaire.com/Products/Unified_Fabric_Manager|Voltaire UFM]]?
  
 ===== DM380G7 ===== ===== DM380G7 =====
-[[http://h10010.www1.hp.com/wwpc/us/en/sm/WF31a/15351-15351-3328412-241644-241475-4091412.html|HP Link]] (head node)+[[http://h10010.www1.hp.com/wwpc/us/en/sm/WF31a/15351-15351-3328412-241644-241475-4091412.html|HP Link]] (head node)\\ 
 +[[http://vimeo.com/9938744|External Link]] video about hardware
  
   * Dual power (one to UPS, one to utility, do later)   * Dual power (one to UPS, one to utility, do later)
Line 45: Line 81:
   * eth1, data/private, 10.10.102.254/255.255.0.0 (greentail-eth1, should go to ProCurve 2610)   * eth1, data/private, 10.10.102.254/255.255.0.0 (greentail-eth1, should go to ProCurve 2610)
   * eth2, public, 129.133.1.226/255.255.255.0 (greentail.wesleyan.edu, we provide cable connection)   * eth2, public, 129.133.1.226/255.255.255.0 (greentail.wesleyan.edu, we provide cable connection)
-  * eth3 (over eth2), ipmi, 192.168.103.254/255.255.0.0,  (greentail-ipmi, should go to better switch ProCurve 2910, do later)+  * eth3 (over eth0), ipmi, 192.168.103.254/255.255.0.0,  (greentail-ipmi, should go to better switch ProCurve 2910, do later)
     * see discussion iLo/IPMI under CMU     * see discussion iLo/IPMI under CMU
   * ib0, ipoib, 10.10.103.254/255.255.0.0 (greentail-ib0)   * ib0, ipoib, 10.10.103.254/255.255.0.0 (greentail-ib0)
Line 54: Line 90:
   * /snapshot mount point for snapshot volume ~ 10tb    * /snapshot mount point for snapshot volume ~ 10tb 
   * /sanscratch mount point for sanscratch volume ~ 5 tb   * /sanscratch mount point for sanscratch volume ~ 5 tb
-  * logical volume LOCALSCRATCH: mount at /localscratch ~ 100 gb (should match nodes at 160 gb, leave rest for OS)+  * /localscratch ... maybe just a directory
   * logical volumes ROOT/VAR/BOOT/TMP: defaults   * logical volumes ROOT/VAR/BOOT/TMP: defaults
  
Line 78: Line 114:
 [[http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c01800572&prodTypeId=18964&prodSeriesId=489496|HP Link]] (compute nodes) [[http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c01800572&prodTypeId=18964&prodSeriesId=489496|HP Link]] (compute nodes)
  
-    * node names hp000, increment by 1 +    * node names n0, increment by 1 
-    * eth0, provision, 192.168.102.25(increment by 1)/255.255.0.0 (hp000-eth0, should go to better switch ProCurve 2910) +    * eth0, provision, 192.168.102.10(increment by 1)/255.255.0.0 (n0-eth0, should go to better switch ProCurve 2910) 
-      * do we need an iLo eth? in range 192.168.104.25(increment by 1)+      * do we need an iLo eth? in range 192.168.104.10(increment by 1)
       * CMU wants eth0 on NIC1 and PXEboot       * CMU wants eth0 on NIC1 and PXEboot
-    * eth1, data/private, 10.10.102.25(increment by 1)/255.255.0.0 (hp000-eth1, should go to ProCurve 2610) +    * eth1, data/private, 10.10.102.10(increment by 1)/255.255.0.0 (n0-eth1, should go to ProCurve 2610) 
-    * eth2 (over eth1), ipmi, 192.168.103.25(increment by 1)/255.255.0.0, (hp000-ipmi, should go to better switch ProCurve 2910, do later)+    * eth2 (over eth0), ipmi, 192.168.103.10(increment by 1)/255.255.0.0, (n0-ipmi, should go to better switch ProCurve 2910, do later)
       * see discussion iLo/IPMI under CMU       * see discussion iLo/IPMI under CMU
-    * ib0, ipoib, 10.10.103.25(increment by 1)/255.255.0.0 (hp000-ib0) +    * ib0, ipoib, 10.10.103.10(increment by 1)/255.255.0.0 (n0-ib0) 
-    * ib1, ipoib, 10.10.104.25(increment by 1)/255.255.0.0 (hp000-ib1, configure, might not have cables!)+    * ib1, ipoib, 10.10.104.10(increment by 1)/255.255.0.0 (n0-ib1, configure, might not have cables!)
  
     * /home mount point for home directory volume ~ 10tb (contains /home/apps/src)     * /home mount point for home directory volume ~ 10tb (contains /home/apps/src)
Line 129: Line 165:
       * pre_reconf.sh (/localscratch partition?) and reconf.sh (NIC2 definition)       * pre_reconf.sh (/localscratch partition?) and reconf.sh (NIC2 definition)
     * not sure we can implement CMU HA     * not sure we can implement CMU HA
 +    * collectl/colplot seems nice
  
   * Sun Grid Engine (SGE)   * Sun Grid Engine (SGE)
Line 153: Line 190:
  
   * Lava.  Install from source and evaluate.   * Lava.  Install from source and evaluate.
 +
 +  * Location
 +    * remove 2 BSS racks (to pace.edu?), rack #3 & 4
 +    * add an L6-30 if needed (have 3? check)
 +    * fill remaining 2 BSS racks with 24gb good servers, turn off
  
 \\ \\
 **[[cluster:0|Back]]** **[[cluster:0|Back]]**
cluster/89.1282231256.txt.gz · Last modified: 2010/08/19 11:20 by hmeij