  
Notes for the cluster design conference with HP.

"do later" means we tackle it after the HP on-site visit.
  
===== S & H =====
  * Shipping Address: 5th floor data center
  * No 13'6" truck, 12'6" is ok or box truck
  * Delivery on standard raised dock, no way to lift the rack out of the truck if not docked
  * Freight elevator and pallet jack available
  
===== Network =====
  
Basically, the addressing scheme is:
  
  * x.y.z.255 is broadcast
  * x.y.z.254 is the head or login node
  * x.y.z.0 is the gateway
  * x.y.z.<25 is for all switches and console ports
  * x.y.z.25 (up to 253) is for all compute nodes

We are planning to ingest our Dell cluster (37 nodes) and our Blue Sky Studios cluster (130 nodes) into this setup, hence the approach.

The netmask is, finally, 255.255.0.0 (excluding the public 129.133 subnet).
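
As a quick illustration (a hypothetical helper, not part of the cluster tooling), the last-octet convention above can be written out in a few lines of Python:

<code python>
# Sketch of the last-octet convention; octet ranges are taken from the notes above.
import ipaddress

net = ipaddress.ip_network("192.168.0.0/16")   # provisioning net, netmask 255.255.0.0
print(net.netmask)                             # 255.255.0.0

def role(last_octet):
    """Map the last octet (the N in x.y.z.N) to its role in this design."""
    if last_octet == 255:
        return "broadcast"
    if last_octet == 254:
        return "head or login node"
    if last_octet == 0:
        return "gateway"
    if last_octet < 25:
        return "switches and console ports"
    return "compute nodes (.25 up to .253)"

for octet in (0, 1, 25, 100, 254, 255):
    print(f"x.y.z.{octet:<3} -> {role(octet)}")
</code>
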
===== DL380 G7 =====
[[http://h10010.www1.hp.com/wwpc/us/en/sm/WF31a/15351-15351-3328412-241644-241475-4091412.html|HP Link]] (head node)

  * Dual power (one to UPS, one to utility, do later)

  * hostname [[http://www.ct.gov/dep/cwp/view.asp?A=2723&Q=325780|greentail]], another local "tail", also in reference to HP being 18-24% more efficient in power/cooling
  * eth0, provision, 192.168.102.254/255.255.0.0 (greentail-eth0, should go to better switch ProCurve 2910)
  * eth1, data/private, 10.10.102.254/255.255.0.0 (greentail-eth1, should go to ProCurve 2610)
  * eth2, public, 129.133.1.226/255.255.255.0 (greentail.wesleyan.edu)
  * eth3, ipmi, 192.168.103.254/255.255.0.0 (greentail-ipmi, should go to better switch ProCurve 2910, do later)
  * ib0, ipoib, 10.10.103.254/255.255.0.0 (greentail-ib0)
  * ib1, ipoib, 10.10.104.254/255.255.0.0 (greentail-ib1, configure, might not have cables!, split traffic across ports?)
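
A minimal sketch (hypothetical; it assumes the addresses above are final) of the /etc/hosts entries these interfaces would translate into:

<code python>
# Hypothetical sketch: render /etc/hosts lines for the head node's interfaces.
# Addresses and alias names are the ones listed in this section.
head_ifaces = {
    "greentail-eth0": "192.168.102.254",        # provision
    "greentail-eth1": "10.10.102.254",          # data/private
    "greentail.wesleyan.edu": "129.133.1.226",  # public
    "greentail-ipmi": "192.168.103.254",        # ipmi (do later)
    "greentail-ib0": "10.10.103.254",           # ipoib
    "greentail-ib1": "10.10.104.254",           # ipoib, if cables are available
}

for name, ip in head_ifaces.items():
    print(f"{ip:<16} {name}")
</code>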
  
  * Raid 1 mirrored disks (2x250gb)
  * /home mount point for home directory volume ~ 10tb
  * /home/apps mount point for software volume ~ 1tb (contains /home/apps/src)
  * /home/sanscratch mount point for sanscratch volume ~ 5 tb
  * logical volume LOCALSCRATCH: mount at /localscratch ~ 100 gb (should match nodes at 160 gb, leave rest for OS)
  * logical volumes ROOT/VAR/BOOT/TMP: defaults
===== StorageWorks MSA60 =====
[[http://h10010.www1.hp.com/wwpc/us/en/sm/WF25a/12169-304616-241493-241493-241493-4118559.html|HP Link]] (storage device)

  * Dual power (one to UPS, one to utility, do later)

  * Three volumes to start with:
    * home (raid 6, design a backup path, do later), 10 tb
    * apps (raid 6, design a backup path, do later), 1 tb
    * sanscratch (raid 1, no backup), 5 tb

  * Systems Insight Manager (SIM) [[http://h18013.www1.hp.com/products/servers/management/hpsim/index.html?jumpid=go/hpsim|HP Link]] (Linux Install and Configure Guide, and User Guide)
    * Do we need a Windows box (virtual) to run the Central Management Server on?
    * install, configure
    * requires an Oracle install? No, hpsmdb (PostgreSQL) is installed with the automatic installation
    * Linux deployment utilities and management agents installation
    * configure managed systems, automatic discovery
    * configure automatic event handling

===== SL2x170z G6 =====
[[http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c01800572&prodTypeId=18964&prodSeriesId=489496|HP Link]] (compute nodes)

    * node names hp000, increment by 1
    * eth0, provision, 192.168.102.25 (increment by 1)/255.255.0.0 (hp000-eth0, should go to better switch ProCurve 2910)
    * eth1, data/private, 10.10.102.25 (increment by 1)/255.255.0.0 (hp000-eth1, should go to ProCurve 2610)
    * eth2, ipmi, 192.168.103.25 (increment by 1)/255.255.0.0 (hp000-ipmi, should go to better switch ProCurve 2910, do later)
    * ib0, ipoib, 10.10.103.25 (increment by 1)/255.255.0.0 (hp000-ib0)
    * ib1, ipoib, 10.10.104.25 (increment by 1)/255.255.0.0 (hp000-ib1, configure, might not have cables!)
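
As an illustration (a hypothetical helper; the node count is not final), the node names and per-network addresses follow directly from the scheme above, with the last octet starting at 25 and incrementing by one per node:

<code python>
# Hypothetical sketch: generate compute node names and addresses.
# Per-network third octets and the .25 starting offset come from the notes above.
NETS = {
    "eth0": "192.168.102.",  # provision
    "eth1": "10.10.102.",    # data/private
    "ipmi": "192.168.103.",  # ipmi (do later)
    "ib0":  "10.10.103.",    # ipoib
    "ib1":  "10.10.104.",    # ipoib, if cables are available
}

def node_plan(count, first_octet=25):
    """Yield (hostname, {interface: address}) for 'count' nodes, hp000 upward."""
    for i in range(count):
        octet = first_octet + i                 # .25 up to .253 is the compute range
        yield f"hp{i:03d}", {nic: prefix + str(octet) for nic, prefix in NETS.items()}

for host, addrs in node_plan(3):                # spot-check the first three nodes
    print(host, addrs["eth0"], addrs["ib0"])
</code>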

    * /home mount point for home directory volume ~ 10tb
    * /home/apps mount point for software volume ~ 1tb (contains /home/apps/src)
    * /home/sanscratch mount point for sanscratch volume ~ 5 tb
    * logical volume LOCALSCRATCH: mount at /localscratch 100 gb (60 gb left for OS)
    * logical volumes ROOT/VAR/BOOT/TMP: defaults

===== Misc =====

  * IPoIB
    * configuration, fine tune
    * monitor

  * Cluster Management Utility (CMU)
    * install, configure, monitor
    * golden image capture, deploy (there will initially only be one image)
  
  * Sun Grid Engine (SGE)
    * install, configure
    * there will only be one queue (hp12)
  
  * KVM utility
    * functionality
  
  * Placement
    * where in the data center (do later), based on environmental works
  
  
\\
**[[cluster:0|Back]]**