==== The Storage Problem ====
  
In a commodity HPC setup deploying plain NFS, bottlenecks can develop. Then the compute nodes hang and a cold reboot of the entire HPCC is needed. NFS clients on a compute node contact the NFS daemons on our file server sharptail and ask for, say, a file. The NFS daemon assigned the task then locates the content via metadata (location, inodes, access, etc.) on the local disk array. The NFS daemon collects the content and hands it off to the NFS client on the compute node. So the data passes through the entire NFS layer.
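
A quick way to spot this bottleneck building is to watch the RPC retransmission counter on a compute node. Below is a minimal sketch, assuming the standard nfs-utils ''nfsstat'' tool is installed and that its output layout matches the usual Linux format; if retrans climbs while calls stall, the NFS server is falling behind.

<code python>
# Minimal sketch: watch NFS client RPC retransmissions on a compute node.
# Assumes the nfs-utils "nfsstat" tool is installed; a steadily climbing
# retrans count is an early sign the NFS server is falling behind.
import subprocess
import time

def rpc_counts():
    """Return (calls, retrans) parsed from 'nfsstat -rc' output."""
    out = subprocess.check_output(["nfsstat", "-rc"], text=True)
    # Typical output:
    # Client rpc stats:
    # calls      retrans    authrefrsh
    # 123456     12         123456
    lines = out.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("calls"):
            fields = lines[i + 1].split()
            return int(fields[0]), int(fields[1])
    raise RuntimeError("could not parse nfsstat output")

if __name__ == "__main__":
    prev_calls, prev_retrans = rpc_counts()
    while True:
        time.sleep(10)
        calls, retrans = rpc_counts()
        print(f"calls/10s={calls - prev_calls:8d}  retrans/10s={retrans - prev_retrans:4d}")
        prev_calls, prev_retrans = calls, retrans
</code>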
  
That leaves 2 of 4 SFP+ ports that we can step down from 40G/10G Ethernet via two X6558-R6 cables connecting SFP+ to SFP+ compatible ports. Meaning that, hopefully, we can go from the FAS2554 to our Netgear GS724TS or GS752TS 1G Ethernet SFP Link/ACT ports. That would hook up 192.168.x.x and 10.10.x.x to the FAS2554. We need, and have, 4 of these ports available on each switch; once the connection is made /home can be pNFS mounted across.

Suggestion: the X6558-R6 comes up as a SAS cable. Ask whether Cisco TwinAx cables would work with the Netgear. I suggest ordering the optics on both the NetApp and the Netgear. (Note to self: I do not understand this.)
  
Then ports e0a/e0b, the green RJ45 ports to the right, connect to our core switches (public and private) to move content from and to the FAS2554 (to the research labs, for example). Then we do it again for the second controller.
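
Once the links are up, the pNFS mount from the clients would look roughly like the sketch below. The export name ''hpcfiler:/vol/home'' is a placeholder (not the real export), and the clients need a kernel with NFSv4.1/pNFS support; requesting ''vers=4.1'' is what lets the server hand out pNFS layouts.

<code python>
# Minimal sketch of mounting /home over NFSv4.1 (the NFS version that carries pNFS).
# The server name and export path below are placeholders, not the real config.
import subprocess

SERVER_EXPORT = "hpcfiler:/vol/home"   # hypothetical export on the FAS2554
MOUNT_POINT   = "/home"

# vers=4.1 requests NFSv4.1; pNFS layouts are used only if the server grants them.
opts = "vers=4.1,proto=tcp,hard,timeo=600"

subprocess.run(
    ["mount", "-t", "nfs", "-o", opts, SERVER_EXPORT, MOUNT_POINT],
    check=True,
)

# Afterwards, 'nfsstat -m' (or /proc/self/mountstats) shows the negotiated version.
print(subprocess.check_output(["nfsstat", "-m"], text=True))
</code>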
Then configure the //second// controller. My hope is that if the config is correct, each controller can now "see" each path.
  
Q: Can we bond hpcfiler01-eth3.wesleyan.edu and hpcfiler02-eth3.wesleyan.edu together to their core switches? (Same question for the eth4 interfaces on wesleyan.local.)

A: No, you cannot bond across controllers. But with clustered ONTAP every interface is effectively set up active/passive: if the port hosting a LIF fails, the LIF fails over to an appropriate port on the other controller, based on its failover group.
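
For reference, a rough sketch of checking the LIF and failover-group layout from the cluster management interface is below. The hostname is a placeholder and the ONTAP command strings are from memory and vary by release, so treat them as assumptions to verify against the FAS2554 documentation.

<code python>
# Rough sketch: ask the clustered ONTAP management interface which ports each
# LIF can fail over to. The hostname/user are placeholders, and the command
# strings are approximations -- verify against the installed ONTAP release.
import subprocess

CLUSTER_MGMT = "admin@hpcfiler-cluster-mgmt"   # hypothetical cluster management LIF

for cmd in (
    "network interface show",                  # LIFs, home ports, current ports
    "network interface failover-groups show",  # which ports belong to each failover group
):
    print(f"### {cmd}")
    subprocess.run(["ssh", CLUSTER_MGMT, cmd], check=True)
</code>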
  
Awaiting word from the engineers on whether I got all this right.
It's worth noting that 5 of these integrated storage servers fit the price tag of a single NetApp FAS2554 (the 51T version). So you could buy 5 and split /home into home1 through home5. At 200T, everybody can get as much disk space as needed. Distribute your heavy users across the 5. Mount everything up via IPoIB and round-robin the snapshots, as in: server home2 snapshots home1, etc.
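
The round-robin part is just each server backing up its left-hand neighbour in a ring, with home1 wrapping around to home5. A small sketch of that pairing, with ''rsync'' standing in for whatever snapshot mechanism would actually be used (paths are placeholders):

<code python>
# Sketch of the round-robin snapshot ring: home2 snapshots home1, home3
# snapshots home2, ... and home1 wraps around to snapshot home5.
# rsync is only a stand-in for the real snapshot mechanism; paths are placeholders.
import subprocess

servers = ["home1", "home2", "home3", "home4", "home5"]

# Each server holds the snapshot of its predecessor in the ring.
pairs = [(servers[i - 1], servers[i]) for i in range(len(servers))]
# -> [('home5', 'home1'), ('home1', 'home2'), ('home2', 'home3'), ...]

for source, target in pairs:
    # On the target server, pull the source's home area into a local snapshot directory.
    subprocess.run(
        ["ssh", target,
         f"rsync -a {source}:/{source}/ /snapshots/{source}/"],
        check=True,
    )
</code>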
  
Elegant, simple, and you can start smaller and scale up. We have room for 2 on the QDR Mellanox switch (and 2 to 5 on the DDR Voltaire switch). Buying another 18-port QDR Mellanox switch adds $7K. IPoIB would be the desired transport if we stay with Supermicro.

Even more desirable would be to start our own parallel file system with [[http://www.beegfs.com/|BeeGFS]].

**Short term plan**

  * Grab the 32x2T flexstorage hard drives and insert them into cottontail's empty disk array
      * Makes for a 60T raw RAID 6 storage pool with 2 hot spares (see the capacity check below)
      * Move the sharptail /snapshots to it (removes that traffic from the file server)
  * Dedicate greentail's disk array to /sanscratch
      * Remove the 10T /home_backup
      * Extend /sanscratch from 27T to 37T
  * Dedicate sharptail's disk array to /home
      * Keep the old 5T /sanscratch as an idle backup
      * Remove the 15T /snapshots
      * Extend /home from 10T to 25T
      * Keep the 7T /archives until those users graduate, then move to Rstore
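
As a sanity check on the 60T figure in the first item above, a small worked calculation, assuming 2T drives, double parity for RAID 6, and 2 hot spares:

<code python>
# Worked capacity check for the 32x2T flexstorage drives in cottontail's array:
# 2 hot spares are set aside, and RAID 6 costs 2 drives' worth of parity.
drives        = 32
drive_tb      = 2
hot_spares    = 2
parity_drives = 2          # RAID 6 = double parity

in_raid   = drives - hot_spares                    # 30 drives in the RAID set
raw_tb    = in_raid * drive_tb                     # 60 TB raw, the figure quoted above
usable_tb = (in_raid - parity_drives) * drive_tb   # ~56 TB before filesystem overhead

print(f"raw: {raw_tb} TB, usable: {usable_tb} TB")  # raw: 60 TB, usable: 56 TB
</code>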
**Long term plan**
  * Start a BeeGFS storage cluster (see the setup sketch after this list)
      * cottontail as MS (management server)
      * sharptail as AdMon (admin/monitoring server) and proof-of-concept storage OSS
          * pilot storage on the idle /sanscratch/beegfs/
          * also a folder on cottontail:/snapshots/beegfs/
      * n38-n45 (8 nodes) as MDS (metadata servers, 15K local disk, no RAID)
      * Buy 2x 2U Supermicro for OSS (object storage servers; a total of 80T usable, RAID 6, $12.5K)
      * Serve up the BeeGFS file system using IPoIB
      * Move /home to it
      * Back up to the older disk arrays
      * Expand as necessary
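
A rough sketch of how those roles map onto the stock ''beegfs-setup-*'' helper scripts follows. Hostnames and paths come from the plan above, but the exact script flags are taken from the BeeGFS quick-start guide of that era, so treat them as assumptions to verify against whatever release gets installed.

<code python>
# Rough sketch: wiring up the BeeGFS roles from the long term plan with the
# stock beegfs-setup-* helper scripts, run on each host over ssh. Script names
# and flags follow the BeeGFS quick-start guide and may differ by release.
import subprocess

MGMT = "cottontail"   # MS (management server) per the plan

steps = [
    # (host, setup command to run there)
    ("cottontail", "/opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs/mgmtd"),
    # metadata servers n38..n45 would get service IDs 1..8; n38 shown here
    ("n38",        f"/opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs/meta -s 1 -m {MGMT}"),
    # proof-of-concept storage target on sharptail's idle /sanscratch space
    ("sharptail",  f"/opt/beegfs/sbin/beegfs-setup-storage -p /sanscratch/beegfs -s 1 -i 101 -m {MGMT}"),
    # an example compute node as client (hypothetical hostname); to prefer IPoIB,
    # list ib0 first in the connInterfacesFile per the BeeGFS docs
    ("n46",        f"/opt/beegfs/sbin/beegfs-setup-client -m {MGMT}"),
]

for host, cmd in steps:
    subprocess.run(["ssh", host, cmd], check=True)
</code>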
  
===== Loose Ends =====
  * backup Openlava scheduler for cottontail.wesleyan.edu
  * backup replacement for greentail.wesleyan.edu (in case it fails)

Bought.
 --- //[[hmeij@wesleyan.edu|Henk]] 2016/10/27 14:49//
  
Warewulf golden image it as if it were greentail.