+ | \\ | ||
+ | **[[cluster: | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ====== Thoughts on Cluster Network / Future Growth ====== | ||
+ | |||
As I'm working my way through some of the Platform/OCS documentation ...

<code>
It was interesting to note where Dell installs the 2 switches in their rack layout:
in Rack #1, slots U01 (Infiniband) & U02 (gigabit ethernet) ... for ease of cabling
in Rack #2, slots U01 & U02 ... are left empty, this provides for an upgrade path
And that got me thinking about design issues, sparking this pre-delivery configuration ramble.
</code>


|{{:...}}|

== Summary Update (with thanks to wes faulty/...) ==

//I arrived at a puzzle. It suddenly appeared to me that the head node had an HBA card (which allows connections to the Infiniband switch).
\\
\\
So I looked at the design:
\\
\\
#1 Connecting the head node, or io node, to the Infiniband switch would be an attempt to exploit the computational power of either, ...
\\
\\
#2 IPoIB: we were warned about this, it is too experimental.
\\
\\
The HBA card appears to be hot-swappable so I'm just going to leave it as is.//
\\
\\
 --- //...//

====== [in black] Configuration Details ======

  * HN = Head Node
\\
  * NIC1 129.133.xxx.xxx
  * NIC2 198.162.xxx.xxx
  * / (contains /...)
  * /...
  * io node:/home
  * io node:/sanscratch
  * user logins permitted
  * frequent backups
  * firewall shield
\\
The head node (front-end-0-0) is attached to 2 networks: the public network (129.133.xxx.xxx) via NIC1 and the private network (198.162.xxx.xxx) via NIC2.

DHCP, http/https, 411 and other Platform/OCS services ...

During the ROCKS installation, ...

Remote file systems, such as /home and /sanscratch, are mounted from the io node via NFS. NIC2 is connected to the private network gigabit switch.
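Assuming the io node answers to a hostname like ''ionode'' on the private network (a placeholder, the real name isn't given here), the head node's NFS mounts might look roughly like this:

<code bash>
# sketch of /etc/fstab entries on the head node -- both file systems come from the io node over NFS
# "ionode" and the mount options are assumptions, not the actual configuration
ionode:/home         /home         nfs   rw,hard,intr,rsize=32768,wsize=32768  0 0
ionode:/sanscratch   /sanscratch   nfs   rw,hard,intr,rsize=32768,wsize=32768  0 0
</code>

After a ''mount -a'', a quick ''df -h /home /sanscratch'' should show both file systems being served by the io node.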

\\
  * ION = IO Node
\\
  * NIC1 198.162.xxx.xxx
  * NIC2 off
  * / (contains only operating system)
  * second hard disk idle
  * LUN from SAN via fiber channel for /home
  * LUN from SAN via fiber channel for /sanscratch
  * user logins not permitted
  * infrequent backups
\\
The io node serves the cluster's two file systems via NFS. These file systems appear local to the io node via fiber channel (FC) from the SAN storage devices (currently two clustered NetApp servers).

NIC1 provides the connectivity with the private network (198.162.xxx.xxx) and serves all NFS traffic to all compute nodes (4 heavy weight, 16 light weight, and 16 light weight on Infiniband; a total of 36 nodes) plus the NFS requirements of the head node. __This could potentially be a bottleneck.__
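The matching export list on the io node would be something along these lines; the /16 netmask and the options are assumptions based on the 198.162.xxx.xxx notation used above:

<code bash>
# sketch of /etc/exports on the io node -- export both file systems to the private network only
/home         198.162.0.0/255.255.0.0(rw,sync,no_root_squash)
/sanscratch   198.162.0.0/255.255.0.0(rw,sync,no_root_squash)
</code>

''exportfs -ra'' re-reads the file and ''showmount -e'' confirms what is actually being exported.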

If the backup plan materializes for the /home file system, it seems logical that the io node would be involved. However, maybe it's feasible to rely on the SAN snapshot capabilities with a very thin policy (as in a single snapshot per day ...). Not sure.
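If the snapshot route is taken, such a thin policy could be set on the filers themselves. A sketch in Data ONTAP syntax, assuming the volume behind /home is simply named ''home'' (the volume name and percentages are made up):

<code bash>
# on the NetApp console (sketch): keep 0 weekly, 1 nightly, 0 hourly snapshots of the home volume
snap sched home 0 1 0
# reserve some space for the snapshots, e.g. 10%
snap reserve home 10
</code>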

\\
  * HWN = Heavy Weight Node
\\
  * NIC1 198.162.xxx.xxx
  * NIC2 off
  * / (contains operating system on first hard disk)
  * 2nd hard disk: not installed
  * /...
  * MD1000 storage device with split backplanes (each node has access to 7 36 GB disks spinning at 15,000 RPM, RAID 0)
  * io node:/home
  * io node:/sanscratch
  * head node:/...
  * user logins not permitted
  * no backup

The only distinction between "heavy weight" and "light weight" nodes is ...

If we mount the scratch space on /...

/home and /sanscratch are mounted from the io node using NFS via NIC1.
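To make the heavy weight node's scratch space concrete, here is a sketch of striping the 7 MD1000 disks into one RAID 0 volume with Linux software RAID. The device names and the ''/localscratch'' mount point are assumptions (the striping could just as well be done in the PERC controller instead):

<code bash>
# sketch: stripe the 7 disks of the split backplane into a single RAID 0 scratch volume
mdadm --create /dev/md0 --level=0 --raid-devices=7 /dev/sd[b-h]   # assumed device names
mkfs.ext3 /dev/md0
mkdir -p /localscratch                                            # assumed mount point
mount /dev/md0 /localscratch
</code>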

\\
  * LWN = Light Weight Node
\\
  * NIC1 198.162.xxx.xxx
  * NIC2 off
  * one group of 16 nodes with HCA1 adapter installed (Infiniband)
  * / (contains operating system on first hard disk)
  * /...
  * io node:/home
  * io node:/sanscratch
  * head node:/...
  * user logins not permitted
  * no backup

There are two groups of light weight nodes, each comprising 16 nodes. One group has the HCA1 adapters and is connected to the Infiniband switch.

In both groups the /home and /sanscratch filesystems are mounted via NFS over the private network (NIC1).
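For the Infiniband-equipped group, the HCA can be checked with the standard OFED tools (assuming they are part of the node image):

<code bash>
# sketch: confirm the HCA is present and its port is active on an Infiniband light weight node
ibstat                                  # adapter, firmware and port summary
ibv_devinfo | grep -E 'hca_id|state'    # port should report PORT_ACTIVE once cabled to the switch
</code>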


====== [in green] Separating Administrative and NFS traffic ======

From the **Platform/OCS** documentation:
|{{:...}}|

//ok then//, a 30% bandwidth improvement to each node is worth investigating. Since it is __so common__, this is why Dell left those slots empty (I actually confirmed this with Dell).

If a second private network is added, let's say 10.3.xxx.xxx, ...

In this configuration, then, all NFS related traffic could be forced through the 10.3 private network.
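A rough sketch of what that would take on a compute node, assuming NIC2 is ''eth1'', the io node gets 10.3.1.250 on the new network, and the node itself gets 10.3.1.101 (all three values are placeholders):

<code bash>
# sketch: bring up NIC2 on the second (10.3) private network, RHEL-style
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<'EOF'
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.3.1.101
NETMASK=255.255.0.0
ONBOOT=yes
EOF
ifup eth1

# then point the NFS mounts at the io node's 10.3 address instead of its 198.162 address,
# e.g. in /etc/fstab:
#   10.3.1.250:/home         /home         nfs  rw,hard,intr  0 0
#   10.3.1.250:/sanscratch   /sanscratch   nfs  rw,hard,intr  0 0
</code>

Since NFS simply follows IP routing, mounting from the io node's 10.3 address is enough to keep the NFS traffic off the administrative network.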

//so then//, why stop there?

If this design is implemented, ...

\\
====== [in red] Potential Expansion ======

Scheming along, the question then presents itself: how many nodes could be added so that all switch ports are in use? Answer = 10.

The 10 additional nodes, let's assume light weight nodes (drawn in red, see drawing), can all be connected to both gigabit ethernet switches.

Rack #1 has 8 available 1U slots, so after stashing in 7 new nodes, those nodes are connected to both switches located in that rack (leaving one 1U slot available in the rack).

Rack #2 has 16 available 1U slots, so after stashing in 3 new nodes, those nodes are connected to both switches in that rack (leaving 13 available 1U slots in the rack).

Then, the 7 new nodes in rack #1 need to be connected to the gigabit ethernet switch in rack #2, and the 3 new nodes in rack #2 need to be connected to the gigabit ethernet switch in rack #1. Hehe.

Presto, should be possible. Power-supply-wise this should be ok. That leaves a total of 14 available 1U slots ... more nodes, yea!, drawn in white 8-)
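Just to double-check the slot arithmetic (the counts come straight from the two rack paragraphs above):

<code bash>
# rack #1: 8 free slots - 7 new nodes = 1 left; rack #2: 16 free slots - 3 new nodes = 13 left
echo $(( (8 - 7) + (16 - 3) ))   # prints 14
</code>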


\\
====== Rack Layout ======

Please note that I have modified the original drawings for clarity and also suppressed all the power distribution and supply connections.

The compute nodes are all PowerEdge 1950 servers occupying 1U of rack space each. \\
The head node and io node are PowerEdge 2950 servers occupying 2U of rack space each.\\
As a reminder, the [[cluster:...]] ...

\\
^ Links to Dell's web site for detailed information ^^
|[[http://...]]||
\\
^ Rack #1 ^ Rack #2 ^
|{{:...}}|{{:...}}|
| "3 tons of cooling required" | ... |

**[[cluster:...]]**