Dell engineer Amol Choukekar arrives to do the final configuration of the cluster. First we set up two consoles: one for walking by the compute nodes and one permanently connected to the head node.
Following that, Amol embarks on undoing my handiwork with the ethernet cables connected to the Dell switch. I had run the cables down the other side of the rack, but this is also where the power cables are concentrated. To avoid any problems, Amol would rather have them on the other side. At this time we also switch the cabling; the Dell switch cables need to be in NIC1 and the Cisco switch cables need to be in NIC2.
After the cabling exercise, we connect the main power supplies of the UPS units. To our dismay, 2/3 of the L6-30 connectors appear to be dead; emergency call placed to the electricians. So for now we connect to the live connectors, charge the UPS batteries and plug the switches in.
Amol connects his laptop to the switch administrative ports and configures the switches. This is somewhat beyond me, but certain features are altered: spanning tree is disabled, PortFast is disabled, multicasting is disabled … Also, the switches are given appropriate IPs for their location on the network, and the proper gateway and netmask settings are applied. Then the new switch configuration is saved. If the switch has a self-test routine, it is invoked.
End of day, no electricians yet.
Dell engineer Raghav Sood joins us to shadow/help Amol. Electricians find a short in the main breaker and fix it. All power cables are now connected.
We power up all the hardware to check for power problems. None found, and none of the hardware sports an operating system. All hardware is powered off and the Platform/OCS DVD is inserted in the head node.
Upon boot we configure our cluster with the information requested and, lo and behold, it even asks us for latitude and longitude! Oh well, we leave the settings as they are and let it copy the information from DVD to the front-end hard disk over lunch. Cluster is probably geographically located in Texas.
After lunch, we fire up each compute node individually. The nodes are instructed to PXEboot and send their MAC address to the DHCP server on the head node.
insert-ethers registers the node in the database, generates the appropriate files, and tells the node which kickstart file to use. The node then proceeds to download the Linux operating system from the head node and we move on to the next node. And the next node, and the next node, and the next node, and the next … you get the picture.
During this process some effort is spent customizing the IPs. For example, we want the switches to reside at 192.168.1.0 and our first compute node at 192.168.1.1 for ease. So the compute-1-1 node is in rack 1, slot 1 (bottom). This is somewhat tedious because of a bug discovered in the --incr 1 parameter of insert-ethers that throws an extra digit into the assigned IP (the compute-1-2 node becomes 192.168.1.12, for example). Arghhh. Amol resorts to manually feeding insert-ethers the appropriate IP on the command line and off we are.
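The intended mapping from slot to address can be sketched in a few lines of Python. This is only an illustration of the scheme we were after; the function name, the /24 prefix length and the range check are my assumptions and say nothing about how insert-ethers works internally.

```python
import ipaddress

# Private cluster network; the /24 prefix length is an assumption.
PRIVATE_NET = ipaddress.ip_network("192.168.1.0/24")
RACK = 1  # the text only describes rack 1


def node_address(slot):
    """Return (hostname, ip) for the compute node in the given slot of rack 1."""
    if not 1 <= slot <= 254:
        raise ValueError("slot must fit in the last octet of the address")
    hostname = f"compute-{RACK}-{slot}"
    ip = PRIVATE_NET.network_address + slot  # last octet equals the slot number
    return hostname, str(ip)


for slot in (1, 2, 3):
    print(node_address(slot))
```

Under this scheme compute-1-2 should land on 192.168.1.2, not on 192.168.1.12 as the buggy parameter produced.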
The secondary hard disks are brought up, formatted and mounted as /localscratch.
The ionode server is also brought up under the control of the head node with insert-ethers. Meanwhile, I've strung fiber across the data center and connected it to our NetApp server.
At this time, Amol changes the configuration of the interfaces on the head node. During the Platform/OCS install, NIC1 has to be on the 192.168.1 private network and NIC2 on the 129.133 public network connection. The public interface is moved to NIC3 (this interface card actually has 2 slots, which turns out to be handy later, on Thursday that is). The NFS private network 10.3 is now configured for NIC2 and connects the head node to the Cisco Force 10 switch.
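For reference, the head node layout after this change can be written down as a small lookup table. This is a sketch under assumptions: the text only gives the leading octets of each network, so the prefix lengths (/24 and /16) and the helper name which_interface are hypothetical.

```python
import ipaddress

# Head node interfaces after the reshuffle.
# Prefix lengths are assumptions; only the leading octets are known.
HEAD_NODE_NETWORKS = {
    "NIC1 (private cluster network)": ipaddress.ip_network("192.168.1.0/24"),
    "NIC2 (private NFS network)": ipaddress.ip_network("10.3.0.0/16"),
    "NIC3 (public network)": ipaddress.ip_network("129.133.0.0/16"),
}


def which_interface(address):
    """Return the head node interface whose network contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in HEAD_NODE_NETWORKS.items():
        if ip in network:
            return name
    return None


print(which_interface("192.168.1.7"))
print(which_interface("10.3.1.1"))
```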
Things start to get flaky. The database is reporting odd information when queried. It is decided to leave things as is (it's late already) and place a call to support at Platform Computing.
Ouch. The database reports being in a 'read-only' state. Amol attempts to restore a database dump from CVS, which is made automatically each night, but no go. No useful information was obtained from Platform Computing, so decision time. It's decided to redo the install, and let Platform/OCS pick the IPs it wants to assign. Oh goody, Henk gets to drive.
During the re-install, Amol reconfigures the IPs and gateways of the switches and Raghav/Henk bring up the nodes by PXEboot. The entire re-install is done by noon, wow, that was impressive. Amol concocts a clever combination of add-extra-nic and, in a single command, brings up all secondary interfaces in the database and on the compute nodes. The entire cluster needs rebooting, which is done over lunch.
After lunch, the MD1000 disk arrays are brought up. Hardware RAID is nice! This takes 5 minutes and we're done. During formatting Amol is using parted, a GNU partition manipulation program I'm not familiar with. He explains that at his previous site, a 2 TB filesystem failed to finish using fdisk but parted had no problem with it. parted from now on.
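One plausible explanation for the fdisk trouble, which Amol did not confirm, is the size limit of the classic MBR partition table: it stores sector counts in 32 bits, whereas parted can also write GPT labels that do not have this limit. Assuming the usual 512-byte sectors, the arithmetic is:

```python
SECTOR_BYTES = 512    # classic sector size (assumed)
MAX_SECTORS = 2**32   # an MBR entry stores the sector count in 32 bits

limit_bytes = SECTOR_BYTES * MAX_SECTORS

print(limit_bytes)                 # 2199023255552 bytes
print(limit_bytes / 2**40, "TiB")  # 2.0 TiB
print(limit_bytes / 10**12, "TB")  # roughly 2.2 TB
```

Anything at or beyond that size cannot be described by a single MBR partition entry.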
Ok, it appears we're almost done. The setup of IPMI (Intelligent Platform Management Interface) is throwing access problems. Here we are using another private network, 192.168.2, for this traffic, so now the head node's 4th NIC port is used. Amol sets up HPLinpack to run across the 16 Infiniband compute nodes. The HPL benchmark (High Performance Linpack) is a benchmark program that solves a dense system of linear equations. Amol sets this up intending to run the nodes at high load (8-10) for 10 hours or so. Heat and more heat pours out of the racks.
We move to my office because the HPLinpack setup for the ethernet compute nodes is failing. Platform Computing support is expert and gets us up and running. By this time the Infiniband run is almost done, to the amazement of Amol. But the test had taxed the UPS, which kicked 3 non-Infiniband nodes in the same rack off the grid. We redistribute the power connections and fire off the second HPLinpack test.
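To give an idea of the kind of problem HPL hammers on, here is a toy, single-machine sketch in plain Python: Gaussian elimination with partial pivoting on a random dense matrix, followed by a residual check. It is not HPL and not what Amol ran; the matrix size, the random entries and the function names are arbitrary choices for illustration.

```python
import random


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs stay untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def max_residual(a, x, b):
    """Largest absolute entry of a x - b."""
    return max(abs(sum(aij * xj for aij, xj in zip(row, x)) - bi)
               for row, bi in zip(a, b))


random.seed(0)
n = 50
a = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
b = [random.uniform(-0.5, 0.5) for _ in range(n)]
x = solve_dense(a, b)

approx_flops = 2 * n**3 / 3  # leading-order cost of the factorization
print("max residual:", max_residual(a, x, b))
print("approx. floating point operations:", approx_flops)
```

The work grows with the cube of the matrix size, which is why a large run keeps every node busy, and hot, for hours.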
⇒ Jokingly we all agreed Fire in the Hole and ran for cover and late-late diner.
Amol is in early.
Yeah! IPMI is also fixed by enabling it in the BIOS. Amol shows me how he changes the BIOS remotely, and with cluster-fork it's literally one command to change the BIOS on all machines.
Note: we should rerun these tests and do them simultaneously before we deploy.
Data center is hot though; one cooling unit is throwing an alarm (82F). Later that afternoon James and I redistribute some floor tiles to get more cold air to the cluster and hopefully more warm air to one cooling unit that appears not to be taxed (sitting at 69F). Perhaps we should move the cluster.
Amol, Raghav and I finish up paperwork and do a final walk-through of all the tools for knowledge transfer and any areas we might have missed. Amol makes sure I understand the difference between a halt command given by cluster-fork and a power off command given by ipmitool. Ah, crucial.
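As I understand the difference: a halt sent through cluster-fork asks the operating system on each node to shut itself down cleanly, while a power off sent over IPMI talks to the management controller and cuts power regardless of what the OS is doing. The sketch below only builds the two command lines and does not run anything. The controller address and user name are hypothetical, and the ipmitool options shown are the generic ones from its manual, not necessarily what Amol typed.

```python
def halt_command():
    """Clean shutdown: cluster-fork runs 'halt' on the compute nodes via the OS."""
    return ["cluster-fork", "halt"]


def ipmi_power_off_command(bmc_host, user):
    """Hard power off through the management controller, bypassing the OS.

    bmc_host and user are placeholders; ipmitool prompts for the
    password when none is given on the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user,
            "chassis", "power", "off"]


# Hypothetical controller address on the 192.168.2 management network.
print(" ".join(halt_command()))
print(" ".join(ipmi_power_off_command("192.168.2.1", "root")))
```

The first is the polite way to bring nodes down; the second is the equivalent of holding the power button, so unsaved state on the node is lost.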
Oh joy! Documentation day.
I also start my first administration tasks and restrict http/https and ssh access to the domain wesleyan.edu.
Hot cooling unit is at 79F and cool cooling unit is at 71F.
Here is the front page to our cluster … Swallowtail Cluster
… accessible from wesleyan.edu only, here is what the banner looks like.
It was a totally enjoyable experience. But not always easy. Here is how I responded to the persistent Dell email question “How am I doing?”.
To the managers of the folks involved …
How am I doing? Email my manager … this question is included at the bottom of all Dell-originating emails.
Well, I'd like to answer that question.
Wesleyan University recently bought a computing cluster and it ended up being Dell hardware. I would like to personally take this opportunity to really thank all involved for the tremendous amount of effort being put forth by Dell personnel.
That involved Carolyn Arredondo and Tony Walker. Carolyn was extremely helpful in getting us in contact with experts going over the design issues and coordinating the final configuration step. I felt very taken care of. No uncertainty about what was going to happen and every question answered or researched by Tony. Even suggestions made on possible looming problems.
Amol Choukekar and Raghav Sood comprised the on-site team and I can't commend them enough for a quality installation. Every potential problem addressed and every design preference implemented. They also never got tired of my questions and were extremely approachable and interactive. Amol showed great interest in my persistent staring at the screen to understand whatever he was doing. He also explained everything in great technical detail without any hesitation. I really enjoyed that! And in my 8-9 years at Wesleyan I've never had a team of engineers or sales folks that actually took me up on my offer of using the athletics facilities here. But Raghav did for a squash session over lunch, and that was just great. Work & fun. Fun & work.
I'm very confident with what I've learned and totally enjoyed the experience. Well done. Thanks so much.