(Photo: sharptail saltmarsh sparrow)

Cluster: sharptail

From hardware donated by Blue Sky Studios we have created a high performance compute cluster named sharptail. The sharptail saltmarsh sparrow is a secretive and solitary bird, seldom seen and most often only heard by its splendid song, which inhabits a very narrow coastal habitat stretching from Maine to Florida.

Because of limitations in the hardware, cluster sharptail can only be reached by first establishing an SSH session to petaltail or swallowtail, and from there an SSH session to sharptail. The cluster sits entirely behind those login nodes.
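
For example, a typical login might look like this (substitute your own username; the short hostnames may need the full campus domain):

  ssh username@petaltail     # first hop: either login node works
  ssh sharptail              # second hop: from the login node to the cluster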

The cluster consists entirely of blades, 13 per enclosure; currently 5 enclosures are powered. Each blade contains dual AMD Opteron model 250 CPUs (single core) running at 2.4 GHz and has 12 GB of memory.

Like petaltail and swallowtail, the provisioning network runs over 192.168.1.xxx while all NFS traffic runs over the other network, 10.3.1.xxx. Each blade has only a single 80 GB hard disk; after imaging, roughly 50 GB remains, which is presented as /localscratch on each blade.
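
If you want to use the node-local scratch space from inside a job, a minimal sketch looks like the following (myprogram and input.dat are hypothetical; $LSB_JOBID is set by LSF-style schedulers such as Lava and gives the directory a unique name):

  # stage data to node-local scratch, run, copy results home, clean up
  SCRATCH=/localscratch/$USER.$LSB_JOBID
  mkdir -p $SCRATCH
  cp ~/input.dat $SCRATCH/
  cd $SCRATCH
  ./myprogram input.dat > output.dat
  cp output.dat ~/
  rm -rf $SCRATCH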

The cluster is created, maintained, and managed with Project Kusu, an open source software stack derived from Platform's OCS 5.

The entire cluster is on utility power, with no power backup for any blade or for the installer node sharptail. Your home directory is the same as on petaltail and swallowtail and is, of course, backed up.

The scheduler is Lava, essentially a recent (but not the latest) version of LSF. All the regular commands for submitting jobs work the same way; the only difference is that advanced queue features such as fairshare, preemption, and backfill are missing.

The operating system is CentOS, version 5.3 at this time. This is very similar to Red Hat, but not identical. All software in /share/apps was compiled under RHEL 5.1 and hence may not work on CentOS 5.3. We will recompile software when requested and post it in /share/apps/centos53. All commercial software will only run on petaltail and swallowtail.
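
To pick up a recompiled package, something like the following should work, assuming a conventional bin/ layout under /share/apps/centos53 (the actual layout may differ per package):

  # prefer the CentOS 5.3 rebuilds when running on sharptail
  export PATH=/share/apps/centos53/bin:$PATH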

Together these nodes add 128 job slots to our HPC environment, all on gigabit ethernet switches. To run programs in this environment, submit them to the queue "bss12" (which stands for Blue Sky Studios, 12 GB RAM), as shown in the sketch below. We might add a "bss24" queue in the future with 24 GB RAM blades. Node names follow the convention bss000-bss063.
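
A minimal submission sketch using standard LSF/Lava syntax (myprogram is a hypothetical executable; %J expands to the job ID):

  # submit a serial job to the 12 GB blades and capture its output
  bsub -q bss12 -J myjob -o myjob.%J.out ./myprogram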

With these gigabit ethernet slots available, we will start to implement a soft policy that the Infiniband switch is dedicated to jobs that invoke MPI parallel programs, that is, queue "imw" on petaltail and swallowtail.
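
For comparison, an MPI job would be submitted from petaltail or swallowtail to the imw queue; a hedged sketch (the exact mpirun wrapper depends on the MPI stack installed there):

  # request 16 slots on the Infiniband queue and launch an MPI program
  bsub -q imw -n 16 -o mpi.%J.out mpirun -np 16 ./my_mpi_program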
