From the hardware donated by Blue Sky Studios we have created a high performance compute cluster named sharptail. The sharptail saltmarsh sparrow is a secretive and solitary bird that inhabits a very narrow coastal habitat stretching from Maine to Florida; it is seldom seen and most often only heard by its splendid song.
Because of limitations in the hardware, cluster sharptail can only be reached by first establishing an SSH session with petaltail or swallowtail, and then an SSH session to sharptail. So the cluster sits entirely behind those login nodes.
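For example, a minimal two-hop login sketch (substitute your own username):

  # first hop: one of the login nodes
  ssh username@petaltail.wesleyan.edu    # or swallowtail.wesleyan.edu
  # second hop: from the login node on to the cluster
  ssh sharptail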
The cluster consists entirely of blades, 13 per enclosure; currently 5 enclosures are powered. Each blade contains dual single-core AMD Opteron model 250 CPUs running at 2.4 GHz and has a memory footprint of 12 GB.
As on petaltail and swallowtail, the provisioning network runs over 192.168.1.xxx while all NFS traffic runs over the “other” network, 10.3.1.xxx. Each blade has but a single 80 GB hard disk; after imaging roughly 50 GB is left over, which is presented as /localscratch on each blade.
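To see how much local scratch space is free on a blade, a quick check from sharptail might look like this (a sketch only: bss000 is just an example node name, and it assumes SSH from sharptail to the blades is allowed):

  # check the local scratch disk on one blade (node name is an example)
  ssh bss000 df -h /localscratch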
The cluster is created, maintained, and managed with Project Kusu, an open source derivative of Platform's OCS5 software stack.
The entire cluster is on utility power, with no power backup for the blades or for the installer node sharptail. Your home directory is the same as on petaltail and swallowtail and is of course being backed up.
The operating system is CentOS, version 5.3 at this time; it is much like Red Hat but not identical. All software in /share/apps was compiled under RHEL 5.1 and hence may not work under CentOS 5.3. We will recompile software when requested and post it in /share/apps/centos53. All commercial software will only run on petaltail and swallowtail.
The scheduler is Lava, essentially a recent, but not the latest, version of LSF. All the regular commands for submitting jobs work the same way; the only difference is that advanced queue features such as fairshare, preemption, and backfill are missing.
All nodes together add 128 job slots to our HPC environment, all on gigabit Ethernet switches. To run programs in this environment, submit them to the queue “bss12” (which stands for Blue Sky Studios, 12 GB RAM). We might have a “bss24” queue in the near future with 24 GB RAM blades. Node names follow the convention bss000-bss063.
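A minimal submission sketch; myprog and the job name are hypothetical placeholders, while bsub and bjobs are the standard LSF/Lava commands:

  # submit a serial program to the Blue Sky Studios 12 GB queue
  bsub -q bss12 -J mytest -o mytest.%J.out -e mytest.%J.err ./myprog
  # check on the job
  bjobs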
Consequently, we will start to implement a soft policy that the InfiniBand switch is dedicated to jobs that invoke MPI parallel programs, that is, queue “imw” on petaltail and swallowtail.
There are still some minor configurations to be implemented, but you are invited to put cluster sharptail to work.
This page will be updated with solutions to problems found or questions asked.
All your tools will remain on petaltail.wesleyan.edu.
Are the LSF and Lava schedulers working together?
No. They are two physically and logically separate clusters.
How can I determine whether my program, or the program I am using, will work on sharptail?
When programs are compiled on a certain host, they link against system and sometimes custom libraries. If those libraries are missing on another host, the program will not run. To check, use 'ldd'; if none are reported missing you are good to go.
  [hmeij@sharptail ~]$ ldd /share/apps/python/2.6.1/bin/python
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003679e00000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003679600000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003687000000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003679a00000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003679200000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003678e00000)
If one or more libraries are reported missing, you could run the same command on petaltail or swallowtail, observe where those libraries are located, and, if they also exist on sharptail, add that path to LD_LIBRARY_PATH. Otherwise the programs need to be recompiled, provided they are supported on CentOS 5.3 x86_64. For example:
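A minimal sketch of that workaround; /share/apps/somepkg/lib is a hypothetical directory used only for illustration:

  # list only the libraries ldd cannot resolve
  ldd /share/apps/python/2.6.1/bin/python | grep "not found"
  # if the library exists on sharptail under some directory, prepend that directory to the search path
  export LD_LIBRARY_PATH=/share/apps/somepkg/lib:$LD_LIBRARY_PATH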