ROCKS (no idea what the acronym, if any, stands for) | Platform Rocks |
ROCKS is an open-source software stack that enables the consistent delivery of scale-out application clusters | Platform Open Cluster Stack (OCS) is a pre-integrated, vendor-certified software stack that enables the consistent delivery of scale-out application clusters using ROCKS |
The User Guide provides a good overview of what ROCKS does | Platform, the company, offers 24×7 support for its ROCKS implementation. In addition, certain vendors are certified, meaning the hardware has been tested: Dell is certified, Sun is not |
Support is via the Email Discussion List | Annual Cluster Care subscription: 24×7 support, regular maintenance, periodic upgrades, patches, access to resources |
the Rocks Wiki | US $150 per node, per year |
Introduction to Clusters, 200 slides, good intro | |
Introduction to Rocks, 200 slides, often too detailed | |
A cluster is typically comprised of a “front end” node, perhaps accompanied by an “I/O” node. There may then be numerous “lightweight” nodes and “heavyweight” nodes; the difference between light and heavy is the CPU clock speed, the number of cores per CPU, and the total memory footprint. All nodes are densely packed in a rack and connected via switches (for example gigabit Ethernet). Special hardware, such as InfiniBand switches, provides high-performance, low-latency connectivity. Here is a typical cluster layout (image). So what does ROCKS do?
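Since the layout image is not reproduced here, a rough stand-in for the kind of topology being described (node counts and the I/O node are illustrative only; the compute-X-Y names follow the convention ROCKS uses, explained below):

    front end node                 # runs the installer, MySQL database, Apache, 411 master
    io node (optional)             # NFS / storage server
    compute-0-0 .. compute-0-15    # "lightweight" work nodes: fewer cores, less memory
    compute-1-0 .. compute-1-7     # "heavyweight" work nodes: more cores, large memory
    gigabit Ethernet switch        # management, installs, NFS traffic
    InfiniBand switch (optional)   # high-performance, low-latency MPI traffic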
First, the “front end” node is configured using ROCKS. During this install several “Rolls” are supplied: the Kernel Roll, the Base Roll, the HPC Roll, and the Server Pack and Web Server Rolls. The ROCKS operating system Rolls are also supplied; these can be substituted with any of the following, as long as you include all of that operating system's CD-ROMs: CentOS, Red Hat Enterprise Linux AS 4, or Scientific Linux.
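For reference, on more recent ROCKS releases (roughly 4.3 onwards) rolls can also be inspected and added after the front end is built; older releases used rocks-dist instead, so treat this as a sketch rather than exact syntax:

    # see which rolls the front end was built with
    rocks list roll

    # add another roll ISO later and rebuild the distribution
    # (the ISO name here is just an example)
    rocks add roll extra-roll.iso
    rocks enable roll extra-roll
    cd /export/rocks/install    # /home/install on older releases
    rocks create distro

Work nodes pick up the new roll the next time they are reinstalled.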
The next steps involve configuring the front end node (including making the front end aware of any switches on the cluster). All this information is stored in MySQL databases, along with the work node information collected later. This allows the ROCKS software to generate the kickstart file, /etc/hosts files, etc. for each node. Once that is done, you insert the Kernel Roll into the CD-ROM drive of the first work node. This node, with the default name of compute-0-0, will contact the front end node via DHCP, and the front end registers the node. The work node will then request a kickstart file and an operating system will be installed on it. These steps are repeated for all nodes.
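The registration described above is driven from the front end with insert-ethers; roughly as follows (menus and options vary slightly between ROCKS versions):

    # on the front end: start capturing DHCP requests and choose the
    # "Compute" appliance type from the menu
    insert-ethers

    # optionally start numbering in a particular cabinet/rack, e.g. compute-1-0 onwards
    insert-ethers --cabinet=1

    # now power on each work node with the Kernel Roll CD (or PXE boot):
    # insert-ethers records its MAC address in the MySQL database, assigns
    # the next free name (compute-0-0, compute-0-1, ...), and the node
    # requests its kickstart file and installs itself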
On the front end node, a suite of configuration files, such as /etc/passwd, is under the control of a program called “411”. Once a user has been added on the front end node via useradd, 411 propagates these changes to the work nodes on a schedule (or can be forced to do so on demand). The default ROCKS setup does not allow users to log into work nodes, but the UIDs/GIDs and accounts must still be made available to the back end nodes.
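A minimal sketch of that workflow with the stock 411 setup (the forced push goes through the 411 Makefile; some newer releases also wrap this as “rocks sync users”):

    # on the front end
    useradd alice
    passwd alice

    # wait for the scheduled 411 push, or push changed files immediately
    make -C /var/411
    # or force all 411-managed files out regardless of timestamps
    make -C /var/411 force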
The front end node's /export/apps is the filesystem area that is shared out to each work node, where it is globally available as /share/apps. This is the place for files that are not under 411 control and are not operating system RPM packages: shared applications, custom scripts, global datasets, etc. (although it is preferable to install applications inside the operating system area via RPM packages).
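As an illustration (the tool name and paths are made up), installing something once on the front end so that every work node sees it:

    # on the front end: unpack into the exported area
    tar xzf mytool-1.0.tar.gz -C /export/apps/

    # on any work node the same files appear under the shared path
    ls /share/apps/mytool-1.0/

    # or confirm visibility everywhere in one go
    cluster-fork "ls /share/apps/mytool-1.0"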
That's the basics of it
ROCKS basically manages a “distribution” of one or more operating systems. So, for example:
shoot-node compute-0-0 … this command instructs a work node to reinstall its operating system from scratch, wiping out all local data (which should take about 10 minutes)
cluster-fork “unix command” … is a utility that takes any unix command and executes it on each node, or on a subset selected with --sql=“sql command” against the database, and collects all the results on the front end node.
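A couple of hedged usage examples; the SQL assumes the stock “nodes” table in the cluster database, and some ROCKS versions spell the option --query rather than --sql, so check your own install before relying on it:

    # run a command everywhere and collect the output on the front end
    cluster-fork "uptime"

    # run only on a subset selected from the database
    cluster-fork --sql="select name from nodes where name like 'compute-0-%'" "uptime"

    # wipe and reinstall a single work node from scratch
    shoot-node compute-0-0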
Monitoring your cluster: all information describing the nodes is saved in MySQL databases, accessible and configurable via an Apache server.
link
Viewing status graphs of how the work nodes are performing, using Ganglia.
link source link
Obtaining “cluster top” output to view detailed per-process information across the work nodes.
link
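If you want the raw node data rather than the web views, it can also be pulled straight from the database on the front end. A sketch, assuming the stock database name “cluster”, its “nodes” table, and the read-only “apache” account; newer releases expose much of this via “rocks list host” as well:

    # list registered nodes straight from the MySQL database
    mysql -u apache cluster -e "select name, rack, rank, cpus from nodes;"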
Next: Job Scheduling & Launching … the “PBS” Roll
Next: Message Passing … ??? (perhaps in the “HPC” Roll)
There is also a Roll for SGE, the Sun Grid Engine
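As a teaser for the scheduling topic, a minimal SGE batch job of the kind you would submit from the front end once the SGE Roll is installed (queue configuration is site-specific, so treat this as a sketch):

    #!/bin/bash
    # hello.sh - trivial SGE job script
    #$ -N hello    # job name
    #$ -cwd        # run from the submission directory
    #$ -j y        # merge stdout and stderr
    echo "running on $(hostname)"

Submit it with “qsub hello.sh” and check its status with “qstat”.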