Scali/Manage
Like Platform/ROCKS (see link), Scali/Manage is a suite of tools for managing clusters. It appears very versatile. There is a lot you can do with it, but what caught my interest during a brief perusal was:
- heterogeneous clusters (as in, manage the other clusters on campus …)
- “golden” image capture and deployment (you can also “roll-back” to previous versions!)
- simultaneous deployment of RPM installations (so you can update entire disk images with the “images”, or update incrementally with RPM packages)
- parallel ssh & file copy support
- Change Management … this is a biggie. For example, if you were to add a node, all nodes would need updating; with change management this becomes automatic, since it auto-detects what needs updating on the other nodes
- Fault Handling and Root Cause Analysis … also a biggie, know that something is about to break before it actually does
- Scali/Manage also handles other servers, server farms, grids and blade racks (so for example, rintintin's image could have been captured and deployed elsewhere, or rolled back after an unsuccessful upgrade)
- Java/Eclipse-based GUI and web-based client
- it also supports PBS Pro, MPI libraries, and MPI/HA … that is, high availability for MPI (the reasoning goes like … if jobs run for 30 days, a single node fails, and MPI is not HA, then the entire job is aborted. So HA provides a pathway for attempting to finish the job while the hardware underneath gets replaced).
- And lots more.
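
To put a rough number on the MPI/HA reasoning above, here is a minimal back-of-the-envelope sketch. It is not taken from Scali's documentation: the function name `job_survival_probability`, the node count, and the mean-time-between-failures figure are all hypothetical, and the model assumes nodes fail independently at a constant rate (exponential failures).

```python
import math


def job_survival_probability(nodes: int, job_days: float, node_mtbf_days: float) -> float:
    """Probability that no node fails while the job runs.

    Assumes independent, exponentially distributed node failures
    (a simplifying assumption, not a measured property of any cluster).
    """
    # With failure rate 1/MTBF per node, the chance one node survives
    # job_days is exp(-job_days / MTBF); all nodes must survive.
    return math.exp(-nodes * job_days / node_mtbf_days)


if __name__ == "__main__":
    # Hypothetical numbers: 32 nodes, a 30-day job, and each node
    # failing on average once every 3 years.
    p = job_survival_probability(nodes=32, job_days=30, node_mtbf_days=3 * 365)
    print(f"chance the job finishes with no node failure: {p:.1%}")
```

Under these made-up numbers the job finishes untouched only about 40% of the time, which is why a non-HA MPI that aborts on the first node failure becomes painful for long-running jobs.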
