cluster:17

cluster:17 [2007/01/10 09:43] (current)
 +\\
 +**[[cluster:0|Home]]**
  
 +====== Webcast Demo of Platform/OCS by Platform Computing ======
 +
 +William DeSalvo of Platform Computing gave a webcast presentation on Platform/OCS, the administrative software layer of our cluster design.  Several documents detailing administrative aspects of the Platform/OCS software stack were obtained (see the Documentation section below).
 +
 +===== Several interesting tidbits surfaced ===== 
 +
 + * Lava (read PBS) and LSF do recognize quad-core processors.  Normally one "jobslot" per processor is defined in the Lava or LSF configuration.  However, by increasing the number of jobslots, the scheduler can be made aware of the total number of cores available for scheduling, if desired.
 +
 + * "esub" (aka "job submission filter") is available for Lava.  This script lets you alter job parameters, including the queue, at submission time.  This implies that a "routing queue" could be set up.  With a routing queue, users submit jobs and define the needed parameters but let the scheduler figure out which queue to use, so they do not have to worry about which queue to specify.  It "routes" each job to the best available queue, either by looking for the best fit or by evaluating logic such as potential backfilling, prioritization based on user status, etc.  It is a way to extend the Lava functionality if needed; LSF of course has all of these functions built in.
 +
 + * Preemption and Reservation are not available in Lava (but are in LSF).  Preemption allows low priority jobs to be suspended when high priority jobs are submitted.  Reservation allows for the scheduling of jobs that would otherwise be unlikely to run given the queue configurations.  For example, the 32 light weight nodes will probably be split into 16 on the gigabit ethernet switch and 16 on the infiniband switch.  A job that requires all 32 light weight nodes would be unlikely to be scheduled, given the other jobs competing for those nodes.  Reservation makes it possible to define a period at some time in the future for running such jobs, blocking other jobs/queues from being scheduled during that period.
 +
 + * Clumon, a monitoring tool like Ganglia, is built into Platform/OCS.  For example, view the 1,280 node **Tungsten Cluster** at NCSA [[http://clumon-w.ncsa.uiuc.edu/|External Link]].  It is an interesting monitor and gives a pretty good overview of what the cluster is doing.  __Check it out__.
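
The jobslot point in the list above comes down to simple arithmetic. The following is a minimal sketch, assuming a hypothetical group of 32 dual-socket quad-core nodes (the socket and core counts are illustrative and were not stated in the webcast). It compares one slot per processor with one slot per core.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """A group of identical compute nodes (all figures are hypothetical)."""
    name: str
    nodes: int
    sockets_per_node: int
    cores_per_socket: int


def slots_per_node(group: NodeGroup, per_core: bool) -> int:
    """Job slots on one node: one per processor (socket) or one per core."""
    if per_core:
        return group.sockets_per_node * group.cores_per_socket
    return group.sockets_per_node


def total_slots(groups: list[NodeGroup], per_core: bool) -> int:
    """Total job slots the scheduler would see across all node groups."""
    return sum(g.nodes * slots_per_node(g, per_core) for g in groups)


# Assumption: 32 nodes, 2 sockets each, 4 cores per socket.
example = [NodeGroup("light", nodes=32, sockets_per_node=2, cores_per_socket=4)]

if __name__ == "__main__":
    print("one slot per processor:", total_slots(example, per_core=False))
    print("one slot per core:", total_slots(example, per_core=True))
```

In LSF the per-host slot limit is typically set with the MXJ column in lsb.hosts; whether Lava is configured the same way should be confirmed against the Lava admin guide.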
 +
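The routing-queue idea described above can be sketched as a pure decision function that a submission filter might call. This is only an illustration: the queue names, the `JobRequest` fields, and the priority-user list are hypothetical, and the sketch deliberately leaves out how esub exchanges parameters with the scheduler (the Lava admin guide covers that mechanism).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobRequest:
    """What the user declared at submission time (hypothetical fields)."""
    user: str
    nodes: int
    needs_infiniband: bool = False
    queue: Optional[str] = None  # None means "let the router decide"


# Assumption: users whose jobs are routed to a higher-priority queue.
PRIORITY_USERS = frozenset({"faculty1"})


def route(job: JobRequest) -> str:
    """Pick a queue name for the job; all queue names are hypothetical."""
    if job.queue is not None:
        return job.queue          # an explicit choice by the user wins
    if job.needs_infiniband:
        return "infiniband"       # best fit by interconnect
    if job.user in PRIORITY_USERS:
        return "priority"         # prioritization based on user status
    if job.nodes <= 1:
        return "serial"
    return "gigabit"


if __name__ == "__main__":
    print(route(JobRequest(user="alice", nodes=8, needs_infiniband=True)))
```

Keeping the decision in one small function makes the routing rules easy to test outside the scheduler before they are wired into a submission filter.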
 +
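Preemption, as described above, suspends low priority jobs so that a high priority job can start. Below is a toy sketch of that selection step, assuming a hypothetical in-memory job list with integer priorities where a higher number wins. It illustrates the idea only and is not how LSF implements preemption internally.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RunningJob:
    """A running job as seen by this toy model (hypothetical fields)."""
    job_id: int
    priority: int
    slots: int


def jobs_to_suspend(running: list[RunningJob], free_slots: int,
                    new_priority: int, new_slots: int) -> list[int] | None:
    """Return ids of jobs to suspend so the new job fits.

    Returns [] if the job already fits in the free slots, and None if it
    cannot fit even after suspending every lower priority job.
    """
    needed = new_slots - free_slots
    if needed <= 0:
        return []
    # Only strictly lower priority jobs may be suspended, lowest first.
    candidates = sorted(
        (j for j in running if j.priority < new_priority),
        key=lambda j: (j.priority, j.job_id),
    )
    chosen: list[int] = []
    for job in candidates:
        if needed <= 0:
            break
        chosen.append(job.job_id)
        needed -= job.slots
    return chosen if needed <= 0 else None


# Hypothetical cluster state used for illustration.
running_jobs = [
    RunningJob(job_id=1, priority=10, slots=4),
    RunningJob(job_id=2, priority=50, slots=8),
    RunningJob(job_id=3, priority=20, slots=4),
]

if __name__ == "__main__":
    print(jobs_to_suspend(running_jobs, free_slots=2,
                          new_priority=40, new_slots=8))
```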
 +===== Training is available and two courses are of interest =====
 +
 +  * Platform Open Cluster Stack (OCS) Basic Configuration & Administration
 +
 +  * Platform Open Cluster Stack (OCS) Advanced Configuration & Administration
 +
 +No dates are available yet for 2007, but a session is planned for early April.  "In our experience, we have found it best when we had a few universities interested in training, and pulled them together for a formal onsite training.  We are planning a course of this nature in early April" [Jenny Yam, Training Coordinator, Platform Computing]
 +
 +
 +===== Documentation =====
 +
 +^Document^Comments^
 +|{{:cluster:install_notes.pdf|:cluster:install_notes.pdf}}| |
 +|{{:cluster:lava_admin_6.1.pdf|:cluster:lava_admin_6.1.pdf}}| |
 +|{{:cluster:lava_using_6.1.pdf|:cluster:lava_using_6.1.pdf}}| |
 +|{{:cluster:lava_vs_lsf_.doc|:cluster:lava_vs_lsf_.doc}}| |
 +|{{:cluster:platform_ocs_4.1.1-2.0_dell_user_guide.pdf|:cluster:platform_ocs_4.1.1-2.0_dell_user_guide.pdf}}| |
 +|{{:cluster:platform_ocs_centos_4.1.1-2.0_roll_readme.pdf|:cluster:platform_ocs_centos_4.1.1-2.0_roll_readme.pdf}}| |
 +|{{:cluster:platform_ocs_centos_4.1.1-2_readme.pdf|:cluster:platform_ocs_centos_4.1.1-2_readme.pdf}}| |
 +
 +\\
 +**[[cluster:0|Home]]**
cluster/17.txt · Last modified: 2007/01/10 09:43 (external edit)