\\
**[[cluster:0|Home]]**
===== Cluster Steering Committee 05/09/2007 =====
Present: James Taft, Jolee West, Henk Meij, Francis Starr, David Beveridge, Eric Aaron, Tsampikos Kottos, George Petersson
==== ToDos ====
* fix PE2950s (dell issued)
* dm-multipath failover (fiber channel) done! 05/14/07
* filesystem fsck tests (affirm 1T LUN size)
* rebuild a node: done that! hosed the ionode 05/08/07
==== Next Steps ====
* create accounts (all members of cluster_advisory_group) by the end of the week of 05/18/07
* open for test & development (no home dir backups, few snapshots); this approach seemed reasonable, especially since we can snapshot more frequently at first, while the filesystem is relatively unused.
* install software ... prioritization led to: amber, delphi, imsl. Others suggested were an open-source Fortran 90 compiler (does it exist? it appears so: [[http://www.g95.org|G95]]), xmgrace and ddd.
| group to prioritize => | portland compilers, Matlab, charm, amber, namd, gromacs, gaussian + linda, R, Stata |
* adjust queues, currently have (in order of descending priority); it was decided to bring up a few general purpose queues with little or no restrictions.
| name | description (lw=light weight, hw=heavy weight, i=infiniband)|
| => specialty queues ||
| priority | urgent jobs, limited by users allowed, 8 hrs of cpu time, max cores=8, any lw node|
| checkpoint | jobs will be checkpointed, max queued jobs=16, max jobs/user=2, cpu time=10 hrs, lw nodes. jobs are rerunnable|
| icheckpoint | same as above but ilw nodes|
| debug | hosts login1 & login2 (ethernet), max queued jobs=16, max jobs/user=8, cpu time=1 hr. scheduled with relatively high priority. lw nodes|
| idebug | same as above but ilw nodes only (hosts ilogin & ilogin2)|
| => production queues ||
| 16-lwnodes | for normal jobs, lw nodes, max cores=32|
| 16-ilwnodes | for normal jobs, ilw nodes, max cores=128|
| 04-hwnodes | for large memory jobs, hw nodes, max cores=32, fast /localscratch|
| => default queue ||
| idle | jobs run on any host lightly loaded, max cores=8, max jobs/user=1, cpu time 10 hrs.|
other items
* get a quote for an LSF upgrade for all nodes
* initiate TSM tape backups while the filesystem is rather small by adding some tapes to empty slots.