cluster:37

cluster:37 [2007/05/15 13:13] (current)
Line 1: Line 1:
 +\\
 +**[[cluster:0|Home]]**
 +
 +===== Cluster Steering Committee 05/09/2007 =====
 +
 +Present: James Taft, Jolee West, Henk Meij, Francis Starr, David Beveridge, Eric Aaron, Tsampikos Kottos, George Petersson
 +
 +==== ToDos ====
 +
 +  * fix PE2950s (dell issued)
 +
 +  * <del>dm-multipath failover (fiber channel)</del> done! 05/14/07
 +
 +  * filesystem fsck tests (affirm 1T LUN size)
 +
 +  * <del>rebuild a node</del> done that! hosed the ionode 05/08/07
 +
 +
 +
 +
 +
 +==== Next Steps ====
 +
 +  * create accounts (all members of cluster_advisory_group); by the end of the week of 05/18/07
 +
 +  * open for test & development (no home dir backups, few snapshots); the approach seemed reasonable, especially since we can snapshot more frequently at first, while the filesystem is relatively unused.
 +
 +  * install software ... prioritization led to: amber, delphi, imsl. Others suggested were an open-source Fortran 90 compiler (does one exist? it appears so: [[http://www.g95.org|G95]]), xmgrace, and ddd.
 +
 +| group to prioritize => | portland compilers, Matlab, charm, amber, namd, gromacs, gaussian + linda, R, Stata |
 +
 +  * adjust queues, currently have (in order of descending priority); it was decided to bring up a few general-purpose queues with few or no restrictions.
 +
 +| name | description (lw=light weight, hw=heavy weight, i=infiniband)|
 +| => specialty queues ||
 +| priority | urgent jobs, limited by users allowed, 8 hrs of cpu time, max cores=8, any lw node|
 +| checkpoint | jobs will be checkpointed, max queued jobs=16, max jobs/user=2, cpu time=10 hrs, lw nodes. jobs are rerunnable|
 +| icheckpoint | same as above but ilw nodes|
 +| debug | hosts login1 & login2 (ethernet), max queued jobs=16, max jobs/user=8, cpu time=1 hr. scheduled with relatively high priority. lw nodes|
 +| idebug | same as above but ilw nodes only (hosts ilogin & ilogin2)|
 +| => production queues ||
 +| 16-lwnodes | for normal jobs, lw nodes, max cores=32|
 +| 16-ilwnodes | for normal jobs, ilw nodes, max cores=128|
 +| 04-hwnodes | for large memory jobs, hw nodes, max cores=32, fast /localscratch|
 +| => default queue ||
 +| idle | jobs run on any lightly loaded host, max cores=8, max jobs/user=1, cpu time=10 hrs.|
 +
 +==== Other Items ====
 +
 +  * get a quote for an LSF upgrade for all nodes
 +
 +  * initiate TSM tape backups while the filesystem is rather small by adding some tapes to empty slots.
 +
 +
  
cluster/37.txt · Last modified: 2007/05/15 13:13 (external edit)