cluster:37 [2007/05/15 13:13]
\\
**[[cluster:

===== Cluster Steering Committee 05/09/2007 =====

Present: James Taft, Jolee West, Henk Meij, Francis Starr, David Beveridge, Eric Aaron, Tsampikos Kottos, George Petersson

==== ToDos ====

  * fix PE2950s (Dell issued)

  * <

  * filesystem fsck tests (confirm 1T LUN size)

  * <

==== Next Steps ====

  * create accounts (all members of cluster_advisory_group).

  * open for test & development (no home directory backups, few snapshots); this approach seemed reasonable, especially since we can take snapshots more frequently at first, while the filesystem is relatively unused.

  * install software: prioritization led to amber, delphi, imsl. Others suggested were a Fortran 90 open-source compiler (does it exist? appears so [[http://

| group to prioritize => | portland compilers,

  * adjust queues; the queues currently defined are listed below in descending order of priority. It was decided to bring up a few general-purpose queues with little or no restrictions.

| name | description (lw=light weight, hw=heavy weight, i=infiniband)|
| => specialty queues ||
| priority | urgent jobs, limited to allowed users, 8 hrs of CPU time, max cores=8, any lw node|
| checkpoint | jobs will be checkpointed,
| icheckpoint | same as above but on ilw nodes|
| debug | hosts login1 & login2 (ethernet), max queued jobs=16, max jobs/
| idebug | same as above but on ilw nodes only (hosts ilogin & ilogin2)|
| => production queues ||
| 16-lwnodes | for normal jobs, lw nodes, max cores=32|
| 16-ilwnodes | for normal jobs, ilw nodes, max cores=128|
| 04-hwnodes | for large memory jobs, hw nodes, max cores=32, fast /
| => default queue ||
| idle | jobs run on any lightly loaded host, max cores=8, max jobs/
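The production and default queue limits in the table can be read as a simple eligibility rule: match the node type a job needs, then check the core limit. The Python sketch below is a hypothetical illustration of that reading, not the cluster's actual LSF configuration (LSF does the real scheduling). The ''Queue'' class and ''eligible_queues'' helper are invented names, only the node types and core limits stated in the table are used, and the specialty queues are left out because their limits are partly missing here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    """One row of the queue table (hypothetical model, not LSF syntax)."""
    name: str
    node_type: str  # "lw", "ilw" (infiniband lw), "hw", or "any" (idle queue)
    max_cores: int


# Production and default queues from the table, in the table's order.
# Specialty queues are omitted because their limits are partly missing.
QUEUES = [
    Queue("16-lwnodes", "lw", 32),
    Queue("16-ilwnodes", "ilw", 128),
    Queue("04-hwnodes", "hw", 32),
    Queue("idle", "any", 8),
]


def eligible_queues(cores: int, infiniband: bool = False,
                    large_memory: bool = False) -> list[str]:
    """Hypothetical helper: queues whose node type and core limit fit a job."""
    if infiniband and large_memory:
        return []  # no queue in the table targets large-memory infiniband nodes
    if infiniband:
        wanted = {"ilw"}
    elif large_memory:
        wanted = {"hw"}
    else:
        # Ordinary jobs may also fall back to the idle queue on any host.
        wanted = {"lw", "any"}
    return [q.name for q in QUEUES
            if q.node_type in wanted and cores <= q.max_cores]


if __name__ == "__main__":
    print(eligible_queues(4))                    # ['16-lwnodes', 'idle']
    print(eligible_queues(64, infiniband=True))  # ['16-ilwnodes']
```

For example, a 64-core infiniband job fits only 16-ilwnodes, since every other queue in the table caps jobs at 32 cores or fewer.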

Other items:

  * get a quote for an LSF upgrade for all nodes

  * initiate TSM tape backups while the filesystem is still small, by adding some tapes to the empty slots.
