Cluster Steering Committee 05/09/2007

Present: James Taft, Jolee West, Henk Meij, Francis Starr, David Beveridge, Eric Aaron, Tsampikos Kottos, George Petersson


  • fix PE2950s (dell issued)
  • dm-multipath failover (Fibre Channel): done! 05/14/07
  • filesystem fsck tests (affirm 1T LUN size)
  • rebuild a node: done that! hosed the ionode 05/08/07

Next Steps

  • create accounts (all members of cluster_advisory_group); by the end of the week of 05/18/07
  • open for test & development (no home dir backups, few snapshots); the approach seemed reasonable, especially since we can snapshot more frequently at first, while the filesystem is relatively unused.
  • install software … prioritization led to: amber, delphi, imsl. others suggested were an open-source fortran 90 compiler (does one exist? it appears so: G95), xmgrace and ddd.
group to prioritize ⇒ portland compilers, Matlab, charm, amber, namd, gromacs, gaussian + linda, R, Stata
  • adjust queues; the queues we currently have are listed below (in order of descending priority). it was decided to bring a few general purpose queues up with little or no restrictions.
name          description (lw=light weight, hw=heavy weight, i=infiniband)
⇒ specialty queues
priority      urgent jobs, limited by users allowed, 8 hrs of cpu time, max cores=8, any lw node
checkpoint    jobs will be checkpointed, max queued jobs=16, max jobs/user=2, cpu time=10 hrs, lw nodes. jobs are rerunnable
icheckpoint   same as above but ilw nodes
debug         hosts login1 & login2 (ethernet), max queued jobs=16, max jobs/user=8, cpu time=1 hr. scheduled with relatively high priority. lw nodes
idebug        same as above but ilw nodes only (hosts ilogin & ilogin2)
⇒ production queues
16-lwnodes    for normal jobs, lw nodes, max cores=32
16-ilwnodes   for normal jobs, ilw nodes, max cores=128
04-hwnodes    for large memory jobs, hw nodes, max cores=32, fast /localscratch
⇒ default queue
idle          jobs run on any lightly loaded host, max cores=8, max jobs/user=1, cpu time=10 hrs.

Other Items

  • get a quote for an LSF upgrade for all nodes
  • initiate TSM tape backups while the filesystem is rather small by adding some tapes to empty slots.


