
cluster:23

This is an old revision of the document!



Home

Towards Deployment

  • I think I will close this page. The one major outstanding issue standing in the way of declaring ourselves “in production” mode is a serious backup policy. I'm snapshotting via the NetApp filers but need Tivoli backups of home directories. So let's close this page. Once you see the “Backup Policies” page listed, we've deployed formally.
  • 08/26/2007 First 5th floor data center power outage with cluster on board. Should be marvelous fun. Cluster going down the Saturday night before it. Jobs will be “requeued”.
  • 08/22/2007 First LSF admin duties. Wrote script for lsb.events.n and lsb.acct.n rotation and archiving (leaving file is n=1). Also am using bacct to list scheduler statistics on swallowtail's main web page. lsb.events rotates for every 1000 jobs, lsb.acct rotates every 30 days.
  • 08/20/2007 Went to LSF training v6.2 – we should move to this “flagship” scheduler asap. It will allow us to provide “on-demand” cluster resources for classes via pre-emptive scheduling and advanced reservations. Also learned how to write a resource that can be used for scheduling, like asking for >300GB space availability in /sanscratch. Needs to be implemented soon.
  • 08/07/2007 Testing the IB/GigE switches for performance. Complicated MPI. Need to get going on Tivoli tape backup so we can call the cluster “in production” mode.
  • 07/21/2007 YaMan. First E1000 error code on compute-1-13 which basically means failed hardware. Dell engineer on its way for Monday. Meanwhile Matlab is fully installed and working both in distributed & parallel mode. Also installed the '2007b' trial version which will give us unlimited licenses till 9/15 :-P
  • 07/16/2007 Intel compilers installed, working on Matlab configuration issues and connectivity with our scheduler.
  • 07/02/07 It's been a while. Roughly 40+ user accounts, but it's pretty quiet. Amber/Intel compilation was a nightmare with topspin libraries but we finally managed. Purchase of Matlab (distributed/parallel version with worker Engine) and Intel compilers (c++, fortran and cluster math library) is underway. Experimenting with MPI. Other open source software installed; consult the software page.
  • 05/24/07 Early bird butterfly user accounts have been created. Starting on software list. fsck results on massive but clean filesystem are surprisingly good over fiber channel, see link.
  • 05/08/07 Messing around with multipath configurations hung the ionode. So OCS rebuilt it and now I have to backtrack through all the configurations and re-apply them. Hint: do not switch the fiber channel cables … it's really, really, not pleasant.
  • 05/05/07 Switching to new filer and figuring out multipath LUNs. Read about it.
  • 04/25/07 Waiting for FAS disk space to come online.
  • 04/17/07 Started changes in the configuration of Lava. Several queues set up. Started the User Guides & Manuals page so I can collect the info I learn there. Oh, and the first 15K RPM disk failed in one of the MD1000s. Contacted Dell support.
  • 04/13/07 Guest accounts have been created and the Platform vendor successfully webVPN'ed in. Postfix has been configured and now hands off all email traffic to Wesleyan's servers. Changed the configuration of Lava, the scheduler, and created queues. Rebuilt the ROCKS distribution. Rack mounted the NetApp filer and disk trays.
  • 04/03/07 Still struggling with these ethernet nodes but am at least able to get them running HPLinpack.
  • 03/30/07 Mounted sample LUNs from NetApp filer to ionode, configured NFS, and mounted filesystems 'sanscratch', 'users' and 'powerusers' on compute nodes. Stepped manually through process of adding user accounts. Fired off HPLinpack Infiniband benchmark (no problem), am having some trouble with the Ethernet suite, we should see red all over :-?

:cluster:hplburn_during.gif

  • 03/28/07 Installed Tivoli backup client on head node and fired off first backup to nighthaw & remotehawk. Waiting for this to finish.
  • 03/27/07 Public and Technical (on ITSdoku) documentation done. sysreport dumps made to /opt/netboot/cluster.
  • 03/19/07 Dell arrives for final configuration step, read about it.
  • 03/16/07 2nd ethernet switch installed by Henk, trying to cable the “dell” way 8-)
  • 03/14/07 Conference call with eXludus Technologies (technology to pre-stage data under compute nodes and leverage bandwidth as well as other resources) … here is their Brochure
  • 03/03/07 The Argus article about the cluster
  • 02/27/07 Henk on vacation, back March 12th, not really progress, is it? :-/
  • 02/23/2007 HPCwire, in the section on Cluster Computing, sports the Wesleyan Connection article.
  • 02/13/2007 Picture taking with Olivia from the Wesleyan Connection
  • 02/12/2007 VLAN 1 extended into rack
  • 02/09/2007 electric hookup and rack positioning
  • 02/01/2007 hardware arrives
  • 01/26/2007 design conference with Dell
  • Christmas 2006, final quote & PO cut
  • fall 2006 installation of 12,000 BTU cooler
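
The 08/22/2007 entry mentions a script that rotates and archives lsb.events.n and lsb.acct.n. That script is not shown on this page, so what follows is only a minimal sketch of how such archiving might look, not the script actually in use. Assumptions: rotated files are named with a numeric suffix, files with n greater than 1 get gzipped into a separate directory, and the n=1 file stays in place. The function name `archive_rotated_logs`, the `keep_up_to` and `stamp` parameters, and the directory arguments are all hypothetical.

```python
import gzip
import re
import shutil
import time
from pathlib import Path

# Assumed naming: rotated logs look like lsb.events.<n> or lsb.acct.<n>.
# The active files (lsb.events, lsb.acct) carry no suffix and never match.
ROTATED = re.compile(r"^(lsb\.events|lsb\.acct)\.(\d+)$")


def archive_rotated_logs(log_dir, archive_dir, keep_up_to=1, stamp=None):
    """Gzip rotated logs with n > keep_up_to into archive_dir, then remove them.

    Returns the sorted list of archive file names that were created.
    """
    log_dir, archive_dir = Path(log_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # A date stamp keeps archives from different runs apart (hypothetical choice).
    stamp = stamp or time.strftime("%Y%m%d")
    created = []
    for path in sorted(log_dir.iterdir()):
        match = ROTATED.match(path.name)
        if not match or not path.is_file():
            continue  # skip active logs, directories and unrelated files
        if int(match.group(2)) <= keep_up_to:
            continue  # leave the most recent rotation(s) where they are
        target = archive_dir / f"{path.name}.{stamp}.gz"
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # remove the original only after the compressed copy is written
        created.append(target.name)
    return sorted(created)
```

Run from cron, something like `archive_rotated_logs("/path/to/logdir", "/path/to/archive")` would do, with both paths being placeholders for the real LSF log and archive locations. Two runs on the same day that meet the same file name would overwrite the earlier archive, so a finer-grained stamp may be wanted.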




cluster/23.1188401506.txt.gz · Last modified: 2017/10/03 09:34 (external edit)