cluster:23
Home
Towards Deployment
I think I will close this page. The one major outstanding issue standing in the way of declaring ourselves “in production” is a serious backup policy. I'm snapshotting via the NetApp filers but still need Tivoli backups of home directories. So let's close this page. Once you see the “Backup Policies” page listed, we've deployed formally.
08/20/2007 Went to LSF training v6.2 – we should move to this “flagship” scheduler asap. It will allow us to provide “on-demand” cluster resources for classes via pre-emptive scheduling and advanced reservations. Also learned how to write a resource that can be used for scheduling, like asking for >300GB space availability in /sanscratch. Needs to be implemented soon.
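A rough sketch of the kind of check such a scheduling resource could be built on, written in Python. Everything here is an assumption for illustration: the resource name `sanscratch_free`, the helper names, and the "count name value" report line are hypothetical and would need to be checked against the LSF documentation before use.

```python
import shutil

GB = 10 ** 9  # decimal gigabytes; an assumption about what ">300GB" means


def free_gb(path):
    """Return the free space, in whole GB, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free // GB


def has_enough_scratch(path, required_gb=300):
    """True when more than `required_gb` GB are free under `path`."""
    return free_gb(path) > required_gb


def report_line(name, value):
    """Format one load-index style report: '<count> <name> <value>'.

    The exact format a scheduler expects is assumed here, not confirmed.
    """
    return f"1 {name} {value}"
```

On a node that mounts the scratch area, `report_line("sanscratch_free", free_gb("/sanscratch"))` would produce the line to hand to the scheduler, and `has_enough_scratch("/sanscratch")` answers the ">300GB" question directly.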
07/21/2007 YaMan. First E1000 error code on compute-1-13, which basically means failed hardware. Dell engineer is on its way for Monday. Meanwhile Matlab is fully installed and working in both distributed & parallel mode. Also installed the '2007b' trial version, which will give us unlimited licenses till 9/15.
07/02/07 It's been a while. Roughly 40+ user accounts, but it's pretty quiet. Amber/Intel compilation was a nightmare with the Topspin libraries but we finally managed. Purchase of Matlab (distributed/parallel version with worker Engine) and Intel compilers (C++, Fortran and cluster math library) is underway. Experimenting with MPI. Other open source software has been installed; consult the software page.
05/24/07 Early bird butterfly user accounts have been created. Starting on software list. fsck results on massive but clean filesystem are surprisingly good over fiber channel, see link.
05/08/07 Messing around with multipath configurations hung the ionode. So OCS rebuilt it and now I have to backtrace all the configurations and re-apply them. Hint: do not switch the fiber channel cables … it's really, really not pleasant.
05/05/07 Switching to new filer and figuring out multipath LUNs. Read about it.
04/17/07 Started changes in the configuration of Lava. Several queues set up. Started the User Guides & Manuals page so I can collect the info I learn there. Oh, and the first 15K RPM disk failed in one of the MD1000s. Contacted Dell support.
04/13/07 Guest accounts have been created and the Platform vendor successfully webVPN'ed in. Postfix has been configured and now hands off all email traffic to Wesleyan's servers. Changed the configuration of Lava, the scheduler, and created queues. Rebuilt the ROCKS distribution. Rack mounted the NetApp filer and disk trays.
03/30/07 Mounted sample LUNs from NetApp filer to ionode, configured NFS, and mounted filesystems 'sanscratch', 'users' and 'powerusers' on compute nodes. Stepped manually through the process of adding user accounts. Fired off HPLinpack Infiniband benchmark (no problem), am having some trouble with the Ethernet suite, we should see red all over
03/27/07 Public and Technical (on ITSdoku) documentation done. sysreport dumps made to /opt/netboot/cluster.
03/19/07 Dell arrives for final configuration step, read about it.
03/16/07 2nd Ethernet switch installed by Henk, trying to cable the “Dell” way.
03/14/07 Conference call with eXludus Technologies (technology to pre-stage data under compute nodes and leverage bandwidth as well as other resources) … here is their brochure.
03/03/07 The Argus article about the cluster.
02/27/07 Henk on vacation, back March 12th. Not really progress, is it?
02/23/2007 HPCwire, in the section on Cluster Computing, sports the Wesleyan Connection article.
cluster/23.txt · Last modified: 2017/10/03 13:34 by hmeij07