cluster:92

A “bottom-up” page of tasks performed while inching towards deployment.

Update (07/02/2011)

Hi All,

Matlab and Stata (and queues) have been moved to the new greentail cluster. My last “todo”, buying new Intel compilers for cluster greentail, is within sight. The usage of the new HP cluster has been, well, rather low. Not sure why. I am thus proposing a Q&A meeting about our entire HPC environment, which is detailed at https://dokuwiki.wesleyan.edu/doku.php?id=cluster:95

The following next steps we can also discuss at the meeting:

- virtual tape backups of the old home dirs on petaltail will be halted at some time
- hence we will rely on disk2disk backups on greentail (new home dirs with weekly refresh from petaltail)
- nodes may disappear on petaltail/sharptail as I experiment with ingestion into greentail

If you can all email me directly with potentially good meeting times, I'll try to maximize attendance.

-Henk

HP Towards Deployment

  • buy new intel compilers and cmkl?
  • job 100,000 milestone (and 1,000,000 on swallowtail)
  • wait for monthly snapshots to start rotating before disabling 5 TB tivoli
  • 07/02/2011 Done: Matlab and Stata (and queues) moved to greentail, no changes for Matlab and Stata users.
  • 15/01/2011 Done: setup scripts to log usage, gather stats
  • 14/01/2011 Done: benchmark some applications, test mpi flavors
  • 10/01/2011 Done: run some test jobs via scheduler, some benchmarks
  • 07/01/2011 install Lava (no SGE installed, so let's do this first)
    • lava is up on greentail and n1
    • need to propagate golden image, done
  • 05/01/2011 rsync over rest of home directories over break (started 20dec10)
    • without the '--delete' and '--delete-excluded' flags
    • added '--stats' flag
    • added the -k and -l flags for rsync when using /home/? where ? is a through z
    • 07/01/2011 almost there, /home to /snapshots is working
    • need to figure out mounts for user access at /snapshots/repository, done
    • set up weekly pulling from /oldhome to /home, in progress
  • 12/18/2010 set up rsnapshot
    • exercising hourly, daily, weekly, monthly
  • 12/15/2010 copy /share/apps across, see below
    • will have to rsync petaltail:/share/apps/src into /share/backups/petaltail (to pull into greentail:/share/apps/src)
  • 12/15/2010 connect filer3's home directories on /mnt
    • rsync -vac -k /oldhome/username /home (does the trick, note no trailing slashes)
    • I will rsync everything over on date X and redo on date X+Y, overwrite
    • anticipate a deadline for moving off netapp filer3 permanently (dell cluster reconfig dependent)
    • make sure the lack of TSM filesystem backups is understood
  • 12/15/2010 CMU license expired, turns out to be a demo license.
    • chasing HP support for a resolution.
  • 12/15/2010 n21: missing dimms or bad dimms, open up the blade enclosure. Fixed.
  • 12/09/10 CMU backup
    • single file daily backup cmu.conf to /usr/local/backups/cmu
  • 12/09/10 Backup databases (hpsmdb for SIM using postgresql, mysql not installed)
    • not much success, not really needed, can do a rediscover
  • 12/09/10 Clean up /root and archive the training session materials.
  • 12/08/10 Linpack burn in. Stressing the hardware. Done. Results can be found here: Linpack
  • 12/05/10 Document training session materials. Done, on itsdoku wiki.
  • 12/03/10 Full backup of local hard disk (/ and /boot). Done.
  • 12/03/10 David Holton arrives for training. Part of “resource for a week”. Replaced a disk that was already flagged as “predictive failure”. Biggest change in cluster configuration is how the MSA60 volumes were set up. We destroyed all that and rearranged three volumes across the SCSI cables (that is vertical versus horizontal) and applied LVM on top.
  • 11/25/10 Decommissioned two BSS racks (cluster sharptail) which will be donated to Wellesley University. That freed up six L6-30 circuits. With James, three of them were pulled to the area where the Flexible Storage array and Equallogic array are racked. The other three are dedicated to the HP cluster. Pulled one Enterprise UPS L6-30 near the HP cluster and turned it on. Three PDUs are lit up, but differ from documentation. Changed the cabling a bit so that both head node power supplies, one side of the MSA60 power supplies, and all switches plus KVM are on the enterprise UPS. This shall not be interruptible, yea.
  • 11/15/10 Cluster arrives. A bit overdue (first ETA was 10/01/10).
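
The home directory migration noted above (rsync -vac -k from /oldhome to /home, with '--stats' and without any delete flags) could be wrapped per user roughly as follows. This is only a sketch: the function name sync_home and the DRY_RUN switch are hypothetical, and the default paths are simply the ones mentioned in the log. By default it prints the command instead of running it.

```shell
#!/bin/sh
# Sketch of a per-user home directory pull, based on the notes in the log.
# sync_home and DRY_RUN are hypothetical names, not part of the original setup.

SRC_ROOT="${SRC_ROOT:-/oldhome}"   # old home directories (filer3 mount)
DEST_ROOT="${DEST_ROOT:-/home}"    # new home directories on greentail
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the command

sync_home() {
    user="$1"
    # No trailing slash on the source, so rsync creates DEST_ROOT/user.
    # -v verbose, -a archive, -c compare by checksum,
    # -k treat symlinked directories as directories,
    # --stats print a transfer summary.
    # No --delete flags: nothing is removed on the destination.
    set -- rsync -vac -k --stats "$SRC_ROOT/$user" "$DEST_ROOT"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# When run as a script, treat every argument as a username.
for u in "$@"; do
    sync_home "$u"
done
```

Running it with DRY_RUN=0 and a list of usernames would perform the actual copy; repeating the run later (date X+Y in the notes) overwrites changed files but leaves files deleted on the source in place on the destination.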
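
The single-file daily backup of cmu.conf into /usr/local/backups/cmu could be done with a small helper like the one below, called once a day from cron. This is a sketch under assumptions: the log does not say where cmu.conf lives or how the copies are named, so the source path is left as an argument and the date suffix is an illustrative choice. backup_file is a hypothetical name.

```shell
#!/bin/sh
# Sketch: copy one file into a backup directory with a date suffix.
# backup_file is a hypothetical helper; the dated naming is an assumption.

backup_file() {
    src="$1"        # file to back up (path to cmu.conf is not given in the log)
    dest_dir="$2"   # e.g. /usr/local/backups/cmu
    stamp="$(date +%Y%m%d)"
    mkdir -p "$dest_dir" || return 1
    # -p preserves mode, ownership and timestamps where possible
    cp -p "$src" "$dest_dir/$(basename "$src").$stamp"
}
```

A call such as backup_file /path/to/cmu.conf /usr/local/backups/cmu keeps one copy per day; a second run on the same day overwrites that day's copy.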



cluster/92.1297193407.txt.gz · Last modified: 2011/02/08 19:30 by hmeij