\\
**[[cluster:0|Back]]**

|{{:cluster:greentail.jpg|}}|
|  greentail  |

====== Greentail ======
===== Design =====
  
The purchase of the HP hardware followed a fierce bidding round in which certain design aspects had to be met.
  
  * We continually run out of disk space for our home directories.  So the new cluster had to have a large disk array on board.
  
===== Rsnapshot =====

[[http://rsnapshot.org|Rsnapshot]] can be used to perform disk-to-disk backups of file systems using Linux tools such as hard and soft links and rsync.  It will replace our practice of backing up to the virtual tape library.  Since I had to disable the functionality of keeping modified files for 30 days (one file version only) because of the mass of files in /home (18 million by last count), we actually gain functionality using rsnapshot.  Rsnapshot will take daily, weekly and monthly point-in-time backups.  We will keep backups for the last 6 days, the last 4 weeks and the last 3 months.

The rsnapshot copy of all the new home directory content is made available to you at /snapshot/repository/?, where ? is a single letter from a to z.  This file system is read-only.  Users can retrieve deleted data by simply copying the lost data back into their new home directories.
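As a concrete illustration, a self-service restore boils down to building the snapshot path and copying the file back.  The username and file name below are hypothetical, and the exact nesting of the daily.N directories relative to the per-letter directories is an assumption for illustration only; check the actual layout under /snapshot/repository first.

```shell
#!/bin/sh
# Sketch: build the read-only snapshot path for a user's lost file.
# ASSUMPTION: daily.N/weekly.N/monthly.N sit under each letter directory;
# verify the real nesting on greentail before relying on this.
snap_path() {
    user=$1; file=$2; when=${3:-daily.0}    # daily.0 = yesterday's backup
    letter=$(printf '%.1s' "$user")         # first letter picks /snapshot/repository/?
    echo "/snapshot/repository/$letter/$when/$user/$file"
}

# Restore yesterday's copy back into the (writable) home directory:
#   cp -a "$(snap_path jdoe projects/results.dat)" ~jdoe/projects/results.dat
```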

Within the snapshot repository you will find directories:

  * daily.0 (yesterday), daily.1 (day before yesterday), etc.  Daily backups are taken Mon-Sat at 11 pm.
  * weekly.0 (last week), weekly.1 (week before last), etc.  Weekly backups are taken Sunday at 10:30 pm.
  * monthly.0 (last month), monthly.1 (month before last), etc.  Monthly backups are taken on the first day of each month at 10:00 pm.
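In rsnapshot's own configuration syntax, the retention policy above might look like the fragment below.  This is a hedged sketch, not greentail's actual config file: the timing itself comes from cron entries, and rsnapshot requires tab-separated fields.

```
# Hypothetical rsnapshot.conf fragment matching the stated policy;
# the real configuration on greentail may differ.
snapshot_root	/snapshot/repository/
interval	daily	6
interval	weekly	4
interval	monthly	3
```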

/home and /snapshot are different logical volumes using different sets of disks to protect against loss of data.  In addition, both use RAID 6 (double parity) for another layer of protection.  However, it is all one disk array, comprised of 4 disk shelves directly attached to greentail, so a catastrophic failure implies the potential of data loss.  I therefore encourage you to archive data elsewhere for permanent storage.

===== Sanscratch =====

Previously there were two scratch areas available to your programs: /localscratch, roughly 50 GB on each node's local hard disk, and /sanscratch, a shared scratch area available to all nodes.  Sanscratch allows you to monitor your job's progress by looking in /sanscratch/jobpid.  It was also much larger (1 TB).

However, ever since our catastrophic crash of June 2008 ([[cluster:67|The catastrophic crash of June 08]]), /sanscratch was simply a directory inside /home and thus competed with it for disk space.

On greentail, /sanscratch will be a separate logical volume of 5 TB using a different disk set.  So I urge those that have very large files to stage their files in /sanscratch when running their jobs for best performance.  The scheduler will always create (and delete!) two directories for you: the JOBPID of your job is used to create /localscratch/jobpid and /sanscratch/jobpid.
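The stage-in/stage-out pattern can be sketched as below.  This is not the site's official job script: the real scheduler pre-creates (and later deletes!) /sanscratch/jobpid for you, and ''wc -l'' stands in for your actual application; here the paths are passed as arguments so the steps can be shown on their own.

```shell
#!/bin/sh
# Sketch of staging large files through a scratch directory.
# In a real job, $1 would be /sanscratch/jobpid (created by the scheduler).
stage_and_run() {
    scratch=$1; input=$2; homedir=$3
    cp "$input" "$scratch"/                       # stage the large input in
    ( cd "$scratch" && \
      wc -l "$(basename "$input")" > output.txt ) # stand-in for the real application
    cp "$scratch"/output.txt "$homedir"/          # copy results back before the job ends
}
```

Running against scratch keeps the heavy I/O off /home, and copying results back before the job ends matters because the scheduler removes the jobpid directories afterwards.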

===== MPI =====

For those of you running MPI or MPI-enabled applications, you will need to make some changes to your scripts.  The ''wrapper'' program to use with greentail's Lava scheduler is the same as for cluster sharptail.  It can be found here: /share/apps/bin/lava.openmpi.mpirun.  If other flavors are desired, you can inform me or look at the example scripts lava.//mpi_flavor//.mpi[run|exec].

Some time ago I wrote some code to detect whether a node is infiniband enabled and, based on the result, add command line arguments to the mpirun invocation.  If you use that code, you will need to change the path used to obtain the port status (/usr/bin/ibv_devinfo) and, in the interface block, change eth1 to ib0.
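The detection described above might be sketched as follows.  This is assumed logic, not the exact site code: it parses ''ibv_devinfo'' output for an active port and picks an interface argument accordingly, and the ''--mca btl_tcp_if_include'' flag shown is an Open MPI option that other MPI flavors will not accept.

```shell
#!/bin/sh
# Sketch of infiniband detection for an mpirun wrapper (assumed logic).
ib_args() {
    # $1: output of /usr/bin/ibv_devinfo for this node
    if echo "$1" | grep -q 'state:.*PORT_ACTIVE'; then
        echo "--mca btl_tcp_if_include ib0"      # IB port is up: use ib0
    else
        echo "--mca btl_tcp_if_include eth1"     # no IB: fall back to eth1
    fi
}

# usage in a wrapper:
#   mpirun $(ib_args "$(/usr/bin/ibv_devinfo)") -np "$NP" ./myapp
```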
  
===== ... =====

|{{:cluster:swallowtail.jpg|}}|{{:cluster:petaltail.jpg|}}|{{:cluster:sharptail.jpg|}}|
|  swallowtail  |  petaltail  |  sharptail  |
  
  
cluster/93.txt · Last modified: 2011/01/11 20:55 by hmeij