HomeDir & Storage Options

The HPCC cluster's file server exports all home directories to all nodes under /home. The file system is 10 TB in size, and the nightly backup process currently takes 1-2 hours to churn through it. Making it larger would generate more traffic on /home, so for now, while it works for us, we have settled on this policy:

  • All users are under a quota that is automatically increased in 100 GB increments.
  • When a user consumes 1024 GB (1 TB), the automatic increases stop.
    • this home file system is backed up twice a month from sharptail's to greentail's disk array
    • nightly snapshots (point-in-time backups) are taken on sharptail's disk array and stored there too

At this point users need to offload static content, such as old analyses and results of published papers, to other locations. Users typically have one local option available:

  • Keep contents out of /home and migrate them to /archives (7 TB, accessible on all “tail” nodes)
    • request a directory for yourself in this file system and move contents to it
    • this archive file system is backed up twice a month from sharptail's to greentail's disk array
  • Users with home directories of 500+ GB should start considering moving data to /archives
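
To see where a directory stands against the 500 GB guideline, something like the following could be used. This is a minimal sketch, not a site-provided tool: the helper names dir_kb and reached_gb are made up for illustration, and it assumes du and awk are available, as they are on most Linux systems.

```shell
#!/bin/sh
# Print a directory's size and note whether it has reached the 500 GB guideline.
# Sizes are in KB as reported by `du -sk`; 1 GB is taken as 1024*1024 KB.

dir_kb() {
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
    du -sk "$1" 2>/dev/null | awk '{print $1}'
}

reached_gb() {
    # $1 = size in KB, $2 = threshold in GB; succeeds when size >= threshold.
    [ "$1" -ge $(( $2 * 1024 * 1024 )) ]
}

target="${1:-.}"        # directory to check; defaults to the current one
kb=$(dir_kb "$target")
kb=${kb:-0}             # fall back to 0 if du produced no output

if reached_gb "$kb" 500; then
    echo "$target: $(( kb / 1024 / 1024 )) GB, consider moving static data to /archives"
else
    echo "$target: $(( kb / 1024 / 1024 )) GB, below the 500 GB guideline"
fi
```

Pass a home directory path as the first argument to check it. Note that du walks the whole tree, so on a large home directory it may take a while.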

Users who are considered inactive have their home directories relocated to /archives/inactive

  • these accounts are kept around until we do an account edit and purge (has never happened so far)

The remote storage option, if your storage needs cannot be supported by /archives, is off-cluster storage. Rstore is our latest storage solution for groups and labs with such needs.

  • ask your lead faculty member if your lab/group has such an area or request one
  • then move your static content permanently off the HPCC cluster environment
  • details can be found at RSTORE FAQ

How do I ..?

Well, move stuff around? Try to avoid programs such as cp and sftp/scp for large content migrations. The better bet is rsync; see man rsync for the manual page.

With rsync you can:

  • preserve permissions, do a checksum between source/destination files, observe what will happen
    • rsync -vac --dry-run
  • delete files on destination not present on source
    • rsync --delete
  • throttle the rate of traffic generated and make your sysadmin happy (the value is in KB per second, so this is roughly 10 MB/s)
    • rsync --bwlimit=10000
  • and much more …

So to put it all together: for example, to move a directory named stuff in my home directory elsewhere

rsync -vac --delete --bwlimit=10000 --dry-run /home/username/stuff rstore0:/data/2/somelabgroup/mydirectory/

Is the output ok? Then run again with --dry-run omitted.

Note that the source has no trailing slash while the destination does: this puts the stuff directory itself inside the destination location. If the source also had a trailing slash, rsync would sync the contents of stuff directly into the destination directory, and --delete would then remove anything there that is not in stuff. Beware: --delete may bite.
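
The trailing-slash rule can be tried out safely on throwaway directories before touching real data. The sketch below is only an illustration: every path is created under a temporary directory (the names src, dest_a and dest_b are made up), and it assumes rsync is installed locally.

```shell
#!/bin/sh
# Demonstrate rsync's trailing-slash rule on throwaway local directories.
# Nothing here touches /home or a remote host; all paths live under mktemp.
work=$(mktemp -d)
mkdir -p "$work/src/stuff" "$work/dest_a" "$work/dest_b"
echo "data" > "$work/src/stuff/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash on the source: the directory itself lands in dest_a.
    rsync -a "$work/src/stuff" "$work/dest_a/"

    # Trailing slash on the source: only its contents land in dest_b.
    rsync -a "$work/src/stuff/" "$work/dest_b/"

    # List the results to compare the two layouts.
    find "$work/dest_a" "$work/dest_b" -type f
else
    echo "rsync not found; nothing copied"
fi

# Clean up when done: rm -rf "$work"
```

The first copy ends up as dest_a/stuff/file.txt and the second as dest_b/file.txt, which is the difference that makes --delete risky when the slashes are not what you intended.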

Once the contents have been migrated and you have confirmed the copy is complete, remove the original: rm -rf /home/username/stuff
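
Since rm -rf cannot be undone, one way to check the copy first is a checksum-based dry run that reports whether rsync would still transfer anything. This is a hedged sketch rather than an official procedure: the function name is_synced is hypothetical, and it assumes the same source and destination arguments that were used for the real transfer.

```shell
#!/bin/sh
# is_synced SRC DEST
# Succeeds only when a checksum-based dry run finds nothing left to transfer
# from SRC to DEST. Nothing is modified because of -n (--dry-run).
# Returns 2 if rsync itself fails (for example, the remote host is unreachable),
# so that an error is never mistaken for "everything matches".
is_synced() {
    changes=$(rsync -acn --itemize-changes "$1" "$2") || return 2
    [ -z "$changes" ]
}

# Hypothetical usage, with SRC and DEST set to the paths used for the transfer:
#   if is_synced "$SRC" "$DEST"; then
#       echo "copy verified, safe to remove $SRC"
#   else
#       echo "differences remain or rsync failed, do not remove $SRC"
#   fi
```

The check uses the same -a and -c options as the transfer, so it compares file contents by checksum and will read every file on both sides; expect it to take a while for large directories.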


cluster/136.1423607226.txt.gz · Last modified: 2015/02/10 17:27 by hmeij