The HPCC cluster's file server sharptail.wesleyan.edu serves out all home directories to all nodes at /home. It is 10 TB in size, and the nightly backup process currently takes 1-2 hours to churn through it. Making it larger would generate even more traffic on /home, so for now, while it works for us, we have adopted this policy:
At this point users need to offload static content, such as old analyses and results of published papers, to other locations. Users typically have one local option available: /archives.
Users who are considered inactive have their home directories relocated to /archives/inactive.
If your storage needs cannot be met by /archives, the remote option is off-cluster storage. Rstore is our latest storage solution for groups and labs with such needs.
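You can find candidates for offloading with two helpers. This is a minimal sketch that assumes "static" means files not modified in the last year. The 365-day default and the helper names list_static_candidates and usage_by_subdir are hypothetical and are not part of any cluster tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting static content worth offloading.

# list_static_candidates DIR [DAYS]: print files under DIR whose last
# modification is more than DAYS days ago (default 365, an assumption).
list_static_candidates() {
  local dir=${1:?usage: list_static_candidates DIR [DAYS]}
  local days=${2:-365}
  # -mtime +N matches files last modified more than N days ago
  find "$dir" -type f -mtime +"$days" -print
}

# usage_by_subdir DIR: disk usage (in KiB) of each subdirectory of DIR, largest first.
usage_by_subdir() {
  local dir=${1:?usage: usage_by_subdir DIR}
  # the trailing slash in the glob restricts matches to directories
  du -sk "$dir"/*/ 2>/dev/null | sort -rn
}
```

For example, run usage_by_subdir "$HOME" to see which folders are biggest. Then run list_static_candidates "$HOME/some_old_project" to list files that have not changed in a year.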
Our file server is named sharptail.wesleyan.edu (sharptail when on cluster). It is a 4U integrated storage and server module with a 48 TB disk array. Moving content can severely cripple this server. It serves /home to all nodes, and if it cannot handle all read/write requests, everything comes to a halt. So when moving content, please monitor the server and check whether others are already doing something along these lines. Here are some tips.
Do not use any copy tool with a GUI, nor cp/scp or s/ftp. GUI (drag & drop) tools especially are verboten! These tools are not smart enough and frequently generate blocked processes that halt everything. Use rsync in a Linux/Unix environment.
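Check the server's state before and while moving content:
uptime   (load averages < 8 are ok)
free -m   (look at the free values)
ps -efl | grep rsync   (see whether other rsync processes are already running)
iotop   (look at the M/s disk writes, q to quit; values > 100-200 M/s == busy!)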
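Two of these checks can be rolled into a small pre-flight function. This is a sketch under several assumptions:

- preflight_ok and the LOADAVG_FILE override are hypothetical names.
- The load limit of 8 comes from the uptime tip.
- /proc/loadavg exists on Linux only.
- pgrep stands in for ps -efl | grep rsync.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before moving content on sharptail.
# Returns 0 if it looks safe to start, 1 if busy, 2 if the load cannot be read.
preflight_ok() {
  local max_load=${1:-8}                             # limit from the uptime tip
  local loadavg_file=${LOADAVG_FILE:-/proc/loadavg}  # override for testing (assumption)
  local load1
  # The first field of /proc/loadavg is the 1-minute load average, as shown by uptime.
  load1=$(awk '{ print $1; exit }' "$loadavg_file") || return 2
  [ -n "$load1" ] || return 2
  # Floating-point comparison via awk, since shell arithmetic is integer-only.
  if awk -v l="$load1" -v m="$max_load" 'BEGIN { exit !(l >= m) }'; then
    echo "load $load1 >= $max_load, try again later" >&2
    return 1
  fi
  # pgrep -x matches processes whose name is exactly "rsync".
  if pgrep -x rsync > /dev/null; then
    echo "another rsync is already running:" >&2
    pgrep -a -x rsync >&2   # -a also prints each command line (procps-ng)
    return 1
  fi
  return 0
}
```

Usage on sharptail: preflight_ok && rsync -vac --bwlimit=2500 /home/user/stuff/ /archives/user/stuff/. The function does not replace watching iotop during the transfer.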
Three scenarios are depicted below. When you cross the vertical boundaries, you are no longer dealing with local content, so the content has to flow over the network.
rsync has many features. One of the most important is its use of a remote shell (ssh by default), which offers an elegant way to cross these boundaries.
                     |                     |                     |
                     |      /home          |      group share    |     some lab location
                     |                     |                     |
some lab location <-------> sharptail <---------> Rstore <-----------> some other college
                     |                     |                     |
                     |      /archives      |      lab share      |     some other college
                     |                     |                     |
Some feature examples
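rsync -vac --dry-run   (-v verbose, -a archive mode, -c compare files by checksum instead of size and modification time; --dry-run shows what would be transferred without changing anything)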
rsync -vac /home/my/stuff/ firstname.lastname@example.org:/home/my/stuff/
Note the use of trailing slashes: they mean update everything inside source stuff/ within target stuff/. If you left the first trailing slash off the above command, it would mean put the source directory stuff inside the target directory stuff/, so you would end up with target /home/my/stuff/stuff. You've been warned. Use the --dry-run option if unsure what will happen.
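You can see the trailing-slash rule and --dry-run in action without risking real data. The sketch below works on throwaway directories created with mktemp, so nothing on /home is touched. All the directory names in it are made up.

```shell
#!/usr/bin/env bash
# Demonstrate the trailing-slash rule on throwaway directories.
base=$(mktemp -d)
mkdir -p "$base/src/stuff" "$base/with_slash/stuff" "$base/without_slash/stuff"
echo data > "$base/src/stuff/results.csv"

# Preview first: --dry-run lists what would be transferred but changes nothing.
# The listed names begin with stuff/, hinting that they would land in stuff/stuff.
rsync -vac --dry-run "$base/src/stuff" "$base/without_slash/stuff/"

# Trailing slash on the source: the contents of stuff/ go into the target stuff/.
rsync -ac "$base/src/stuff/" "$base/with_slash/stuff/"

# No trailing slash on the source: the directory stuff itself goes into the target stuff/.
rsync -ac "$base/src/stuff" "$base/without_slash/stuff/"

find "$base/with_slash" "$base/without_slash" -type f
# .../with_slash/stuff/results.csv
# .../without_slash/stuff/stuff/results.csv   (the nested stuff/stuff case)
```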
Putting it all together
# --bwlimit=2500 caps the transfer rate at about 2500 KiB/s
# --whole-file sends whole files instead of using rsync's delta-transfer algorithm

# copy the dir stuff from a lab or remote college to my home on HPCC, in a tmp area
# (first log in to the remote location)
rsync -vac --bwlimit=2500 --whole-file /home/user/stuff email@example.com:/home/user/tmp/

# sync my HPCC dir stuff into /archives locally on sharptail, then clean up
# (first log in to sharptail; make sure the copy is complete before deleting)
rsync -vac --bwlimit=2500 /home/user/stuff/ /archives/user/stuff/
rm -rf /home/user/stuff/*

# generate a copy of content on the Rstore disk array, outside of HPCC but within wesleyan.edu
# (get paths and share names from your faculty member, then run this on sharptail)
rsync -vac --bwlimit=2500 /home/user/stuff firstname.lastname@example.org:/data/2/labcontent/projects/

# you can also do this in reverse (log in to sharptail first)
rsync -vac --bwlimit=2500 email@example.com:/data/2/labcontent/projects/stuff /home/user/
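The /archives step deletes the source unconditionally. A more defensive approach copies first, verifies with a checksum dry run, and only then empties the source directory. The sketch below does this with archive_and_clean, a hypothetical helper meant to be run on sharptail.

```shell
#!/usr/bin/env bash
# archive_and_clean SRC_DIR DEST_DIR (hypothetical helper)
archive_and_clean() {
  local src=${1:?usage: archive_and_clean SRC_DIR DEST_DIR}
  local dest=${2:?usage: archive_and_clean SRC_DIR DEST_DIR}
  mkdir -p "$dest" || return 1
  # Trailing slashes: sync the contents of src into dest (see the note on slashes).
  rsync -vac --bwlimit=2500 "$src/" "$dest/" || return 1
  # Verify: a checksum dry run with --itemize-changes prints one line per item
  # that still differs, so empty output means the copy matches the source.
  local pending
  pending=$(rsync -ac --dry-run --itemize-changes "$src/" "$dest/") || return 1
  if [ -n "$pending" ]; then
    echo "copy incomplete, not deleting:" >&2
    echo "$pending" >&2
    return 1
  fi
  # Empty src but keep the directory itself; unlike the glob in rm -rf src/*,
  # this also removes dotfiles.
  find "$src" -mindepth 1 -delete
}
```

Usage: archive_and_clean /home/user/stuff /archives/user/stuff. If the verification finds differences, the function leaves /home untouched and returns 1.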