\\
**[[cluster:0|Back]]**

''/home'' is defunct but remains in place for compatibility. It has been moved from sharptail to whitetail. New home directories are at ''/zfshomes''. Although quotas are in place (starting at 1 TB for new accounts), users typically get what they need. Static content should eventually be migrated to our Rstore platform.

 --- //[[hmeij@wesleyan.edu|Henk]] 2020/07/28 13:18//
  
==== HomeDir & Storage Options ====
The HPCC cluster's file server ''sharptail.wesleyan.edu'' serves out all home directories to all nodes at location /home. It is 10 TB in size and the nightly backup process currently takes 1-2 hours to churn through it. Making it larger would thus generate more traffic on /home. So, for now while it works for us, we have come up with this policy:
  
  * All users are under a quota which is automatically increased in 100 GB increments (see the usage check sketched after this list).
  * When a user consumes 1024 GB (1 TB) the automatic increases stop.
    * this home file system is backed up twice a month from sharptail's disk array to greentail's disk array
    * nightly snapshots (point in time backups) are done on sharptail's disk array and stored there too
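
A quick way to check how much of your quota you are consuming, sketched below. The ''du'' line works anywhere; whether ''quota'' reports limits depends on how quotas are enforced on the file server, so treat that line as an assumption:

<code>
# total size of your home directory
du -sh $HOME

# ask the system for any enforced quota limits (may not be configured to report)
quota -s
</code>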
  
At this point users need to offload static content to other locations. Content such as old analyses, results of published papers, etc. Users typically have one local option available:
  
  * Keep contents out of /home and migrate them to /archives (7 TB, accessible on all "tail" nodes)
    * request a directory for yourself in this file system and move contents to it
    * this archive file system is backed up twice a month from sharptail's disk array to greentail's disk array
  * Users with home directories of 500 GB in size should start considering moving data to /archives
  
Users who are considered inactive have their home directories relocated to /archives/inactive
  * these accounts are kept around until we do an account edit and purge (has never happened so far)
  
The remote storage option, if your storage needs cannot be supported by /archives, is off-cluster storage. Rstore is our latest storage solution for groups and labs with such needs.
  
  * ask your lead faculty member if your lab/group has such an area or request one
  * then move your static content permanently off the HPCC cluster environment
  * details can be found at [[cluster:135|RSTORE FAQ]]

==== Moving Content ====

Our file server is named ''sharptail.wesleyan.edu'' (or ''sharptail'' when on the cluster) and it is a 4U integrated storage and server module with a 48 TB disk array. Moving content can severely cripple this server. **/home** is served out by this server to all nodes, and if the server cannot handle all read/write requests everything comes to a halt. So when moving content please monitor the server and also observe whether others are currently doing something along these lines. Here are some tips.

Do not use any type of copy tool with a GUI, nor cp/scp or s/ftp. The GUI (drag & drop) tools especially are Verboten! These tools are not smart enough and frequently generate blocked processes that halt everything. Use ''rsync'' in a linux/unix environment.
**Check it out** (see the pre-flight sketch after this list):

  * ''ssh sharptail.wesleyan.edu''
  * is the server busy? (''uptime'', loads < 8 are ok)
  * is there memory available? (''free -m'', look at the free values)
  * is anybody else using rsync? (''ps -efl | grep rsync'')
  * is the server busy writing? (''iotop'', look at the M/s disk writes (q to quit); values > 100-200 M/s == busy!)
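
A minimal pre-flight sketch bundling the checks above, to be run after logging in to sharptail. The load, memory, and write-rate thresholds are the rough guidelines from this list, not hard limits:

<code>
# load averages; stay below roughly 8
uptime

# free memory in MB; look at the "free" column
free -m

# any other rsync transfers already running? (second grep filters out the grep itself)
ps -efl | grep rsync | grep -v grep

# current disk writes; sustained rates above ~100-200 M/s mean the server is busy
# (iotop usually needs root; if unavailable, rely on uptime and ps instead)
iotop -o
</code>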

Three scenarios are depicted below. When crossing the vertical boundaries you are no longer dealing with local content, so the content needs to flow over the network. ''rsync'' has many features; one of the important ones is the use of a remote shell, allowing an elegant way to cross these boundaries.

<code>

                        |         /home         |    group share     |    some lab location
some lab location       |                       |                    |
                  <-----------> sharptail <-----------> Rstore <----------->
some other college      |                       |                    |
                        |         /archives     |     lab share      |    some other college

</code>

**Some feature examples**

  * preserve permissions, do a checksum between source/target files, observe what will happen
      * ''rsync -vac --dry-run''
  * delete files on the destination not present on the source (careful!)
      * ''rsync --delete''
  * throttle the rate of traffic generated, make your sysadmin happy, use < 5000
      * ''rsync --bwlimit=2500''
  * do not look inside files (skip the delta-transfer algorithm)
      * ''rsync --whole-file''
  * use a remote shell from host to host (crossing those vertical boundaries above)
      * ''rsync -vac /home/my/stuff/ user@somehost.wesleyan.edu:/home/my/stuff/''

Note the use of trailing slashes: it means update everything inside source ''stuff/'' within target ''stuff/''. If you leave the first trailing slash off the above command it means put the source directory ''stuff'' inside the target directory ''stuff/'', meaning you'll end up with target ''/home/my/stuff/stuff''. You've been warned. Use the dry run option if unsure what will happen.
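
A quick way to see the difference for yourself, sketched with ''--dry-run'' so nothing is actually copied (paths taken from the example above):

<code>
# WITH the trailing slash on the source: the contents of stuff/ land inside the target stuff/
rsync -vac --dry-run /home/my/stuff/ user@somehost.wesleyan.edu:/home/my/stuff/

# WITHOUT it: the directory stuff itself is placed inside the target,
# ending up as /home/my/stuff/stuff on the remote side
rsync -vac --dry-run /home/my/stuff user@somehost.wesleyan.edu:/home/my/stuff/
</code>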

**Putting it all together**

<code>

# copy the dir stuff from a lab or remote college to my home on the HPCC in the tmp area
# (first log in to the remote location)

rsync -vac --bwlimit=2500 --whole-file /home/user/stuff user@sharptail.wesleyan.edu:/home/user/tmp/

# sync my HPCC dir stuff folder into /archives locally on sharptail, then clean up
# (first log in to sharptail)

rsync -vac --bwlimit=2500 /home/user/stuff/ /archives/user/stuff/
rm -rf /home/user/stuff/*

# generate a copy of content on the Rstore disk array outside of the HPCC but within wesleyan.edu
# (get paths and share names from your faculty member, on sharptail do)

rsync -vac --bwlimit=2500 /home/user/stuff user@rstoresrv0.wesleyan.edu:/data/2/labcontent/projects/

# you can also do this in reverse, log in to sharptail first

rsync -vac --bwlimit=2500 user@rstoresrv0.wesleyan.edu:/data/2/labcontent/projects/stuff /home/user/

</code>
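
Before cleaning up with ''rm -rf'', it is worth verifying that the copy is complete. A simple sketch: re-run the same rsync with ''--dry-run'' and the checksum option; if nothing (or only directory entries) is listed for transfer, source and target match.

<code>
# lists any files that still differ; an empty file list means the copy is complete
rsync -vac --dry-run /home/user/stuff/ /archives/user/stuff/
</code>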
 +
  
\\