\\
**[[cluster:0|Back]]**

''/home'' is defunct but remains in place for compatibility. It has been moved from sharptail to whitetail; new home directories are at ''/zfshomes''. Although quotas are in place (starting at 1 TB for new accounts), users typically get what they need. Static content should eventually be migrated to our Rstore platform.

 --- //[[hmeij@wesleyan.edu|Henk]] 2020/07/28 13:18//
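
A quick way to confirm where your home directory now lives (a minimal sketch; it assumes you are logged in to a cluster login node with standard Linux tools):

<code>
# print the path of your home directory; new accounts should see it under /zfshomes
echo $HOME

# show the file system backing /zfshomes and how full it is
df -h /zfshomes
</code>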
  
==== HomeDir & Storage Options ====
The HPCC cluster's file server ''sharptail.wesleyan.edu'' serves out all home directories to all nodes at /home. It is 10 TB in size, and the nightly backup currently takes 1-2 hours to churn through it. Making it larger would generate more traffic on /home, so for now, while it works for us, we have settled on this policy:
  
  * All users are under quota, which is automatically increased in 100 GB increments.
  * When a user consumes 1024 GB (1 TB), the automatic increases stop.
    * this home file system is backed up twice a month from sharptail's disk array to greentail's
    * nightly snapshots (point-in-time backups) are made on sharptail's disk array and stored there too
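
To see where you stand relative to these limits, something like the following helps (a sketch; it assumes the standard Linux ''quota'' and ''du'' utilities are available on the node where /home is mounted):

<code>
# report quota limits and current usage in human-readable units
quota -s

# total size of your home directory (can take a while on a large tree)
du -sh /home/$USER
</code>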
  
At this point users need to offload static content to other locations: content like old analyses, results of published papers, and so on. Users typically have one local option available:
  
  * Keep content out of /home and migrate it to /archives (7 TB, accessible on all "tail" nodes)
    * request a directory for yourself in this file system and move content to it
    * this archive file system is backed up twice a month from sharptail's disk array to greentail's
  * Users with home directories of 500 GB or more should start considering moving data to /archives
  
Users who are considered inactive have their home directories relocated to /archives/inactive
  * these accounts are kept around until we do an account edit and purge (has never happened so far)
  
The remote storage option, if your storage needs cannot be supported by /archives, is off-cluster storage. Rstore is our latest storage solution for groups and labs with such needs.
  
  * ask your lead faculty member if your lab/group has such an area, or request one
  * then move your static content permanently off the HPCC cluster environment
  * details can be found at [[cluster:135|RSTORE FAQ]]

==== Moving Content ====

Our file server is named ''sharptail.wesleyan.edu'' (''sharptail'' when on the cluster). It is a 4U integrated storage and server module with a 48 TB disk array. Moving content can severely cripple this server: **/home** is served out by it to all nodes, and if the server cannot handle all the read/write requests everything comes to a halt. So when moving content please monitor the server, and check whether others are already doing something along these lines. Here are some tips.

Do not use any copy tool with a GUI, nor ''cp''/''scp'' or s/ftp. GUI (drag & drop) tools especially are verboten! These tools are not smart enough and frequently generate blocked processes that halt everything. Use ''rsync'' in a linux/unix environment.

**Check it out:**

  * ''ssh sharptail.wesleyan.edu''
  * is the server busy? (''uptime''; loads < 8 are ok)
  * is there memory available? (''free -m''; look at the free values)
  * is anybody else using rsync? (''ps -efl | grep rsync'')
  * is the server busy writing? (''iotop''; look at the M/s disk writes, q to quit; values > 100-200 M/s == busy!)
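
For example, a quick pre-flight check after logging in might look like this (a sketch that simply runs the commands listed above in one pass):

<code>
ssh sharptail.wesleyan.edu

uptime                  # load averages; < 8 is ok
free -m                 # memory in MB; look at the free values
ps -efl | grep rsync    # any other rsync jobs running?
iotop                   # disk writes in M/s; > 100-200 M/s means busy (q to quit)
</code>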

Three scenarios are depicted below. When crossing the vertical boundaries you are no longer dealing with local content, so the content needs to flow over the network. ''rsync'' has many features; one of the important ones is its use of a remote shell, which provides an elegant way to cross these boundaries.

<code>

                        |      /home              group share     |
some lab location       |                                         |   some lab location
                  <-----------> sharptail <-----------> Rstore <----------->
some other college      |                                         |   some other college
                        |      /archives           lab share      |

</code>

**Some feature examples**

  * preserve permissions, checksum source/target files, and preview what will happen
      * ''rsync -vac --dry-run''
  * delete files on the destination that are not present on the source (careful!)
      * ''rsync --delete''
  * throttle the rate of traffic generated, make your sysadmin happy, use < 5000
      * ''rsync --bwlimit=2500''
  * copy whole files instead of deltas (do not look inside files)
      * ''rsync --whole-file''
  * use a remote shell from host to host (crossing those vertical boundaries above)
      * ''rsync -vac /home/my/stuff/ user@somehost.wesleyan.edu:/home/my/stuff/''

Note the use of trailing slashes: it means "update everything inside the source ''stuff/'' within the target ''stuff/''". If you leave the first trailing slash off the command above, it instead means "put the source directory ''stuff'' inside the target directory ''stuff/''", so you would end up with ''/home/my/stuff/stuff'' on the target. You've been warned. Use the dry-run option if you are unsure what will happen.
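
A safe way to see the difference is to compare the two forms with ''--dry-run'' before committing to either (a sketch using the same hypothetical ''stuff'' directory as above):

<code>
# trailing slash on the source: copies the CONTENTS of stuff/ into the target stuff/
rsync -vac --dry-run /home/my/stuff/  user@somehost.wesleyan.edu:/home/my/stuff/

# no trailing slash on the source: puts the directory stuff itself inside the target,
# ending up as /home/my/stuff/stuff on the far side
rsync -vac --dry-run /home/my/stuff   user@somehost.wesleyan.edu:/home/my/stuff/
</code>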

**Putting it all together**

<code>

# copy the directory stuff from a lab or remote college to my home on the HPCC, in a tmp area
# (first log in to the remote location)

rsync -vac --bwlimit=2500 --whole-file /home/user/stuff user@sharptail.wesleyan.edu:/home/user/tmp/

# sync my HPCC stuff folder into /archives locally on sharptail, then clean up
# (first log in to sharptail)

rsync -vac --bwlimit=2500 /home/user/stuff/  /archives/user/stuff/
rm -rf /home/user/stuff/*

# generate a copy of content on the Rstore disk array, outside of the HPCC but within wesleyan.edu
# (get paths and share names from your faculty member; on sharptail do)

rsync -vac --bwlimit=2500 /home/user/stuff  user@rstoresrv0.wesleyan.edu:/data/2/labcontent/projects/

# you can also do this in reverse; log in to sharptail first

rsync -vac --bwlimit=2500 user@rstoresrv0.wesleyan.edu:/data/2/labcontent/projects/stuff  /home/user/

</code>
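
Before running the ''rm -rf'' cleanup step above, it is worth verifying that the copy is complete; a repeat run with ''--dry-run'' that reports nothing left to transfer is a good sign (a sketch, reusing the /archives example above):

<code>
# re-run the same sync as a dry run; an (almost) empty file list means source and target match
rsync -vac --dry-run /home/user/stuff/  /archives/user/stuff/
</code>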
  
\\