\\
**[[cluster:0|Back]]**

''/home'' is defunct but remains for compatibility; it has been moved from sharptail to whitetail. New home directories are at ''/zfshomes''. Although quotas are in place (starting at 1 TB for new accounts), users typically get what they need. Static content should eventually be migrated to our Rstore platform.

 --- //[[hmeij@wesleyan.edu|Henk]] 2020/07/28 13:18//
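
To see how much of your quota you are using, a minimal sketch (run from a login node; this assumes your home directory sits under ''/zfshomes'' as noted above):

<code>

# total size of your home directory
du -sh /zfshomes/$USER

# size, used and available space of the filesystem holding it
df -h /zfshomes/$USER

</code>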
  
==== HomeDir & Storage Options ====
  * nightly snapshots (point in time backups) are done on sharptail's disk array and stored there too
  
At this point users need to offload static content to other locations: content like old analyses, results of published papers, etc. Users typically have one local option available:
  
  * Keep contents out of /home and migrate it to /archives (7 TB, accessible on all "tail" nodes)
  * these accounts are kept around until we do an account edit and purge (has never happened so far)
  
If your storage needs cannot be supported by /archives, the remote option is off-cluster storage. Rstore is our latest storage solution for groups and labs with such needs.
  
  * ask your lead faculty member if your lab/group has such an area or request one
  * details can be found at [[cluster:135|RSTORE FAQ]]
  
==== Moving Content ====
  
Our file server is named ''sharptail.wesleyan.edu'' (or ''sharptail'' when on the cluster); it is a 4U integrated storage and server module with a 48 TB disk array. Moving content can severely cripple this server: **/home** is served out by this server to all nodes, and if the server cannot keep up with all the read/write requests everything comes to a halt. So when moving content, please monitor the server and check whether others are already running similar transfers. Here are some tips.
  
  
Do not use any type of copy tool with a GUI, nor cp/scp or s/ftp. GUI (drag & drop) tools especially are verboten! These tools are not smart enough and frequently generate blocked processes that halt everything. Use ''rsync'' in a Linux/Unix environment.

**Check it out** (the checks are combined into a sketch below):

  * ''ssh sharptail.wesleyan.edu''
  * is the server busy? (''uptime''; load averages below 8 are ok)
  * is there memory available? (''free -m''; look at the free values)
  * is anybody else using rsync? (''ps -efl | grep rsync'')
  * is the server busy writing? (''iotop''; look at the M/s disk writes, q to quit; values above 100-200 M/s == busy!)
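
The same checks as a quick copy-and-paste sketch (this assumes you are already logged in to sharptail; ''iotop'' may need elevated privileges on some systems):

<code>

# quick pre-flight check before starting a large transfer
uptime                  # load averages below ~8 are ok
free -m                 # look at the free/available memory values
ps -efl | grep rsync    # is anybody else already running rsync?
iotop -o                # only processes doing I/O; sustained writes >100-200 M/s == busy (q to quit)

</code>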

Three scenarios are depicted below. When crossing the vertical boundaries you are no longer dealing with local content, so the content needs to flow over the network. ''rsync'' has many features; one of the most important is its use of a remote shell, which provides an elegant way to cross these boundaries.

<code>

                        |         /home         |    group share     |    some lab location
some lab location       |                       |                    |
                  <-----------> sharptail <-----------> Rstore <----------->
some other college      |                       |                    |
                        |         /archives     |    lab share       |    some other college

</code>

**Some feature examples**

 +  ​* preserve permissions,​ do a checksum between source/target ​files, observe what will happen
       * ''​rsync -vac --dry-run''​       * ''​rsync -vac --dry-run''​
-  * delete files on destination not present on source+  * delete files on destination not present on source ​(careful!)
       * ''​rsync --delete''​       * ''​rsync --delete''​
-  * throttle the rate of trafic ​generated, make your sysadmins ​happy, use +  * throttle the rate of traffic ​generated, make your sysadmin ​happy, use <5000 
-      * ''​rsync ​''​--bwlimit=10000''​ +      * ''​rsync --bwlimit=2500''​ 
-  * and much more ...+  * do not look inside files 
 +      * ''​rsync --whole-files''​ 
 +  * use a remote shell from host to host (crossing those vertical boundaries above) 
 +      * ''​rsync ​ -vac /​home/​my/​stuff/ ​ user@somehost.wesleyan.edu:/​home/​my/​stuff/''​ 

Note the use of trailing slashes: it means update everything inside source ''stuff/'' within target ''stuff/''. If you leave the source trailing slash off the above command, it means put the source directory ''stuff/'' inside the target directory ''stuff/'', so you end up with ''/home/my/stuff/stuff'' on the target. You have been warned. Use the dry run option if unsure what will happen.

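A minimal sketch of the difference, using the dry run option so nothing is actually copied (host and paths are the hypothetical ones from the example above):

<code>

# with the source trailing slash: contents of stuff/ are synced into the target stuff/
rsync -vac --dry-run /home/my/stuff/  user@somehost.wesleyan.edu:/home/my/stuff/

# without it the directory itself is copied, ending up as /home/my/stuff/stuff on the target
rsync -vac --dry-run /home/my/stuff   user@somehost.wesleyan.edu:/home/my/stuff/

</code>
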
**Putting it all together**

<code>

# copy the dir stuff from a lab or remote college to my home on the HPCC in the tmp area
# (first log in to the remote location)

rsync -vac --bwlimit=2500 --whole-file /home/user/stuff user@sharptail.wesleyan.edu:/home/user/tmp/

# sync my HPCC dir stuff folder into /archives locally on sharptail, then clean up
# (first log in to sharptail)

rsync -vac --bwlimit=2500 /home/user/stuff/  /archives/user/stuff/
rm -rf /home/user/stuff/*

# generate a copy of content on the Rstore disk array, outside of the HPCC but within wesleyan.edu
# (get paths and share names from your faculty member; on sharptail do)

rsync -vac --bwlimit=2500 /home/user/stuff  user@rstoresrv0.wesleyan.edu:/data/2/labcontent/projects/

# you can also do this in reverse; log in to sharptail first

rsync -vac --bwlimit=2500 user@rstoresrv0.wesleyan.edu:/data/2/labcontent/projects/stuff  /home/user/

</code>
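
If a large transfer gets interrupted, rerunning the same command picks up where it left off, since ''rsync'' only sends what differs between source and target. Adding the standard ''--partial'' and ''--progress'' options keeps partially transferred files and shows progress; a sketch using the hypothetical paths above:

<code>

# keep partial files and show per-file progress, throttled as before
rsync -vac --partial --progress --bwlimit=2500 /home/user/stuff/  /archives/user/stuff/

</code>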
  
  