RSTORE FAQ

This is the template that the primary share owner will receive in order to get the process moving.

What is it?

Our current platform for providing disk storage capacity to users and groups is called Flexstorage. In that model, users and groups purchase storage as needed for backup, replication and/or snapshots. Our next platform is called Rstore (short for remote or research storage); in this model, disk space is allocated to users and groups upon request. Both platforms are built from large numbers of slow spinning disks and should not be relied on for performance.

How do I access it?

You will use your Wesleyan Active Directory (AD) credentials, that is, the username/password combination used to access all other Wesleyan services. There are two service addresses: rstore0.wesleyan.edu and rstore2.wesleyan.edu. Each address is backed by a pair of integrated storage and server modules (rstoresrv[0&1].wesleyan.edu for rstore0 and rstoresrv[2&3].wesleyan.edu for rstore2). Each service address points to the primary member of its pair. During a failover event the secondary becomes the primary and handles the traffic for that service address; the failed primary, after being repaired, resurfaces as the secondary.

There are several ways to access your contents, either by logging in or by mounting the shares. More information, with specifics for your share, is provided below.

How is it configured?

Each primary/secondary pair contains a /home directory that is kept in sync. This is solely for SSH access. User quotas there are very small (10 MB) and should only be used for small items such as scripts. Each member's disk array is carved up into four areas to hold shares (/data/1, /data/2, /data/3 and /data/4). Shares will be distributed across the four 26 TB data filesystems, allowing all shares room for growth.

The secondary member within each pair replicates any new data from the primary nightly, via a pulling action. Please note that if a file is missing on the primary, the file will be deleted on the secondary. The frequency of pulling is described as nightly, but depending on the volume of changes a pass can span more than 24 hours; the process is continuous. This happens for all four data filesystems.
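
A minimal sketch of what such a pull might look like when run on a secondary member (the actual replication command and its options are not shown in this document; the host and path below are for illustration only):

  # pull /data/1 from the primary; -a preserves attributes, --delete removes files no longer present on the primary
  rsync -a --delete rstoresrv0.wesleyan.edu:/data/1/ /data/1/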

Only shares located on the third data filesystem (/data/3) will, in addition to replication, receive snapshots. Snapshots are taken nightly, locally on both primary and secondary, for example from /data/3/share_name to /data/4/snapshots/share_name. Replication duplicates the data on the primary member so that the secondary member stays in sync: in the event of a failover the contents are available on short notice, but if a file is corrupt or missing it will be so on both members after the next replication event. Snapshots, by contrast, are point-in-time copies of the share contents; they are taken nightly and kept as 6 daily, 4 weekly and 2 monthly snapshots, so a deleted or corrupt file can be restored. This only happens for shares on /data/3, and only while we can sustain such disk usage.
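
If a file on a /data/3 share is lost, it can be copied back out of the snapshot area. A minimal sketch, assuming snapshots are kept under /data/4/snapshots/share_name in per-interval subdirectories (the daily.1 directory name and the file paths below are hypothetical; the actual layout is not specified in this document):

  # copy a deleted file back from a recent daily snapshot into the live share
  cp -a /data/4/snapshots/share_name/daily.1/projects/report.txt /data/3/share_name/projects/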

Note: Because of replication, a large reorganization of content areas (250+ GB or 10,000+ files) causes a lot of deletions first, then recopying. Please describe in detail what needs to be moved and we can perform those actions on both primary and secondary, avoiding this.

Note: For very large filesystems whose contents do not change, replication can take a long time and is typically unnecessary. Share owners can control what gets replicated by staging a file named rsync.incl or rsync.excl in the top-level share folder. These files contain one absolute path per line pointing to the folders to be included or skipped during replication, for example /data/1/share_name/projects/2005 (do not use unusual characters or spaces). An include file replicates only the listed folders; an exclude file replicates the whole share but skips the listed folders.
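
For example, a share owner who no longer wants two archived project folders replicated might stage an exclude file like this (the 2005 path appears above; the 2006 path is a hypothetical addition):

  # contents of /data/1/share_name/rsync.excl -- one absolute path per line, skipped during replication
  /data/1/share_name/projects/2005
  /data/1/share_name/projects/2006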

Is my content safe?

Your content is protected using these methods:

  • When the content was transferred from source to target, the -c option of rsync was used, which performs a full checksum comparison to verify that source and target files carry the same unique fingerprint.
  • Once copied to the disk arrays, the content is protected with redundant array of independent disks (RAID) technology, in our case RAID 60. RAID 60 stripes data across RAID 6 groups, each of which stores double parity so that the contents of failed disks can be rebuilt; each group can withstand two simultaneous disk failures.
  • The RAID cards managing the arrays continually scrub and test the disks for signs of probable near-future failure, thus reducing actual failures.
  • In the event of a catastrophic failure of the primary member of a pair, the content is available on the secondary member in the state it was in at the last replication event.
  • In the case of accidental deletion of content in shares on /data/3 (ONLY), snapshots (daily, weekly, monthly) are available for restoration.
  • Primary and secondary members reside in the same data center, but not in the same rack.
  • Upon deployment of your share, checksums will be calculated for each file on both primary and secondary members. These “hashdeep” signatures (file size, two different checksums (MD5 and SHA-256), and the absolute path to the file) will be stored in a text file. That text file can be used in an audit to find out whether anything about these files has changed, and a report can be provided; whether this gets automated will be assessed in the future. Sketches of these verification steps follow this list.
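
Minimal sketches of the two verification steps mentioned above, a checksum-verified rsync copy and a hashdeep signature generation plus later audit (the source path, output file name and share location are illustrative assumptions, not taken from this document):

  # checksum-verified copy from source to target: -c compares full checksums, -a preserves attributes
  rsync -ac /old_platform/share_name/ /data/1/share_name/

  # generate MD5 and SHA-256 signatures (size, checksums, absolute path) for every file in the share
  hashdeep -c md5,sha256 -r /data/1/share_name > share_name.hashdeep

  # later, audit the share against the stored signatures and report any changes
  hashdeep -c md5,sha256 -r -a -v -k share_name.hashdeep /data/1/share_name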

How do I log in or mount?

  • Your share's service address: rstore0.wesleyan.edu
  • Your share's location and name: /data/[1-4]/SHARE_NAME
  • Your share's default permissions are:
    • share owner (SHARE_OWNER) rwx, share group (SHARE_GROUP) rwx, others none
    • share members own the folders they create
    • all group members nevertheless have rwx access to those folders and files
  • SSH access
    • open any SSH client and connect to the service address; you end up in /home/username
    • then change directory to your share location (see the example below)
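
For example, a hypothetical user jdoe whose share lives on the first data filesystem would log in and navigate like this (the username and the /data/1 location are assumptions; substitute your own):

  # connect with your Wesleyan AD credentials; you land in /home/jdoe
  ssh jdoe@rstore0.wesleyan.edu

  # then change directory to your share location
  cd /data/1/SHARE_NAME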


