\\
**[[cluster:0|Back]]**
SNAPSHOTS ARE NOT ENABLED AS OF 06/30/2008 \\
--- //[[hmeij@wesleyan.edu|Meij, Henk]] 2008/06/30 09:11//
===== Backup Policy =====
The backup policy of the cluster is described below. Two mechanisms are involved. [[cluster:34#netapp_fas_3050c|NetApp]] snapshots provide a convenient way to restore to a 'point-in-time'; they store changes at the block level. Tivoli incremental backups store files whose metadata has changed; they serve as the backup for file restorations and deleted files.
| /sanscratch, /localscratch: nothing is backed up. |
| /home: the NetApp filer takes two daily snapshots (8 AM and 8 PM) and one weekly snapshot on Sundays at midnight. Restores can therefore go back to the three most recent of these points in time. |
| /home: Tivoli performs nightly incremental backups of the home directories. One active and one inactive version of each file are maintained. One version is retained if the file is deleted. Read the section at the bottom of this page for detailed information. |
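As a rough illustration of the snapshot schedule above, the sketch below computes which three restore points would exist at a given moment. The function name ''latest_restore_points'' is made up for this page, and it assumes "Sundays at midnight" means 00:00 at the start of Sunday; the filer's actual scheduler may behave differently.

```python
from datetime import datetime, timedelta

def latest_restore_points(now):
    """Most recent 8 AM, 8 PM and weekly snapshot times at or before `now`.

    Assumption: the weekly snapshot fires at 00:00 on Sunday.
    """
    def most_recent_daily(hour):
        candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        if candidate > now:
            # Today's snapshot has not happened yet; use yesterday's.
            candidate -= timedelta(days=1)
        return candidate

    # weekday(): Monday=0 ... Sunday=6, so Sunday maps to 0 days back.
    days_since_sunday = (now.weekday() + 1) % 7
    weekly = (now - timedelta(days=days_since_sunday)).replace(
        hour=0, minute=0, second=0, microsecond=0)

    return {
        "8AM": most_recent_daily(8),
        "8PM": most_recent_daily(20),
        "weekly": weekly,
    }

if __name__ == "__main__":
    # Wednesday morning, 9:30 AM
    for name, when in latest_restore_points(datetime(2007, 7, 11, 9, 30)).items():
        print(name, when)
```

A file changed and then lost between two of these points cannot be recovered from a snapshot, which is where the nightly Tivoli backup comes in.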
===== NetApp =====
The disk space used for snapshot backups involves the space used on the LUNs, the volume occupied by the snapshots themselves, and the Fractional Reserve. The latter two reduce the available disk space of the volume that holds the LUNs. The breakdown follows. I'll update the stats periodically so we can measure growth.
==== NFS Volume ====
^Date^Filesystem^Size^Used^Avail^Used%^
|06/26/2008|total of the luns|4.0T|2.9T|1.1T|73%|
|09/26/2008|/vol/special_projects/cluster_luns|5.0T|3.8T|1.3T|76%|
==== LUNs (defunct) ====
We no longer have LUNs, only one large NFS volume, which we'll keep track of.
Space used on the LUNs (including filesystem overhead):
^Filesystem^Size^Used^Avail^%^Used^Avail^%^Used^Avail^%^Used^Avail^%^Used^Avail^%^
|users|1008G|104M|957G|1%|343M|957G|1%|4.2G|953G|1%|38G|920G|4%|299G|659G|32%|
|cusers|1008G|104M|957G|1%|104M|957G|1%|104M|957G|1%|104M|957G|1%|104M|957G|1%|
|rusers|1008G|402G|555G|43%|593G|365G|62%|529G|429G|56%|668G|289G|70%|663G|295G|70%|
|rusers2|1008G|2G|956G|1%|2.6G|955G|1%|270G|688G|29%|410G|548G|43%|771G|187G|81%|
|rusers3|1008G|106M|957G|1%|106M|957G|1%|115M|957G|1%|666M|957G|1%|217G|714G|23%|
|rusers4|1008G|104M|957G|1%|104M|957G|1%|104M|957G|1%|104M|957G|1%|104M|957G|1%|
|rusers5|1008G|404G|554G|43%|134G|824G|14%|408G|550G|43%|339G|618G|36%|324G|634G|34%|
|rusers6|1008G|104M|957G|1%|104M|957G|1%|104M|957G|1%|104M|957G|1%|129M|957G|1%|
|rusers7|1008G|3G|955G|1%|6.9G|950G|1%|7.5G|950G|1%|8G|949G|1%|464G|494G|49%|
|rusers8|1008G|10G|948G|1%|22G|935G|3%|96G|862G|10%|206G|751G|22%|77G|880G|8%|
|sanscratch|1008G|104M|957G|1%|156M|957G|1%|54G|903G|6%|74G|884G|8%|214M|957G|1%|
^ ^ ^07/09/2007: 985G^^^10/23/2007: 760G^^^12/18/2007: 1,370G^^^03/12/2008: 1,744G^^^06/25/2008: 2,944G^^^
==== Snapshots (defunct) ====
At this point in time, the NetApp snapshots chew up:
^Date^Volume^Name^Used^Total^Status^
|07/09/2007|cluster_home|weekly.0|3.811 GB|6.17 GB|normal|
|07/09/2007|cluster_home|hourly.0|1.469 GB|1.469 GB|normal|
|07/09/2007|cluster_home|nightly.0|911.3 MB|2.359 GB|normal|
^07/09/2007^^^^ 6.192 GB ^ 9.998 GB ^^
|10/28/2007|cluster_home|weekly.0|1.817 GB|3.565 GB|normal|
|10/28/2007|cluster_home|hourly.0|1.554 GB|1.748 GB|normal|
|10/28/2007|cluster_home|nightly.0|198.3 MB |198.3 MB|normal|
^10/28/2007^^^^ 3.569 GB ^ 5.511 GB ^^
|12/18/2007|cluster_home|weekly.0|19.05 GB|35.6 GB|normal|
|12/18/2007|cluster_home|hourly.0|15.6 GB|16.55 GB|normal|
|12/18/2007|cluster_home|nightly.0|965.9 MB |965.9 MB|normal|
^12/18/2007^^^^ 35.62 GB ^ 53.11 GB ^^
|03/12/2008|cluster_home|weekly.0|6.15 GB|7.08 GB|normal|
|03/12/2008|cluster_home|hourly.0|247 MB|948.4 MB|normal|
|03/12/2008|cluster_home|nightly.0|701.4 MB|701.4 MB|normal|
^03/12/2008^^^^ 7.1 GB ^ 8.8 GB ^^
Note: This has completely changed. Snapshots have been deleted because we are running out of space, so we only have TSM file-level backups. Since snapshotting is turned off, the fractional reserve is not in use.
==== Filer3 (defunct) ====
Putting the LUN and snapshot usage together, we get:
^Date^Name^Status^Root^Aggregate^FlexClone^Avail^Used^Total^Files^Max Files^
|07/09/2007|cluster_home|online,raid_dp| |aggr0|-|2.03 TB|49%|4 TB|177|31.9 m|
|10/23/2007|cluster_home|online,raid_dp| |aggr0|-|1.83 TB|54%|4 TB|177|31.9 m|
|12/18/2007|cluster_home|online,raid_dp| |aggr0|-|1.41 TB|65%|4 TB|177|31.9 m|
|02/26/2008|cluster_home|online,raid_dp| |aggr0|-|1.03 TB|73%|4 TB|178|31.9 m|
|03/12/2008|cluster_home|online,raid_dp| |aggr0|-|1.04 TB|74%|4 TB|178|31.9 m|
That may be surprising; the explanation is as follows. The "Fractional Reserve" is the space set aside for the first snapshot, which equals the size of all the LUNs. Think of this as the initial backup. You can then think of the "Snapshots" as the ''delta'', the incrementals. The reserve can be set to less than 100%, with some risk. For now, we'll leave it at this setting since we have no usage patterns yet.
^Date^What^Size of What^
|07/09/2007|LUNs| 985 GB|
|07/09/2007|Snapshots| <10 GB|
|07/09/2007|Fractional Reserve (100%)| 985 GB|
|07/09/2007^^ 1980 GB ^
|03/12/2008|LUNs| 1744 GB|
|03/12/2008|Snapshots| <10 GB|
|03/12/2008|Fractional Reserve (100%)| 1744 GB|
|03/12/2008^^ 3498 GB ^
Note: snapshots and fractional reserves are disabled.
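The arithmetic behind the table can be sketched as below. The function name and the ''reserve_fraction'' parameter are illustrative only; the snapshot figure of 10 GB is the "<10 GB" upper bound from the table, which is what the totals appear to use.

```python
def volume_space_needed(lun_gb, snapshot_gb, reserve_fraction=1.0):
    """Space consumed on the volume: LUNs + snapshots + fractional reserve.

    reserve_fraction=1.0 corresponds to the 100% reserve in the table.
    """
    reserve_gb = lun_gb * reserve_fraction
    return lun_gb + snapshot_gb + reserve_gb

print(volume_space_needed(985, 10))    # 07/09/2007 -> 1980.0
print(volume_space_needed(1744, 10))   # 03/12/2008 -> 3498.0
```

With the reserve at 100%, the volume needs roughly twice the LUN size, which is why a 4 TB volume filled up with well under 2 TB of data on the LUNs.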
===== Tivoli =====
The backup process copies data from client workstations to server storage to guard against loss of data that changes regularly. The server retains versions of a file according to policy, and replaces older versions of the file with newer versions. Policy includes the number of versions and the retention time for versions. A client node can restore the most recent version of a file, or can restore earlier versions.
  - If a backup is an incremental type (always):
    - Back up the file only if the file or its attributes changed since the previous backup (date, size, ownership, permissions).
  - If a file is being modified during backup, the program should do the following:
    - Retry, and back up the file only if the file is no longer being modified.
  - How many different versions of the file should be kept? 1.
  - Number of days to keep inactive versions? 30.
  - For backup files that a user has deleted from the client node:
    - Number of file versions to keep? 1.
    - Number of days to keep the last file version? 30.
A version becomes inactive when the client node stores a more recent backup version. Policy determines how many inactive versions of files the server keeps, and for how long. When files exceed the criteria, the files expire.
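To make the expiration rule concrete, here is a simplified model of it. This is not Tivoli's actual expiration algorithm: the class ''BackupVersion'', the function ''surviving_versions'', and the choice to expire a version once it has been inactive for more than 30 days (rather than on day 30 exactly) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class BackupVersion:
    backed_up: date
    inactive_since: Optional[date] = None  # None means this is the active version

def surviving_versions(versions, today, keep_inactive=1, retain_days=30):
    """Versions still held under a simplified 'N inactive copies, D days' policy."""
    kept = []
    inactive_kept = 0
    # Walk from the newest backup to the oldest.
    for v in sorted(versions, key=lambda v: v.backed_up, reverse=True):
        if v.inactive_since is None:
            kept.append(v)  # the active version is always kept
            continue
        age = today - v.inactive_since
        if inactive_kept < keep_inactive and age <= timedelta(days=retain_days):
            kept.append(v)
            inactive_kept += 1
        # otherwise the version has expired (too many copies, or too old)
    return kept

if __name__ == "__main__":
    history = [
        BackupVersion(date(2008, 6, 1)),                     # active
        BackupVersion(date(2008, 5, 20), date(2008, 6, 1)),  # newest inactive
        BackupVersion(date(2008, 5, 1), date(2008, 5, 20)),  # older inactive
    ]
    for v in surviving_versions(history, today=date(2008, 6, 10)):
        print(v)
```

With the settings listed above (one inactive copy, 30 days), the practical consequence is that an overwritten file can be rolled back one version, and only for about a month.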
Swallowtail Usage. I'll update the stats periodically so we can measure growth.
^ ^ --Backup Policy Settings -- ^^^^ ^ ^ ^
^Date^Inactive^Deleted^Active^Inactive^ Physical (MB) ^ Number of Files ^ Note ^
| 07/09/2007 | 30 days | 30 days | 1 copy | 1 copy | 251,383 | 2,825,609 | first incremental backup |
| 10/28/2007 | 30 days | 30 days | 1 copy | 1 copy | 551,267 | 3,208,334 | |
| 12/18/2007 | 30 days | 30 days | 1 copy | 1 copy | 1,138,891 | 4,255,669 | |
| 02/26/2008 | 30 days | 30 days | 1 copy | 1 copy | 1,571,994 | 4,730,811 | |
| 03/12/2008 | 30 days | 30 days | 1 copy | 1 copy | 1,621,341 | 4,724,563 | |
| 06/26/2008 | 30 days | 30 days | 1 copy | 1 copy | 2,466,255 | 18,020,855 | |
| 09/26/2008 | 30 days | 30 days | 1 copy | 1 copy | 2,770,197 | 23,279,794 | |
^Date^^^^^ Total Transferred (last week) ^^ Note ^
| 10/28/2007 | 30 days | 30 days | 1 copy | 1 copy | 17 - 27 GB || daily range |
| 12/18/2007 | 30 days | 30 days | 1 copy | 1 copy | 50 - 152 GB || daily range |
| 02/26/2008 | 30 days | 30 days | 1 copy | 1 copy | 70 - 140 GB || daily range |
| 03/12/2008 | 30 days | 30 days | 1 copy | 1 copy | 16 - 31 GB || daily range |
| 06/25/2008 | 30 days | 30 days | 1 copy | 1 copy | 31 - 59 GB || daily range |
| 09/25/2008 | 30 days | 30 days | 1 copy | 1 copy | 7 - 19 GB || daily range |
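Since the point of these tables is to measure growth, a small script along these lines could turn the "Physical (MB)" column into growth rates. The figures are copied from the usage table above; the function name ''growth_per_day'' is made up for this sketch.

```python
from datetime import date

# (date, physical MB) pairs copied from the Swallowtail usage table
usage = [
    (date(2007, 7, 9), 251_383),
    (date(2007, 10, 28), 551_267),
    (date(2007, 12, 18), 1_138_891),
    (date(2008, 2, 26), 1_571_994),
    (date(2008, 3, 12), 1_621_341),
    (date(2008, 6, 26), 2_466_255),
    (date(2008, 9, 26), 2_770_197),
]

def growth_per_day(samples):
    """Average growth in MB/day between each pair of consecutive samples."""
    rates = []
    for (d0, mb0), (d1, mb1) in zip(samples, samples[1:]):
        rates.append((d1, (mb1 - mb0) / (d1 - d0).days))
    return rates

if __name__ == "__main__":
    for end_date, rate in growth_per_day(usage):
        print(f"up to {end_date}: {rate:,.0f} MB/day")

    # Overall average across the whole period
    (first_day, first_mb), (last_day, last_mb) = usage[0], usage[-1]
    overall = (last_mb - first_mb) / (last_day - first_day).days
    print(f"overall: {overall:,.0f} MB/day")
```

The per-interval rates are averages between measurement dates, so they smooth over the day-to-day swings shown in the "Total Transferred" rows.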
\\
**[[cluster:0|Back]]**