
<hi #ffff00> SNAPSHOTS ARE NOT ENABLED AS OF 06/30/2008</hi>
Meij, Henk 2008/06/30 09:11

Backup Policy

The backup policy of the cluster is described below. There are two different mechanisms. NetApp snapshots provide a convenient way to restore to a point in time; they store changes at the block level. Tivoli incremental backups store files when the metadata of those files has changed; they serve as the backup for restoring individual files and deleted files.

/sanscratch, /localscratch: nothing is backed up.
/home: the NetApp filer takes two daily snapshots (8 AM and 8 PM) and one weekly snapshot on Sundays at midnight, so restores can go back to these three most recent points in time.
/home: Tivoli performs nightly, incremental backups of the home directories. One active and one inactive version of each file are maintained. One deleted version is maintained if the file is deleted. Read the section at the bottom of this page for detailed information.

NetApp

The disk space used for snapshot backups comprises the space used on the LUNs, the space occupied by the snapshots themselves, and the Fractional Reserve. The latter two reduce the available disk space of the volume that holds the LUNs. The breakdown follows. I'll update the stats periodically so we can measure growth.

NFS Volume

^ Date ^ Filesystem ^ Size ^ Used ^ Avail ^ Used% ^
| 06/26/2008 | total of the luns | 4.0T | 2.9T | 1.1T | 73% |
| 09/26/2008 | /vol/special_projects/cluster_luns | 5.0T | 3.8T | 1.3T | 76% |

LUNs (defunct)

<hi #ffff00>We do not have LUNs anymore but one large NFS volume that we'll keep track of.</hi> The LUNs used the following space (figures include filesystem overhead):

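^ Filesystem ^ Size ^ Used ^ Avail ^ % ^ Used ^ Avail ^ % ^ Used ^ Avail ^ % ^ Used ^ Avail ^ % ^ Used ^ Avail ^ % ^
| users | 1008G | 104M | 957G | 1% | 343M | 957G | 1% | 4.2G | 953G | 1% | 38G | 920G | 4% | 299G | 659G | 32% |
| cusers | 1008G | 104M | 957G | 1% | 104M | 957G | 1% | 104M | 957G | 1% | 104M | 957G | 1% | 104M | 957G | 1% |
| rusers | 1008G | 402G | 555G | 43% | 593G | 365G | 62% | 529G | 429G | 56% | 668G | 289G | 70% | 663G | 295G | 70% |
| rusers2 | 1008G | 2G | 956G | 1% | 2.6G | 955G | 1% | 270G | 688G | 29% | 410G | 548G | 43% | 771G | 187G | 81% |
| rusers3 | 1008G | 106M | 957G | 1% | 106M | 957G | 1% | 115M | 957G | 1% | 666M | 957G | 1% | 217G | 714G | 23% |
| rusers4 | 1008G | 104M | 957G | 1% | 104M | 957G | 1% | 104M | 957G | 1% | 104M | 957G | 1% | 104M | 957G | 1% |
| rusers5 | 1008G | 404G | 554G | 43% | 134G | 824G | 14% | 408G | 550G | 43% | 339G | 618G | 36% | 324G | 634G | 34% |
| rusers6 | 1008G | 104M | 957G | 1% | 104M | 957G | 1% | 104M | 957G | 1% | 104M | 957G | 1% | 129M | 957G | 1% |
| rusers7 | 1008G | 3G | 955G | 1% | 6.9G | 950G | 1% | 7.5G | 950G | 1% | 8G | 949G | 1% | 464G | 494G | 49% |
| rusers8 | 1008G | 10G | 948G | 1% | 22G | 935G | 3% | 96G | 862G | 10% | 206G | 751G | 22% | 77G | 880G | 8% |
| sanscratch | 1008G | 104M | 957G | 1% | 156M | 957G | 1% | 54G | 903G | 6% | 74G | 884G | 8% | 214M | 957G | 1% |
|  |  | 07/09/2007: 985G ||| 10/23/2007: 760G ||| 12/18/2007: 1,370G ||| 03/12/2008: 1,744G ||| 06/25/2008: 2,944G |||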

Snapshots (defunct)

At this point in time, the NetApp snapshots chew up:

^ Date ^ Volume ^ Name ^ Used ^ Total ^ Status ^
| 07/09/2007 | cluster_home | weekly.0 | 3.811 GB | 6.17 GB | normal |
| 07/09/2007 | cluster_home | hourly.0 | 1.469 GB | 1.469 GB | normal |
| 07/09/2007 | cluster_home | nightly.0 | 911.3 MB | 2.359 GB | normal |
| 07/09/2007 |  |  | 6.192 GB | 9.998 GB |  |
| 10/28/2007 | cluster_home | weekly.0 | 1.817 GB | 3.565 GB | normal |
| 10/28/2007 | cluster_home | hourly.0 | 1.554 GB | 1.748 GB | normal |
| 10/28/2007 | cluster_home | nightly.0 | 198.3 MB | 198.3 MB | normal |
| 10/28/2007 |  |  | 3.569 GB | 5.511 GB |  |
| 12/18/2007 | cluster_home | weekly.0 | 19.05 GB | 35.6 GB | normal |
| 12/18/2007 | cluster_home | hourly.0 | 15.6 GB | 16.55 GB | normal |
| 12/18/2007 | cluster_home | nightly.0 | 965.9 MB | 965.9 MB | normal |
| 12/18/2007 |  |  | 35.62 GB | 53.11 GB |  |
| 03/12/2008 | cluster_home | weekly.0 | 6.15 GB | 7.08 GB | normal |
| 03/12/2008 | cluster_home | hourly.0 | 247 MB | 948.4 MB | normal |
| 03/12/2008 | cluster_home | nightly.0 | 701.4 MB | 701.4 MB | normal |
| 03/12/2008 |  |  | 7.1 GB | 8.8 GB |  |

Note: This has completely changed. Snapshots have been deleted because we are running out of space, so we only have TSM file-level backups. Since snapshotting is turned off, the fractional reserve is not implemented.

Filer3 (defunct)

So, putting LUN and snapshot usage together, we get:

^ Date ^ Name ^ Status ^ Root ^ Aggregate ^ FlexClone ^ Avail ^ Used ^ Total ^ Files ^ Max Files ^
| 07/09/2007 | cluster_home | online,raid_dp |  | aggr0 | - | 2.03 TB | 49% | 4 TB | 177 | 31.9 m |
| 10/23/2007 | cluster_home | online,raid_dp |  | aggr0 | - | 1.83 TB | 54% | 4 TB | 177 | 31.9 m |
| 12/18/2007 | cluster_home | online,raid_dp |  | aggr0 | - | 1.41 TB | 65% | 4 TB | 177 | 31.9 m |
| 02/26/2008 | cluster_home | online,raid_dp |  | aggr0 | - | 1.03 TB | 73% | 4 TB | 178 | 31.9 m |
| 03/12/2008 | cluster_home | online,raid_dp |  | aggr0 | - | 1.04 TB | 74% | 4 TB | 178 | 31.9 m |

That may surprise you. It is explained like this: the “Fractional Reserve” is the space set aside for the first snapshot, which equals the size of all the LUNs. Think of this as the initial backup. You can then think of the “Snapshots” as the delta, the incrementals. The reserve can be set to less than 100%, with some risk. For now, we'll leave it at this setting since we have no usage patterns yet.

^ Date ^ What ^ Size of What ^
| 07/09/2007 | LUNs | 985 GB |
| 07/09/2007 | Snapshots | <10 GB |
| 07/09/2007 | Fractional Reserve (100%) | 985 GB |
| 07/09/2007 |  | 1980 GB |
| 03/12/2008 | LUNs | 1744 GB |
| 03/12/2008 | Snapshots | <10 GB |
| 03/12/2008 | Fractional Reserve (100%) | 1744 GB |
| 03/12/2008 |  | 3498 GB |
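
The arithmetic behind these totals can be sketched in a few lines of Python. This is only an illustration: the function name `volume_space_needed` is made up for this page, and the snapshot figure is rounded up to 10 GB because the table only states "<10 GB".

```python
def volume_space_needed(lun_gb, snapshot_gb, reserve_fraction=1.0):
    """Return (reserve_gb, total_gb) for a volume holding LUNs.

    reserve_fraction=1.0 models the 100% Fractional Reserve described
    above; a lower value models the riskier reduced setting.
    """
    reserve_gb = lun_gb * reserve_fraction
    total_gb = lun_gb + snapshot_gb + reserve_gb
    return reserve_gb, total_gb


# LUN usage from the table; snapshots assumed to be 10 GB (upper bound).
for day, lun_gb in [("07/09/2007", 985), ("03/12/2008", 1744)]:
    reserve_gb, total_gb = volume_space_needed(lun_gb, snapshot_gb=10)
    print(f"{day}: reserve {reserve_gb:.0f} GB, total {total_gb:.0f} GB")
```

With the reserve at 100%, the space consumed is roughly twice the LUN usage, which is why the volume fills up much faster than the LUN numbers alone suggest.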

Note: snapshots and fractional reserves are disabled.

Tivoli

The backup process copies data from client workstations to server storage to ensure against loss of data that is regularly changed. The server retains versions of a file according to policy, and replaces older versions of the file with newer versions. Policy includes the number of versions and the retention time for versions. A client node can restore the most recent version of a file, or can restore earlier versions.

  1. If a backup is an incremental type (always) :
    1. Back up the file only if the file or its attributes changed since the previous backup (date, size, ownership, permissions).
  2. If a file is being modified during backup, the program should do the following:
    1. Retry, and back up the file only if the file is no longer being modified.
  3. How many different versions of the file should be kept? 1.
  4. Number of days to keep inactive versions? 30.
  5. For backup files that a user has deleted from the client node:
    1. Number of file versions to keep? 1.
    2. Number of days to keep the last file version? 30.
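
The incremental rule in item 1 can be illustrated with a small Python sketch. It is not how the Tivoli client is implemented; it only mimics the stated rule by comparing the attributes named above (date, size, ownership, permissions). The names `FileAttrs`, `read_attrs` and `needs_backup` are hypothetical.

```python
import os
from typing import NamedTuple, Optional


class FileAttrs(NamedTuple):
    """The attributes the policy above says are compared."""
    mtime: float  # modification date
    size: int     # size in bytes
    uid: int      # owner
    gid: int      # group
    mode: int     # permissions (and file type bits)


def read_attrs(path: str) -> FileAttrs:
    """Collect the compared attributes for a file on disk."""
    st = os.stat(path)
    return FileAttrs(st.st_mtime, st.st_size, st.st_uid, st.st_gid, st.st_mode)


def needs_backup(current: FileAttrs, previous: Optional[FileAttrs]) -> bool:
    """Back up a file that was never seen, or whose attributes changed."""
    return previous is None or current != previous
```

A file whose permissions change from 0644 to 0600 would be backed up again under this rule even though its contents are identical.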

A version becomes inactive when the client node stores a more recent backup version. Policy determines how many inactive versions of files the server keeps, and for how long. When files exceed the criteria, the files expire.
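
To make the expiry rules concrete, here is a simplified Python model of the settings listed above (one active copy, one inactive copy kept 30 days, one copy of a deleted file kept 30 days). It reflects my reading of the policy, not the server's actual expiration processing, and the name `retained_versions` is made up.

```python
from datetime import date, timedelta

RETAIN_DAYS = 30  # days to keep inactive versions and deleted files


def retained_versions(backup_dates, today, deleted_on=None):
    """Return the backup dates still held on the server as of `today`.

    backup_dates: dates on which a version of the file was stored.
    deleted_on:   date the file was deleted on the client, if it was.
    """
    versions = sorted(backup_dates)
    if not versions:
        return []
    cutoff = today - timedelta(days=RETAIN_DAYS)
    newest = versions[-1]
    if deleted_on is not None:
        # Only the last version survives, for 30 days after deletion.
        return [newest] if deleted_on >= cutoff else []
    kept = [newest]  # the active version never expires
    # The previous version became inactive when `newest` was stored;
    # keep it only while that happened within the last 30 days.
    if len(versions) > 1 and newest >= cutoff:
        kept.insert(0, versions[-2])
    return kept


backups = [date(2008, 6, 1), date(2008, 6, 10), date(2008, 6, 20)]
print(retained_versions(backups, today=date(2008, 6, 25)))
print(retained_versions(backups, today=date(2008, 8, 1)))
```

In the first call the 06/10 and 06/20 versions are both available; in the second only the active 06/20 version remains, because the inactive one is older than 30 days.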

Swallowtail Usage. I'll update the stats periodically so we can measure growth.

^ Date ^ Policy: Inactive ^ Policy: Deleted ^ Policy: Active ^ Policy: Inactive ^ Physical (MB) ^ Number of Files ^ Note ^
| 07/09/2007 | 30 days | 30 days | 1 copy | 1 copy | 251,383 | 2,825,609 | first incremental backup |
| 10/28/2007 | 30 days | 30 days | 1 copy | 1 copy | 551,267 | 3,208,334 |  |
| 12/18/2007 | 30 days | 30 days | 1 copy | 1 copy | 1,138,891 | 4,255,669 |  |
| 02/26/2008 | 30 days | 30 days | 1 copy | 1 copy | 1,571,994 | 4,730,811 |  |
| 03/12/2008 | 30 days | 30 days | 1 copy | 1 copy | 1,621,341 | 4,724,563 |  |
| 06/26/2008 | 30 days | 30 days | 1 copy | 1 copy | 2,466,255 | 18,020,855 |  |
| 09/26/2008 | 30 days | 30 days | 1 copy | 1 copy | 2,770,197 | 23,279,794 |  |

^ Date ^ Policy: Inactive ^ Policy: Deleted ^ Policy: Active ^ Policy: Inactive ^ Total Transferred (last week) ^ Note ^
| 10/28/2007 | 30 days | 30 days | 1 copy | 1 copy | 17 - 27 GB | daily range |
| 12/18/2007 | 30 days | 30 days | 1 copy | 1 copy | 50 - 152 GB | daily range |
| 02/26/2008 | 30 days | 30 days | 1 copy | 1 copy | 70 - 140 GB | daily range |
| 03/12/2008 | 30 days | 30 days | 1 copy | 1 copy | 16 - 31 GB | daily range |
| 06/25/2008 | 30 days | 30 days | 1 copy | 1 copy | 31 - 59 GB | daily range |
| 09/25/2008 | 30 days | 30 days | 1 copy | 1 copy | 7 - 19 GB | daily range |

