cluster:21 [2007/06/28 15:08] |
cluster:21 [2007/06/28 15:08] (current) |
+ | \\ | ||
+ | **[[cluster: | ||
+ | |||
+ | |||
+ | |||
+ | ===== SAN File Systems ===== | ||
+ | |||
+ | The idea of managing very large file systems has certain implications. | ||
+ | |||
+ | * a single point of failure (if part of the file system becomes corrupt, does the entire file system go offline?) | ||
+ | * //fsck// may take an excessive amount of time (one reference I found quoted 1 hour/TB for a clean file system; other references we've seen quote days for 1 TB in the case of a mail spool) | ||
+ | |||
+ | |||
+ | |||
+ | ^NOTE: once we receive our disks, before going online, we need to test this. Build this table out.^ | ||
+ | |||
+ | ^ size ^ files ^ time ^ | ||
+ | | 264 GB | 28,000 (50% 1 MB & 50% 100 MB) | 15 min | ||
+ | | 528 GB | 56,000 (50% 1 MB & 50% 100 MB) | 24 min | ||
+ | | 792 GB | 84,000 (50% 1 MB & 50% 100 MB) | 26 min | ||
+ | | 985 GB | 84,200 (50% 1 MB & 50% 100 MB) | 32 min | ||
+ | ^ size ^ files ^ time ^ | ||
+ | | 250 GB | 62,500,000 files in 6,250 dirs (all 4 KB files, 10,000 per dir) | 25 min | ||
+ | | 500 GB | 125,000,000 files in 12,500 dirs (all 4 KB files, 10,000 per dir) | 30 min | ||
+ | | '' | ||
+ | | 280 GB | 8,400,000 files in 41,138 dirs (copy of mail spool) | 2 hrs and 40 min | ||
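
To make the rows above easier to compare, the measurements can be turned into rates. The following is a minimal sketch, not part of the original test procedure: the numbers are copied from the tables, while the names ''MEASUREMENTS'' and ''fsck_rates'' are made up for illustration.

```python
# Measurements copied from the tables above:
# (size in GB, number of files, fsck time in minutes).
MEASUREMENTS = [
    (264, 28_000, 15),
    (528, 56_000, 24),
    (792, 84_000, 26),
    (985, 84_200, 32),
    (250, 62_500_000, 25),
    (500, 125_000_000, 30),
    (280, 8_400_000, 160),  # copy of mail spool: 2 hrs and 40 min
]


def fsck_rates(size_gb, files, minutes):
    """Return (GB checked per minute, files checked per minute)."""
    return size_gb / minutes, files / minutes


for size_gb, files, minutes in MEASUREMENTS:
    gb_per_min, files_per_min = fsck_rates(size_gb, files, minutes)
    print(f"{size_gb:>4} GB {files:>12,} files {minutes:>4} min "
          f"-> {gb_per_min:6.1f} GB/min, {files_per_min:12,.0f} files/min")
```

New rows can be appended to ''MEASUREMENTS'' as the table is built out with further tests.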
+ | A better idea might be to split up the file systems. For the purposes of our discussion, let's assume that a corrupt 1 TB file system would require 2-3 hours of //fsck// (file system checking and repair), and that we're willing to accept such downtime. We then create 2 volumes on our SAN. | ||
+ | |||
+ | |{{: | ||
+ | |||
+ | * [top] SAN scratch volume, / | ||
+ | * [bot] user home volume, / | ||
+ | |||
+ | LUNs in both volumes are created with setting 'no space reserved' | ||
+ | |||
+ | In the user volume, we can export from the io node, the LUN labeled / | ||
+ | |||
+ | Other users, for example fstarr and dbeveridge, could occupy (respectively) / | ||
+ | |||
+ | In this manner, the largest exposure to a corrupt file system is **by LUN**. | ||
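
The per-LUN exposure can be estimated with a rough calculation. This sketch takes the 2-3 hours per TB assumed earlier and scales it linearly with size, which is itself an assumption (the measurements above suggest file count matters as well). The LUN names and sizes are hypothetical placeholders, not the real layout.

```python
# Assumption taken from the discussion above: a corrupt 1 TB file system
# needs 2-3 hours of fsck. Linear scaling with size is a simplification.
HOURS_PER_TB = (2.0, 3.0)


def fsck_window_hours(size_gb, hours_per_tb=HOURS_PER_TB):
    """Return the (low, high) estimated fsck time in hours for size_gb."""
    low, high = hours_per_tb
    size_tb = size_gb / 1000
    return low * size_tb, high * size_tb


# Hypothetical LUN sizes in GB, for illustration only.
luns = {"lun_a": 1000, "lun_b": 500, "lun_c": 250}

for name, size_gb in luns.items():
    low, high = fsck_window_hours(size_gb)
    print(f"{name}: {size_gb} GB -> roughly {low:.1f} to {high:.1f} hours of fsck")
```

Under these assumptions only the users on the affected LUN wait out the check; the other LUNs stay online.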
+ | |||
+ | However, some monitoring program would have to be written. | ||
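
A minimal sketch of what such a monitoring program might look like, assuming it runs on the node that mounts the file systems and that a simple percentage threshold is enough. It uses ''shutil.disk_usage'' from the Python standard library; the function names and the 90% threshold are illustrative assumptions, and it only sees usage as reported by the mounted file system, not by the SAN itself.

```python
import shutil


def usage_percent(path):
    """Return the percentage of space used on the file system holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # pseudo file systems can report a total of zero
    return 100.0 * usage.used / usage.total


def check_mounts(mounts, threshold_percent=90.0):
    """Return (path, percent) for each mount at or above the threshold.

    A mount that cannot be read is reported as (path, None) so that a
    missing file system is flagged instead of silently skipped.
    """
    alerts = []
    for path in mounts:
        try:
            percent = usage_percent(path)
        except OSError:
            alerts.append((path, None))
            continue
        if percent >= threshold_percent:
            alerts.append((path, percent))
    return alerts


if __name__ == "__main__":
    # "/" is a placeholder; the real mount points would be listed here.
    for path, percent in check_mounts(["/"], threshold_percent=90.0):
        if percent is None:
            print(f"WARNING: {path} could not be read")
        else:
            print(f"WARNING: {path} is {percent:.1f}% full")
```

A script like this could be run periodically, for example from cron, with the output mailed to the administrators.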
+ | |||
+ | ^It should be noted that the available space in a LUN is a "high water mark" | ||
+ | ---- | ||
+ | |||
+ | ^Some Relevant Posts^ | ||
+ | |[[http:// | ||
+ | |[[http:// | ||
+ | |||
+ | \\ | ||
+ | **[[cluster: |