A document for me to recall and make notes of what I read in the manual pages and what needs testing.

Basically, during the Summer of 2016 I investigated whether the HPCC could afford enterprise-level storage. I wanted 99.999% uptime, snapshots, high availability and other goodies such as parallel NFS. Netapp came the closest but, eh, at $42K lots of other options show up. That story is detailed at [[cluster:...]].

This page is best read from the bottom up.

==== beeGFS cluster idea ====

  * Storage servers:
    * buy 2, each with 12x2TB slow disks, Raid 6, 20T usable (clustered, parallel file system)
    * create 2 6TB volumes on each, quota at 2TB via XFS (see the quota sketch after this list), 3 users/volume
    * only $HOME changes to ''/...''
    * create 2 buddymirrors; ...
    * on UPS
    * on Infiniband

  * Client servers:
    * all compute/... nodes

  * Meta servers:
    * cottontail2 (root meta, on Infiniband) plus n38-n45 nodes (on Infiniband)
    * all mirrored (total=9)
    * cottontail2 on UPS

  * Management and Monitor servers:
    * cottontail (on UPS, on Infiniband)

  * Backups (rsnapshot.org via rsync daemons [[cluster:...]]; see the backup sketch after this list)
    * sharptail:/...
    * serverA:/mnt/...
    * serverB:/...

  * Costs (includes 3 year NBD warranty)
    * Microway $12,500
    * CDW ...
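
Rough sketch of the XFS quota idea above (device, mount point and user are placeholders I made up):

<code>
# mount the storage volume with user quotas enabled (uquota in fstab)
mount -o uquota /dev/sdb1 /data/beegfs_storage1

# cap a user at a 2TB hard block limit
xfs_quota -x -c 'limit bhard=2t someuser' /data/beegfs_storage1

# check usage against the limits
xfs_quota -x -c 'report -h' /data/beegfs_storage1
</code>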
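
And a rough sketch of the backup idea, rsnapshot pulling via an rsync daemon (module name and paths made up):

<code>
# /etc/rsyncd.conf on a storage server (hypothetical module)
[home1]
    path = /mnt/beegfs/home1
    read only = yes

# /etc/rsnapshot.conf line on the backup host (fields must be tab separated)
backup	rsync://serverA/home1/	serverA_home1/

# run a rotation
rsnapshot daily
</code>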
==== beegfs-admin-gui ====

  * ''...''
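
To launch it, just a Java invocation; the jar name below follows this section's title and is an assumption, not verified:

<code>
# the GUI is a plain Java application (jar name/location assumed)
java -jar beegfs-admin-gui.jar
</code>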
==== upgrade ====

  * [[http://...]]
  * New feature - High Availability for Metadata Servers (self-healing, ...)
==== Resync Data #2 ====

If you have 2 buddymirrors and 2 storage servers, each with 2 storage objects, beegfs will write to all primary storage targets even if numtargets is set to 1; it will use all storage objects, so best to set numtargets accordingly.

How does one add a server?

<code>

# define storage objects, 2 per server
[root@petaltail ~]# /opt/beegfs/sbin/beegfs-setup-storage ...
[root@petaltail ~]# /opt/beegfs/sbin/beegfs-setup-storage ...
[root@swallowtail data]# /opt/beegfs/sbin/beegfs-setup-storage ...
[root@swallowtail data]# /opt/beegfs/sbin/beegfs-setup-storage ...

[root@cottontail2 ~]# beegfs-df
METADATA SERVERS:
TargetID ...
======== ...
     250 ...

STORAGE TARGETS:
TargetID ...
======== ...
   13601 ...
   13602 ...
   21701 ...
   21702 ...
+ | |||
+ | # define mirrrogroups | ||
+ | [root@cottontail2 ~]# beegfs-ctl --addmirrorgroup --primary=21701 --secondary=13601 --groupid=1 | ||
+ | [root@cottontail2 ~]# beegfs-ctl --addmirrorgroup --primary=13602 --secondary=21702 --groupid=2 | ||
+ | |||
+ | [root@cottontail2 ~]# beegfs-ctl --listmirrorgroups | ||
+ | | ||
+ | | ||
+ | 1 | ||
+ | 2 | ||
+ | |||
+ | # define buddygroups, | ||
+ | [root@cottontail2 ~]# beegfs-ctl --setpattern --buddymirror / | ||
+ | New chunksize: 524288 | ||
+ | New number of storage targets: 1 | ||
+ | Path: /home1 | ||
+ | Mount: / | ||
+ | |||
+ | [root@cottontail2 ~]# beegfs-ctl --setpattern --buddymirror / | ||
+ | New chunksize: 524288 | ||
+ | New number of storage targets: 1 | ||
+ | Path: /home2 | ||
+ | Mount: / | ||
+ | |||
+ | # drop /home/hmeij in / | ||
+ | [root@petaltail mysql_bak_ptt]# | ||
+ | 3623 | ||
+ | [root@petaltail mysql_bak_ptt]# | ||
+ | 3678 | ||
+ | [root@swallowtail data]# find / | ||
+ | 3623 | ||
+ | [root@swallowtail data]# find / | ||
+ | 3678 | ||
+ | |||
+ | # with numtargets=1 beegfs still writes to all primary targets found in all buddygroups | ||
+ | |||
+ | # rebuild test servers with from scratch with numparts=2 | ||
+ | # drop hmeij/ into home1/ and obtain slightly more files (couple of 100s), not double the amount | ||
+ | # /home/hmeij has 7808 files in it which gets split over primaries but numparts=2 would yield 15,616 files? | ||
+ | # drop another copy in home2/ and file counts double to circa 7808 | ||
[root@cottontail2 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/home1
Path: /home1
Mount: /mnt/beegfs
EntryID: 0-583C50A1-FA
Metadata node: cottontail2 [ID: 250]
Stripe pattern details:
+ Type: Buddy Mirror
+ Chunksize: 512K
+ Number of storage targets: desired: 2

[root@cottontail2 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/home2
Path: /home2
Mount: /mnt/beegfs
EntryID: 1-583C50A1-FA
Metadata node: cottontail2 [ID: 250]
Stripe pattern details:
+ Type: Buddy Mirror
+ Chunksize: 512K
+ Number of storage targets: desired: 2

Source: /home/hmeij 7808 files in 10G

TargetID ...
======== ...
   13601 ...
   13602 ...
   21701 ...
   21702 ...

[root@cottontail2 ~]# rsync -ac --bwlimit=2500 /home/hmeij /mnt/beegfs/home1/
[root@cottontail2 ~]# rsync -ac --bwlimit=2500 /home/hmeij /mnt/beegfs/home2/

TargetID ...
======== ...
   13601 ...
   13602 ...
   21701 ...
   21702 ...
+ | |||
+ | # first rsync drops roughly 5G in both primaries which then get copied to secondaries. | ||
+ | # second rsync does the same so both storage servers loose 20G roughly | ||
+ | # now shut a storage server down and the whole filesystem can still be accessed (HA) | ||
+ | |||
+ | </ | ||
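
To watch that failover from the admin side, ''beegfs-ctl --listtargets'' can report per-target reachability and consistency; a sketch, not output from this cluster:

<code>
# show reachability (Online/Offline) and consistency (Good/Needs-resync) per storage target
beegfs-ctl --listtargets --nodetype=storage --state

# show which targets form which buddy groups
beegfs-ctl --listtargets --mirrorgroups
</code>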
+ | |||
+ | ==== Resync Data #1 ==== | ||
[[http:// | [[http:// | ||
Line 36: | Line 170: | ||
* started a full --resyncstorage --mirrorgroupid=101 --timestamp=0 | * started a full --resyncstorage --mirrorgroupid=101 --timestamp=0 | ||
* got --getentryinfo EntryID for a file in my / | * got --getentryinfo EntryID for a file in my / | ||
- | * did a cat /mnt.beegfs/ | + | * did a cat /mnt/beegfs/ |
* brought primary storage down | * brought primary storage down | ||
* redid the cat above (it hangs for a couple of minutes, then displays the file content) | * redid the cat above (it hangs for a couple of minutes, then displays the file content) | ||
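
Those steps as commands, reconstructed; the test file path is made up and the ''--resyncstoragestats'' progress call is an assumption for this release:

<code>
# kick off a full resync of buddy group 101, ignoring modification timestamps
beegfs-ctl --resyncstorage --mirrorgroupid=101 --timestamp=0

# watch resync progress (command name assumed)
beegfs-ctl --resyncstoragestats --mirrorgroupid=101

# look up the EntryID of a test file, then read it before and after
# taking the primary storage server down
beegfs-ctl --getentryinfo /mnt/beegfs/testfile
cat /mnt/beegfs/testfile
</code>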

==== Installation ====

  * made easy [[http://...]]
  * rpms pulled from repository via petaltail in ''...''
  * ''...''
  * use ''...''
<code>