A document for me to recall and make notes of what I read in the manual pages and what needs testing.

Basically, during the summer of 2016 I investigated whether the HPCC could afford enterprise-level storage. I wanted 99.999% uptime, snapshots, high availability, and other goodies such as parallel NFS. Netapp came the closest but, eh, at $42K lots of other options show up. That story is detailed at [[cluster:

This page is best read from the bottom up.
==== beeGFS ====

  * Storage servers:
    * buy 2, each with 12x2TB slow disks, Raid 6, 20T usable (clustered, parallel file system)
    * create 2 6TB volumes on each, quota at 2TB via XFS, 3 users/
    * only $HOME changes to ''/
    * create 2 buddymirrors;
    * on UPS
    * on Infiniband
  * Client servers:
    * all compute/
  * Meta servers:
    * cottontail2 (root meta, on Infiniband) plus n38-n45 nodes (on Infiniband)
    * all mirrored
    * cottontail2
  * Management and Monitor servers:
    * cottontail (on UPS, on Infiniband)

  * Backups (rsnapshot.org via rsync daemons [[cluster:
    * sharptail:/
    * serverA:/
    * serverB:/

  * Costs (includes 3 year NBD warranty)
    * Microway $12,500
    * CDW
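To make the plan concrete, below is a minimal sketch of how such a layout maps onto the standard BeeGFS setup helpers. The host names follow the plan above, but the service IDs, target IDs, and data paths are made-up examples, not the values used here.

<code>
# management daemon on cottontail (example path)
/opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs_mgmtd

# root metadata server on cottontail2 (example service ID)
/opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs_meta -s 250 -m cottontail

# two storage targets on each storage server (example service/target IDs and paths)
/opt/beegfs/sbin/beegfs-setup-storage -p /data1/beegfs_storage -s 1 -i 101 -m cottontail
/opt/beegfs/sbin/beegfs-setup-storage -p /data2/beegfs_storage -s 1 -i 102 -m cottontail

# every compute node becomes a client
/opt/beegfs/sbin/beegfs-setup-client -m cottontail
</code>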
==== beegfs-admin-gui ====

  * ''

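Presumably this is the BeeGFS admon GUI, a Java application that is downloaded from, and then pointed at, the admon daemon. A sketch of launching it; the host, port, and jar name are assumptions from memory:

<code>
# fetch the GUI from the admon server, then run it locally
wget http://cottontail:8000/beegfs-admon-gui.jar
java -jar beegfs-admon-gui.jar
</code>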
==== upgrade ====

  * [[http://
  * New feature - High Availability for Metadata Servers (self-healing,

A bit complicated.

  * Repo base URL baseurl=http://
  * [ ] beegfs-mgmtd-6.1-el6.x86_64.rpm
  * ''
  * beegfs-mgmtd.x86_64
  * ''
  * http://

So I used the wget/rpm approach (list all packages present on a particular node, else you will get a dependency failure!):

<code>

# get them all
wget http://

# client and meta node
rpm -Uvh ./

# updated?
[root@cottontail2 beegfs_6]# beegfs-ctl | head -2
BeeGFS Command-Line Control Tool (http://
Version: 6.1

# Sheeesh
</code>
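Alternatively, with the repo file in place yum resolves the dependency ordering itself; a sketch, assuming the repo was saved under /etc/yum.repos.d/ with the name ''beegfs'':

<code>
# update every installed beegfs package on this node in one transaction
yum clean all
yum --enablerepo=beegfs update "beegfs-*"
</code>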


==== Resync Data #2 ====

If you have 2 buddymirrors and 2 storage servers, each with 2 storage objects, beegfs will write to all primary storage targets even if numtargets is set to 1 ... it will use all storage objects, so best to set numtargets accordingly.

How does one add a server?

<code>

# define storage objects, 2 per server
[root@petaltail ~]# /
[root@petaltail ~]# /
[root@swallowtail data]# /
[root@swallowtail data]# /

[root@cottontail2 ~]# beegfs-df
METADATA SERVERS:
TargetID
========

STORAGE TARGETS:
TargetID
========

# define mirrorgroups
[root@cottontail2 ~]# beegfs-ctl --addmirrorgroup [--nodetype=storage] --primary=21701 --secondary=13601 --groupid=1
[root@cottontail2 ~]# beegfs-ctl --addmirrorgroup [--nodetype=storage] --primary=13602 --secondary=21702 --groupid=2

[root@cottontail2 ~]# beegfs-ctl --listmirrorgroups
1
2

# define buddygroups,
[root@cottontail2 ~]# beegfs-ctl --setpattern --buddymirror /
New chunksize: 524288
New number of storage targets: 1
Path: /home1
Mount: /mnt/beegfs

[root@cottontail2 ~]# beegfs-ctl --setpattern --buddymirror /
New chunksize: 524288
New number of storage targets: 1
Path: /home2
Mount: /mnt/beegfs

# drop /home/hmeij in /
[root@petaltail mysql_bak_ptt]#
3623
[root@petaltail mysql_bak_ptt]#
3678
[root@swallowtail data]# find /
3623
[root@swallowtail data]# find /
3678

# with numtargets=1 beegfs still writes to all primary targets found in all buddygroups

# rebuild test servers from scratch with numtargets=2
# drop hmeij/ into home1/ and obtain slightly more files (a couple of 100s), not double the amount
# /home/hmeij has 7808 files in it which get split over primaries, but numtargets=2 would yield 15,616 files?
# drop another copy in home2/ and file counts double to circa 7808
[root@cottontail2 ~]# beegfs-ctl --getentryinfo
Path: /home1
Mount: /mnt/beegfs
EntryID: 0-583C50A1-FA
Metadata node: cottontail2 [ID: 250]
Stripe pattern details:
+ Type: Buddy Mirror
+ Chunksize: 512K
+ Number of storage targets: desired: 2
[root@cottontail2 ~]# beegfs-ctl --getentryinfo
Path: /home2
Mount: /mnt/beegfs
EntryID: 1-583C50A1-FA
Metadata node: cottontail2 [ID: 250]
Stripe pattern details:
+ Type: Buddy Mirror
+ Chunksize: 512K
+ Number of storage targets: desired: 2

Source: /home/hmeij 7808 files in 10G

TargetID
========

[root@cottontail2 ~]# rsync -ac --bwlimit=2500 /home/hmeij /
[root@cottontail2 ~]# rsync -ac --bwlimit=2500 /home/hmeij /
TargetID
========

# first rsync drops roughly 5G in both primaries, which then gets copied to secondaries
# second rsync does the same, so both storage servers lose 20G roughly
# now shut a storage server down and the whole filesystem can still be accessed (HA)

</code>
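After playing with resyncs it is worth checking what beegfs thinks the target states are; a sketch using ''beegfs-ctl --listtargets'' (the exact columns vary by version, output omitted here):

<code>
# shows per-target reachability (online/offline) and consistency (good/needs-resync/bad)
[root@cottontail2 ~]# beegfs-ctl --listtargets --nodetype=storage --state
</code>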

==== Resync Data #1 ====

[[http://

//If the primary storage target of a buddy group is unreachable,

Testing out failover and deletion of data on the secondary, then a full resync process (a condensed transcript follows this list):

  * started a full --resyncstorage --mirrorgroupid=101 --timestamp=0
  * got --getentryinfo EntryID for a file in my /
  * did a cat /
  * brought primary storage down
  * redid the cat above (it hangs for a couple of minutes, then displays the file content)
  * while the primary was down, I ran rm -rf /
  * a cat now generates the expected file-not-found error
  * brought up the primary and started a full --resyncstorage --mirrorgroupid=101 --timestamp=0
  * the nr of files and dirs discovered is, as expected, lower by the correct values
  * when I now search for the EntryIDs obtained before, they are gone from /
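The condensed transcript, using the same commands as above; the file path is a made-up example and the output is omitted:

<code>
# full resync of buddy group 101, from the beginning of time
beegfs-ctl --resyncstorage --mirrorgroupid=101 --timestamp=0

# note the EntryID of a test file, then read it
beegfs-ctl --getentryinfo /mnt/beegfs/home/hmeij/testfile
cat /mnt/beegfs/home/hmeij/testfile

# stop the primary storage daemon; the cat above hangs briefly, then succeeds
# remove the file's chunks on the secondary; the cat now fails as expected
# restart the primary, then resync and verify the EntryID is gone on both sides
beegfs-ctl --resyncstorage --mirrorgroupid=101 --timestamp=0
</code>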

Nice that it works.

So you can fully mirror storage content. You'll still need rsnapshots to recover lost data or do point-in-time restores.

==== Mirror Data ====

When not all storage servers are up, client mounts will fail. This is just an optional "

In order to be able to take a storage server offline without any impact, all content needs to be mirrored.

** Before **

<code>
[root@cottontail2 ~]# beegfs-df
METADATA SERVERS:
TargetID
========
48
49

STORAGE TARGETS:
TargetID
========

</code>

** Before **

<code>

# define buddygroup - these are storage target IDs
[root@n7 ~]# beegfs-ctl --addmirrorgroup --primary=13601 --secondary=21701 --groupid=101
Mirror buddy group successfully set: groupID 101 -> target IDs 13601, 21701

[root@n7 ~]# beegfs-ctl --listmirrorgroups
101

# enable mirroring for data by directory - numtargets needs to be set to max nr of storage servers?
# changed on 11/02/2016:
[root@n7 ~]# beegfs-ctl --setpattern --buddymirror /
[root@n7 ~]# beegfs-ctl --setpattern --buddymirror /
New chunksize: 524288
New number of storage targets: 2
Path: /
Mount: /mnt/beegfs

# copy some contents in (~hmeij is 10G)
[root@n7 ~]# rsync -vac --bwlimit /home/hmeij /

</code>

** After **

<code>

[root@n7 ~]# beegfs-df

METADATA SERVERS: (almost no changes...)
STORAGE TARGETS: (each target less circa 10G)
TargetID
========

# let's find an object
[root@n7 ~]# beegfs-ctl --getentryinfo /
Path: /
Mount: /mnt/beegfs
EntryID: 178-581797C8-30
Metadata node: n38 [ID: 48]
Stripe pattern details:
+ Type: Buddy Mirror
+ Chunksize: 512K
+ Number of storage targets: desired: 2; actual: 1
+ Storage mirror buddy groups:
+ 101

# original
[root@n7 ~]# ls -lh /
-rwxr-xr-x 1 hmeij its 4.9G 2014-04-07 13:39 /

# copy on primary
[root@petaltail chroots]# ls -lh /
-rw-rw-rw- 1 root root 4.9G Apr 7 2014 /
^^^^^^^^

# copy on secondary
[root@swallowtail ~]# find /
/
[root@swallowtail ~]# ls -lh /
-rw-rw-rw- 1 root root 4.9G Apr 7 2014 /
^^^^^^^^

# seems to work, notice the ''

</code>

Here is an important note, from the community list:

  * "
  * so the important line that tells you that this file is mirrored is "Type: Buddy Mirror"
  * "

Another note: I changed paths for mirrormd and buddymirror to ''/

<code>

[root@cottontail2 ~]# beegfs-net
meta_nodes
=============
cottontail2 [ID: 250]

[root@cottontail2 ~]# beegfs-ctl --listnodes --nodetype=meta --details
cottontail2 [ID: 250]
^^^

</code>
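A quick sanity check that every registered daemon is reachable is ''beegfs-check-servers'' from the beegfs-utils package (output omitted):

<code>
# pings the management, meta, and storage services and flags unreachable nodes
[root@cottontail2 ~]# beegfs-check-servers
</code>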
==== Quota ====

  * [[http://
  * setup XFS
  * enable beegfs quota on all clients
  * enforce quota
  * set quotas using a text file
  * seems straightforward
  * do BEFORE populating XFS file systems (a sketch follows below)

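A sketch of those steps. The exact config keys and beegfs-ctl flags are written from memory of the BeeGFS quota documentation, so verify them before relying on this:

<code>
# 1. mount the XFS volume backing each storage target with user quota enabled
mount -o usrquota /dev/sdb1 /data1

# 2. enable quota tracking on clients and enforcement on the management daemon
#    beegfs-client.conf:  quotaEnabled = true
#    beegfs-mgmtd.conf:   quotaEnableEnforcement = true

# 3. set limits per user (or in bulk from a text file), then verify
beegfs-ctl --setquota --uid hmeij --sizelimit=2t --inodelimit=1m
beegfs-ctl --getquota --uid hmeij
</code>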
==== Meta Backup/Restore ====

[[http://

<code>

# latest tar
rpm -Uvh /

# backup
cd /data; tar czvf /

# restore
cd /

# test
cd /data; diff -r beegfs_meta beegfs_meta.orig
# no results

</code>
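One caveat: depending on ''storeUseExtendedAttribs'' in beegfs-meta.conf, metadata can live in extended attributes, which plain ''tar czvf'' does not capture by default. An rsync-based alternative that preserves xattrs and hardlinks (a sketch; the destination path is an example):

<code>
# -a archive, -H hardlinks, -A ACLs, -X extended attributes
cd /data; rsync -aHAX beegfs_meta/ /mnt/backup/beegfs_meta/
</code>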

==== Resync Meta ====

[[http://

  * older versions
  * a new future version will work like the storage mirror, with HA and self-healing
==== Mirror Meta ====

//Metadata mirroring can currently not be disabled after it has been enabled for a certain directory//

Definitely want Meta content mirrored; that way I can use the n38-n45 nodes with local 15K disk, plus maybe cottontail2 (raid 1 with hot and cold spare).

Content

V6 does buddymirror meta mirroring [[http://
<code>
# 2015.03
# changed on 11/04/2016: used --createdir to make this home.
[root@n7 ~]# beegfs-ctl --mirrormd /
[root@n7 ~]# beegfs-ctl --mirrormd /
Mount: '/
Operation succeeded.

# V6.1 does it at root level, not from a path
beegfs-ctl --addmirrorgroup --nodetype=meta --primary=38 --secondary=39 --groupid=1
beegfs-ctl --addmirrorgroup --nodetype=meta --primary=250 --secondary=37 --groupid=2
beegfs-ctl --mirrormd

# put some new content in

[root@sharptail ~]# ssh n38 find /
/
^^^^^^ ^^

# and find
[root@sharptail ~]# ssh n39 find /

</code>

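To confirm a directory's metadata really is mirrored, ''--getentryinfo'' should report it; a sketch, with the path an example (in v6 look for a metadata buddy group line, in 2015.03 a metadata mirror node line):

<code>
[root@cottontail2 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/home
# the details printed should show the mirror information
# alongside the usual "Metadata node:" line
</code>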
Writing some initial content to both storage and meta servers; vanilla, out-of-the-box beegfs seems to balance the writes across both equally. Here are some stats.

==== / ====

Looks like:

  * NOTE: a failed mount of /mnt/beegfs is the result of out-of-space storage servers.

  * set in /
  * backup/restore/mirror
    * see more towards the top of this page
  * storage server tuning (a sketch follows after the code block below)
    * made easy [[http://
  * rpms pulled from repository via petaltail in ''
  * ''
  * use ''

<code>
mgmt_nodes
=============
cottontail [ID: 1]

meta_nodes
=============
n38 [ID: 48]

n39 [ID: 49]

storage_nodes
=============
swallowtail [ID: 136]

petaltail [ID: 217]

</code>
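On the storage server tuning bullet above: the usual starting points from the BeeGFS tuning guidelines are the block device scheduler, queue depth, read-ahead, and the VM dirty-page ratios. A sketch; device names are examples and the values need benchmarking on the actual Raid 6 sets:

<code>
# per block device backing a storage target
echo deadline > /sys/block/sdb/queue/scheduler
echo 4096 > /sys/block/sdb/queue/nr_requests
echo 4096 > /sys/block/sdb/queue/read_ahead_kb

# start background writeback earlier so dirty pages do not pile up
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10
</code>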
\\
**[[cluster: