\\
**[[cluster:0|Back]]**

===== beeGFS =====

A document for me to recall and make notes of what I read in the manual pages and what needs testing. Basically, during the summer of 2016 I investigated whether the HPCC could afford enterprise-level storage. I wanted 99.999% uptime, snapshots, high availability and other goodies such as parallel NFS. NetApp came the closest but, eh, still at $42K lots of other options show up. That story is detailed at [[cluster:149|The Storage Problem]].

This page is best read from the bottom up.

NOTE: I'm reluctantly giving up on beegfs, especially v6.1; it is simply flaky. In the admon GUI I can see 2 storage nodes, 4 storage objects, and 4 meta servers with clients installed on all meta nodes. /mnt/beegfs is there and content can be created. Then I mirror the storage nodes; all is fine. Then I mirror the meta servers: the mirrors set up, and enabling mirrormd states success. Then the whole environment hangs on /mnt/beegfs. My sense is that helperd is not communicating well in a private network environment with no DNS, and does not consult /etc/hosts. But I have nothing to back that up with, so I can't fix it. Back to adding more XFS into my cluster; I'll wait a few more versions.

 --- //[[hmeij@wesleyan.edu|Henk]] 2016/12/06 15:10//

==== beeGFS cluster idea ====

  * Storage servers:
    * buy 2, each with 12x2TB slow disk, RAID 6, 20TB usable (clustered, parallel file system)
    * create 2 6TB volumes on each, quota at 2TB via XFS, 3 users/server
    * only $HOME changes to ''/mnt/beegfs/home[1|2]'' (migrates ~4.5TB away from /home, or ~50%)
    * create 2 buddymirrors, each with the primary on one server and the secondary on the other (high availability)
    * on UPS
    * on Infiniband
  * Client servers:
    * all compute/login nodes become beegfs clients
  * Meta servers:
    * cottontail2 (root meta, on Infiniband) plus n38-n45 nodes (on Infiniband)
    * all mirrored (total=9)
    * cottontail2 on UPS
  * Management and Monitor servers:
    * cottontail (on UPS, on Infiniband)
  * Backups (rsnapshot.org via rsync daemons, [[cluster:150|Rsync Daemon/Rsnapshot]])
    * sharptail:/home --> cottontail
    * serverA:/mnt/beegfs/home1 --> serverB (8TB max)
    * serverB:/mnt/beegfs/home2 --> serverA (8TB max)
  * Costs (includes 3-year NBD warranty)
    * Microway $12,500
    * CDW $14,700

==== beegfs-admin-gui ====

  * ''cottontail:/usr/local/bin/beegfs-admin-gui''

==== upgrade ====

  * [[http://www.beegfs.com/content/updating-upgrading-and-versioning/|External Link]]
  * New feature: High Availability for Metadata Servers (self-healing, transparent failover). A bit complicated.
  * Repo base URL ''baseurl=http://www.beegfs.com/release/beegfs_6/dists/rhel6'' via http shows only 6.1-el6 packages, for example
    * beegfs-mgmtd-6.1-el6.x86_64.rpm 2016-11-16 16:27 660K
  * ''yum --disablerepo "*" --enablerepo beegfs repolist'' however still shows
    * beegfs-mgmtd.x86_64 2015.03.r22-el6 beegfs
  * so ''yum install --disablerepo "*" --enablerepo beegfs --downloadonly --downloaddir=/sanscratch/tmp/beegfs/beegfs_6/ *x86_64* -y'' fails with
    * http://www.beegfs.com/release/beegfs_6/dists/rhel6/x86_64/beegfs-mgmtd-2015.03.r22-el6.x86_64.rpm: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found" <-- wrong package version
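For reference, the yum repo stanza on these nodes looks roughly like this; only the baseurl comes from the notes above, the name/enabled/gpgcheck lines are my reconstruction:

<code>
# /etc/yum.repos.d/beegfs.repo (sketch)
[beegfs]
name=BeeGFS
baseurl=http://www.beegfs.com/release/beegfs_6/dists/rhel6
enabled=1
gpgcheck=0
</code>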
So the wget/rpm approach it is (list all packages present on a particular node in a single rpm command, or you will get a dependency failure!):

<code>
# get them all
wget http://www.beegfs.com/release/beegfs_6/dists/rhel6/x86_64/beegfs-mgmtd-6.1-el6.x86_64.rpm

# client and meta node
rpm -Uvh ./beegfs-common-6.1-el6.noarch.rpm ./beegfs-utils-6.1-el6.x86_64.rpm \
         ./beegfs-opentk-lib-6.1-el6.x86_64.rpm ./beegfs-helperd-6.1-el6.x86_64.rpm \
         ./beegfs-client-6.1-el6.noarch.rpm ./beegfs-meta-6.1-el6.x86_64.rpm

# updated?
[root@cottontail2 beegfs_6]# beegfs-ctl | head -2
BeeGFS Command-Line Control Tool (http://www.beegfs.com)
Version: 6.1

# Sheeesh
</code>

==== Resync Data #2 ====

If you have 2 buddymirrors and 2 storage servers, each with 2 storage objects, beegfs will write to all primary storage targets even if numtargets is set to 1 ... it will use all storage objects, so it is best to set numtargets equal to the number of primary storage objects. And then of course the content flows from primary to secondary for high availability.

How does one add a server?

<code>
# define storage objects, 2 per server
[root@petaltail ~]# /opt/beegfs/sbin/beegfs-setup-storage -p /data/lv1/beegfs_storage -s 217 -i 21701 -m cottontail
[root@petaltail ~]# /opt/beegfs/sbin/beegfs-setup-storage -p /data/lv2/beegfs_storage -s 217 -i 21702 -m cottontail
[root@swallowtail data]# /opt/beegfs/sbin/beegfs-setup-storage -p /data/lv1/beegfs_storage -s 136 -i 13601 -m cottontail
[root@swallowtail data]# /opt/beegfs/sbin/beegfs-setup-storage -p /data/lv2/beegfs_storage -s 136 -i 13602 -m cottontail

[root@cottontail2 ~]# beegfs-df
METADATA SERVERS:
TargetID   Pool        Total         Free    %      ITotal       IFree    %
========   ====        =====         ====    =      ======       =====    =
     250    low     122.3GiB     116.6GiB  95%        7.8M        7.6M  98%

STORAGE TARGETS:
TargetID   Pool        Total         Free    %      ITotal       IFree    %
========   ====        =====         ====    =      ======       =====    =
   13601    low     291.4GiB     164.6GiB  56%       18.5M       18.5M 100%
   13602    low     291.4GiB     164.6GiB  56%       18.5M       18.5M 100%
   21701    low     291.2GiB     130.5GiB  45%       18.5M       16.2M  87%
   21702    low     291.2GiB     130.5GiB  45%       18.5M       16.2M  87%

# define mirrorgroups
[root@cottontail2 ~]# beegfs-ctl --addmirrorgroup [--nodetype=storage] --primary=21701 --secondary=13601 --groupid=1
[root@cottontail2 ~]# beegfs-ctl --addmirrorgroup [--nodetype=storage] --primary=13602 --secondary=21702 --groupid=2

[root@cottontail2 ~]# beegfs-ctl --listmirrorgroups
     BuddyGroupID   PrimaryTargetID SecondaryTargetID
     ============   =============== =================
                1             21701             13601
                2             13602             21702

# define buddygroups, numtargets=1
[root@cottontail2 ~]# beegfs-ctl --setpattern --buddymirror /mnt/beegfs/home1 --chunksize=512k --numtargets=1
New chunksize: 524288
New number of storage targets: 1
Path: /home1
Mount: /mnt/beegfs

[root@cottontail2 ~]# beegfs-ctl --setpattern --buddymirror /mnt/beegfs/home2 --chunksize=512k --numtargets=1
New chunksize: 524288
New number of storage targets: 1
Path: /home2
Mount: /mnt/beegfs

# drop /home/hmeij in /mnt/beegfs/home1/hmeij
[root@petaltail mysql_bak_ptt]# find /data/lv1/beegfs_storage/ -type f | wc -l
3623
[root@petaltail mysql_bak_ptt]# find /data/lv2/beegfs_storage/ -type f | wc -l
3678
[root@swallowtail data]# find /data/lv1/beegfs_storage/ -type f | wc -l
3623
[root@swallowtail data]# find /data/lv2/beegfs_storage/ -type f | wc -l
3678

# with numtargets=1 beegfs still writes to all primary targets found in all buddygroups
# rebuilt the test servers from scratch with numtargets=2
# drop hmeij/ into home1/ and obtain slightly more files (a couple of hundred), not double the amount
# /home/hmeij has 7808 files in it, which get split over the primaries; but wouldn't numtargets=2 yield 15,616 files?
</code>
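To watch how a copy actually spreads across the four storage objects, I just count chunk files under each beegfs_storage directory. A quick sketch, assuming root ssh between the test boxes and the storage paths set up above:

<code>
# count chunk files per storage object on both test servers
for h in petaltail swallowtail; do
  for lv in lv1 lv2; do
    echo -n "$h $lv: "
    ssh $h "find /data/$lv/beegfs_storage -type f | wc -l"
  done
done
</code>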
<code>
# drop another copy into home2/ and file counts double to circa 7808

[root@cottontail2 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/home1
Path: /home1
Mount: /mnt/beegfs
EntryID: 0-583C50A1-FA
Metadata node: cottontail2 [ID: 250]
Stripe pattern details:
+ Type: Buddy Mirror
+ Chunksize: 512K
+ Number of storage targets: desired: 2

[root@cottontail2 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/home2
Path: /home2
Mount: /mnt/beegfs
EntryID: 1-583C50A1-FA
Metadata node: cottontail2 [ID: 250]
Stripe pattern details:
+ Type: Buddy Mirror
+ Chunksize: 512K
+ Number of storage targets: desired: 2

# source: /home/hmeij, 7808 files in 10G

TargetID   Pool        Total         Free    %      ITotal       IFree    %
========   ====        =====         ====    =      ======       =====    =
   13601    low     291.4GiB      63.1GiB  22%       18.5M       18.5M 100%
   13602    low     291.4GiB      63.1GiB  22%       18.5M       18.5M 100%
   21701    low     291.2GiB     134.6GiB  46%       18.5M       16.2M  87%
   21702    low     291.2GiB     134.6GiB  46%       18.5M       16.2M  87%

[root@cottontail2 ~]# rsync -ac --bwlimit=2500 /home/hmeij /mnt/beegfs/home1/ &
[root@cottontail2 ~]# rsync -ac --bwlimit=2500 /home/hmeij /mnt/beegfs/home2/ &

TargetID   Pool        Total         Free    %      ITotal       IFree    %
========   ====        =====         ====    =      ======       =====    =
   13601    low     291.4GiB      43.5GiB  15%       18.5M       18.5M 100%
   13602    low     291.4GiB      43.5GiB  15%       18.5M       18.5M 100%
   21701    low     291.2GiB     114.9GiB  39%       18.5M       16.1M  87%
   21702    low     291.2GiB     114.9GiB  39%       18.5M       16.1M  87%

# the first rsync drops roughly 5G in both primaries, which then gets copied to the secondaries
# the second rsync does the same, so both storage servers lose roughly 20G
# now shut a storage server down and the whole filesystem can still be accessed (HA)
</code>

==== Resync Data #1 ====

[[http://www.beegfs.com/wiki/StorageSynchronization|StorageSynchronization Link]]

//If the primary storage target of a buddy group is unreachable, it will get marked as offline and a failover to the secondary target will be issued. In this case, the former secondary target will become the new primary target.//

Testing out failover and deletion of data on the secondary, then a full resync process:

  * started a full ''--resyncstorage --mirrorgroupid=101 --timestamp=0''
  * got the ''--getentryinfo'' EntryID for a file in my /mnt/beegfs/home/path/to/file and did the same for the directory the file was located in
  * did a ''cat /mnt/beegfs/home/path/to/file'' on a client (just fine)
  * brought the primary storage down
  * redid the cat above (it hangs for a couple of minutes, then displays the file content)
  * while the primary was down, ran ''rm -rf /mnt/beegfs/home/path/to/'' removing the directory holding the file
  * a cat now generates the expected "file not found" error
  * brought the primary back up and started a full ''--resyncstorage --mirrorgroupid=101 --timestamp=0''
  * the number of files and dirs discovered is, as expected, lower by the correct amounts
  * when I now search for the EntryIDs obtained before, they are gone from /data/beegfs-storage (as expected)

Nice that it works. So you can fully mirror storage content. You'll still need rsnapshots to recover lost data or for point-in-time restores.

==== Mirror Data ====

When not all storage servers are up, client mounts will fail. This is just an optional "sanity check" which the client performs when it is mounted. Disable this check by setting "sysMountSanityCheckMS=0" in beegfs-client.conf. When the sanity check is disabled, the client mount will succeed even if no servers are running.

In order to be able to take a storage server offline without any impact, all content needs to be mirrored.
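A minimal sketch of the relevant lines in ''/etc/beegfs/beegfs-client.conf''; only sysMountSanityCheckMS comes from the note above, the management host is simply whatever the client already points at (cottontail here):

<code>
sysMgmtdHost          = cottontail
sysMountSanityCheckMS = 0   # 0 disables the mount-time check, so the client
                            # mounts even if not all storage servers are up
</code>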
** Before **

<code>
[root@cottontail2 ~]# beegfs-df
METADATA SERVERS:
TargetID   Pool        Total         Free    %      ITotal       IFree    %
========   ====        =====         ====    =      ======       =====    =
      48    low      29.5GiB      23.3GiB  79%        1.9M        1.5M  82%
      49    low      29.5GiB      23.1GiB  78%        1.9M        1.5M  82%
     250    low     122.3GiB     116.7GiB  95%        7.8M        7.6M  98%

STORAGE TARGETS:
TargetID   Pool        Total         Free    %      ITotal       IFree    %
========   ====        =====         ====    =      ======       =====    =
   13601    low     291.4GiB      50.6GiB  17%       18.5M       18.4M 100%
   21701    low     291.2GiB      61.8GiB  21%       18.5M       15.8M  85%
</code>

<code>
# define buddygroup - these are storage target IDs
[root@n7 ~]# beegfs-ctl --addmirrorgroup --primary=13601 --secondary=21701 --groupid=101
Mirror buddy group successfully set: groupID 101 -> target IDs 13601, 21701

[root@n7 ~]# beegfs-ctl --listmirrorgroups
     BuddyGroupID   PrimaryTargetID SecondaryTargetID
     ============   =============== =================
              101             13601             21701

# enable mirroring for data by directory - numtargets needs to be set to the max nr of storage servers?
# changed on 11/02/2016 to:
[root@n7 ~]# beegfs-ctl --setpattern --buddymirror /mnt/beegfs/home --chunksize=512k

# the original test directory:
[root@n7 ~]# beegfs-ctl --setpattern --buddymirror /mnt/beegfs/hmeij-mirror-data --chunksize=512k --numtargets=2
New chunksize: 524288
New number of storage targets: 2
Path: /hmeij-mirror-data
Mount: /mnt/beegfs

# copy some content in (~hmeij is 10G)
[root@n7 ~]# rsync -vac --bwlimit /home/hmeij /mnt/beegfs/hmeij-mirror-data/
</code>

** After **

<code>
[root@n7 ~]# beegfs-df

METADATA SERVERS: (almost no changes...)

STORAGE TARGETS: (each target has circa 10G less free)
TargetID   Pool        Total         Free    %      ITotal       IFree    %
========   ====        =====         ====    =      ======       =====    =
   13601    low     291.4GiB      40.7GiB  14%       18.5M       18.4M  99%
   21701    low     291.2GiB      51.9GiB  18%       18.5M       15.8M  85%

# lets find an object
[root@n7 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/hmeij-mirror-data/hmeij/xen/bvm1.img
Path: /hmeij-mirror-data/hmeij/xen/bvm1.img
Mount: /mnt/beegfs
EntryID: 178-581797C8-30
Metadata node: n38 [ID: 48]
Stripe pattern details:
+ Type: Buddy Mirror
+ Chunksize: 512K
+ Number of storage targets: desired: 2; actual: 1
+ Storage mirror buddy groups:
  + 101

# original
[root@n7 ~]# ls -lh /mnt/beegfs/hmeij-mirror-data/hmeij/xen/bvm1.img
-rwxr-xr-x 1 hmeij its 4.9G 2014-04-07 13:39 /mnt/beegfs/hmeij-mirror-data/hmeij/xen/bvm1.img

# copy on primary
[root@petaltail chroots]# ls -lh /var/chroots/data/beegfs_storage/buddymir/u2018/5817/9/60-58179513-30/178-581797C8-30
-rw-rw-rw- 1 root root 4.9G Apr  7  2014 /var/chroots/data/beegfs_storage/buddymir/u2018/5817/9/60-58179513-30/178-581797C8-30

# copy on secondary
[root@swallowtail ~]# find /data/beegfs_storage -name 178-581797C8-30
/data/beegfs_storage/buddymir/u2018/5817/9/60-58179513-30/178-581797C8-30
[root@swallowtail ~]# ls -lh /data/beegfs_storage/buddymir/u2018/5817/9/60-58179513-30/178-581797C8-30
-rw-rw-rw- 1 root root 4.9G Apr  7  2014 /data/beegfs_storage/buddymir/u2018/5817/9/60-58179513-30/178-581797C8-30
</code>

Seems to work; notice the ''buddymir'' directory in the chunk path on both primary and secondary.

Here is an important note, from the community list:

  * "actual: 1" means "1 buddy mirror group"
  * so the important line that tells you that this file is mirrored is "Type: Buddy Mirror"
  * "desired: 2" means you would like to stripe across 2 buddy groups (targets are buddy groups here)
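Before pulling a storage server I'd also want to confirm both buddies are in a good consistency state. A quick check, assuming the v6-style ''--state'' flag is available (syntax from memory, so verify against ''beegfs-ctl --help''):

<code>
# reachability and consistency per storage target (Online/Good is what we want)
beegfs-ctl --listtargets --nodetype=storage --state

# and which target belongs to which buddy group
beegfs-ctl --listmirrorgroups
</code>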
Another note: I changed the paths for mirrormd and buddymirror to ''/mnt/beegfs/home'' and now I see connectivity data for meta node cottontail2, which was previously missing because I was working at the sub-directory level.

<code>
[root@cottontail2 ~]# beegfs-net

meta_nodes
=============
cottontail2 [ID: 250]
   Connections: RDMA: 1 (10.11.103.250:8005);

[root@cottontail2 ~]# beegfs-ctl --listnodes --nodetype=meta --details
cottontail2 [ID: 250]
   Ports: UDP: 8005; TCP: 8005
   Interfaces: ib1(RDMA) ib1(TCP) eth1(TCP) eth0(TCP)
# note: ib1(RDMA) now shows up for the meta node
</code>

==== Quota ====

  * [[http://www.beegfs.com/wiki/EnableQuota|External Link]]
  * set up XFS
  * enable beegfs quota on all clients
  * enforce quota
  * set quotas using a text file
  * seems straightforward
  * do BEFORE populating the XFS file systems

==== Meta Backup/Restore ====

[[http://www.fhgfs.com/wiki/wikka.php?wakka=FAQ#ea_backup|External Link]]

<code>
# latest tar
rpm -Uvh /sanscratch/tmp/beegfs/tar-1.23-15.el6_8.x86_64.rpm

# backup
cd /data; tar czvf /sanscratch/tmp/beegfs/meta-backup/n38-meta.tar.gz beegfs_meta/ --xattrs

# restore
cd /data; tar xvf /sanscratch/tmp/beegfs/meta-backup/n38-meta.tar.gz --xattrs

# test
cd /data; diff -r beegfs_meta beegfs_meta.orig
# no results
</code>

==== Resync Meta ====

[[http://www.beegfs.com/wiki/AboutMirroring2012#hn_59ca4f8bbb_4|External Link]]

  * older versions
  * a future version will work like the storage mirror, with HA and self-healing

==== Mirror Meta ====

//Metadata mirroring can currently not be disabled after it has been enabled for a certain directory.//

Definitely want meta content mirrored; that way I can use the n38-n45 nodes with local 15K disk, plus maybe cottontail2 (RAID 1 with hot and cold spare). Content mirroring will require more disk space. Perhaps snapshots to another node are more useful, which also solves the backup issue.

V6 does buddymirror-based meta mirroring: [[http://www.beegfs.com/wiki/MDMirror|External Link]]

<code>
# 2015.03 enables meta mirroring per directory
# changed on 11/04/2016: used --createdir to make this home
[root@n7 ~]# beegfs-ctl --mirrormd /mnt/beegfs/home
[root@n7 ~]# beegfs-ctl --mirrormd /mnt/beegfs/hmeij-mirror
Mount: '/mnt/beegfs'; Path: '/hmeij-mirror'
Operation succeeded.

# V6.1 does it at the root level, not from a path
beegfs-ctl --addmirrorgroup --nodetype=meta --primary=38 --secondary=39 --groupid=1
beegfs-ctl --addmirrorgroup --nodetype=meta --primary=250 --secondary=37 --groupid=2
beegfs-ctl --mirrormd

# put some new content in
[root@n7 ~]# rsync -vac /home/hmeij/iozone-tests /mnt/beegfs/hmeij-mirror/

# look up the meta tag
[root@n7 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/hmeij-mirror/iozone-tests/current.tar
Path: /hmeij-mirror/iozone-tests/current.tar
Mount: /mnt/beegfs
EntryID: 3-581392E1-31

# find it on the primary -- note the mirror/49.dentries path
[root@sharptail ~]# ssh n38 find /data/beegfs_meta -name 3-581392E1-31
/data/beegfs_meta/mirror/49.dentries/54/6C/0-581392F0-30/#fSiDs#/3-581392E1-31

# and on the secondary, under plain dentries
[root@sharptail ~]# ssh n39 find /data/beegfs_meta -name 3-581392E1-31
/data/beegfs_meta/dentries/54/6C/0-581392F0-30/#fSiDs#/3-581392E1-31

# seems to work
</code>

Writing some initial content to both storage and meta servers, vanilla out-of-the-box beegfs seems to balance the writes across both equally. Here are some stats.
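For the record, the numbers in the next section are just du/find counts on the raw beegfs_storage and beegfs_meta directories; roughly like this, with the paths as used on these test servers:

<code>
# per storage server (petaltail path shown; swallowtail uses /data/beegfs_storage)
du -sh /var/chroots/data/beegfs_storage
find /var/chroots/data/beegfs_storage -type f | wc -l
find /var/chroots/data/beegfs_storage -type d | wc -l

# per meta server (n38, n39)
du -sh /data/beegfs_meta
find /data/beegfs_meta -type f | wc -l
find /data/beegfs_meta -type d | wc -l
</code>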
==== /mnt/beegfs/ ====

  * Source content: 110G in XFS with ~100,000 files in ~2,000 dirs
    * /home/hmeij (mix of files, nothing large) plus
    * /home/fstarr/filler (lots of tiny files)
  * File content spread across 2 storage servers
    * petaltail:/var/chroots/data/beegfs_storage
    * swallowtail:/data/beegfs_storage
    * 56G used in beegfs_storage per storage server
    * ~92,400 files per storage server
    * ~1,400 dirs per storage server, mostly in the "chunks" dir
  * Meta content spread across 2 meta servers (n38 and n39)
    * 338MB per beegfs_meta server, so roughly 0.6% space-wise for the 2 servers
    * ~105,000 files per metadata server
    * ~35,000 dirs, spread almost evenly across "dentries" and "inodes"
  * Clients (n7 and n8) see
    * 110G in /mnt/beegfs
    * ~100,000 files
    * ~2,000 dirs

Looks like:

  * NOTE: a "failed to mount /mnt/beegfs" error is the result of out-of-space storage servers.

<code>
# file content
[root@swallowtail ~]# ls -lR /data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31
/data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31:
total 672
-rw-rw-rw- 1 root root 289442 Jun 26  2015 D8-57E42E89-30
-rw-rw-rw- 1 root root   3854 Jun 26  2015 D9-57E42E89-30
-rw-rw-rw- 1 root root  16966 Jun 26  2015 DA-57E42E89-30
-rw-rw-rw- 1 root root  65779 Jun 26  2015 DB-57E42E89-30
-rw-rw-rw- 1 root root  20562 Jun 26  2015 DF-57E42E89-30
-rw-rw-rw- 1 root root 259271 Jun 26  2015 E0-57E42E89-30
-rw-rw-rw- 1 root root    372 Jun 26  2015 E1-57E42E89-30

[root@petaltail ~]# ls -lR /var/chroots/data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31
/var/chroots/data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31:
total 144
-rw-rw-rw- 1 root root     40 Jun 26  2015 DC-57E42E89-30
-rw-rw-rw- 1 root root  40948 Jun 26  2015 DD-57E42E89-30
-rw-rw-rw- 1 root root 100077 Jun 26  2015 DE-57E42E89-30

# meta content
[root@sharptail ~]# ssh n38 find /data/beegfs_meta -name 169-57E42E75-31
/data/beegfs_meta/inodes/6A/7E/169-57E42E75-31
/data/beegfs_meta/dentries/6A/7E/169-57E42E75-31
[root@sharptail ~]# ssh n39 find /data/beegfs_meta -name 169-57E42E75-31
(none, no mirror)
</code>

==== Tuning ====

  * global interfaces file, order ib0 -> eth1 -> eth0 (see the sketch at the end of this section)
    * connInterfacesFile = /home/tmp/global/beegfs.connInterfacesFile
    * set in /etc/beegfs/beegfs-[storage|client|meta|admon|mgmtd].conf and restart the services
  * backup/restore/mirror
    * see more towards the top of this page
  * storage server tuning
    * set on cottontail on sdb, both values were 128 (seems to help -- late summer 2016)
      * echo 4096 > /sys/block/sd?/queue/nr_requests
      * echo 4096 > /sys/block/sd?/queue/read_ahead_kb
    * set on cottontail (was 90112), added to /etc/rc.local
      * echo 262144 > /proc/sys/vm/min_free_kbytes
    * do the same on greentail? (done late fall 2016)
      * all original values same as cottontail (all files)
      * set on c1d1 thru c1d6
    * do the same on sharptail?
      * no such values for sdb1
      * can only find min_free_kbytes, same value as cottontail
  * stripe and chunk size

<code>
[root@n7 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/
Path:
Mount: /mnt/beegfs
EntryID: root
Metadata node: n38 [ID: 48]
Stripe pattern details:
+ Type: RAID0
+ Chunksize: 512K
+ Number of storage targets: desired: 4
</code>

  * the cache type can be set in the client config file (/etc/beegfs/beegfs-client.conf)
    * buffered is the default, a few hundred kB per file
  * tuneNumWorkers in all /etc/beegfs/beegfs-*.conf files
    * for meta, storage and clients ...
  * metadata server tuning
    * read in more detail
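The interfaces file referenced in the list above is simply one interface name per line, preferred first, and the block-device/vm settings can be made persistent via /etc/rc.local. A sketch using the values from the bullets above (sdb assumed as the storage device, as on cottontail):

<code>
# /home/tmp/global/beegfs.connInterfacesFile -- one interface per line, preferred first
ib0
eth1
eth0
</code>

<code>
# appended to /etc/rc.local on the storage server
echo 4096   > /sys/block/sdb/queue/nr_requests
echo 4096   > /sys/block/sdb/queue/read_ahead_kb
echo 262144 > /proc/sys/vm/min_free_kbytes
</code>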
==== Installation ====

  * made easy: [[http://www.beegfs.com/wiki/ManualInstallWalkThrough|External Link]]
  * rpms pulled from the repository via petaltail into ''greentail:/sanscratch/tmp/beegfs''
    * ''yum --disablerepo "*" --enablerepo beegfs list available''
    * use ''yumdownloader''

<code>
[root@cottontail ~]# ssh n7 beegfs-net

mgmt_nodes
=============
cottontail [ID: 1]
   Connections: TCP: 1 (10.11.103.253:8008);

meta_nodes
=============
n38 [ID: 48]
   Connections: TCP: 1 (10.11.103.48:8005);
n39 [ID: 49]
   Connections: TCP: 1 (10.11.103.49:8005);

storage_nodes
=============
swallowtail [ID: 136]
   Connections: TCP: 1 (192.168.1.136:8003 [fallback route]);
petaltail [ID: 217]
   Connections: TCP: 1 (192.168.1.217:8003 [fallback route]);
</code>

\\
**[[cluster:0|Back]]**