
beeGFS

A document where I keep notes on what I read in the manual pages and what still needs testing.

Basically, during the summer of 2016 I investigated whether the HPCC could afford enterprise-level storage. I wanted 99.999% uptime, snapshots, high availability and other goodies such as parallel NFS. NetApp came the closest but, eh, at $42K lots of other options start to look attractive. The story is detailed at The Storage Problem.

This page is best read from the bottom up.

cluster idea

  • Storage servers: buy 2 now ($4K + $4K), then a 3rd in July ($4K)?
  • move test users over on 2 nodes and test; the only change is $HOME
  • Home cluster
    • cottontail (mgmt + admin GUI)
    • 2-3 new units storage (+snapshots/meta backup)
    • cottontail2 meta + n38-n45 meta, all mirrored

beegfs-admin-gui

  • cottontail:/usr/local/bin/beegfs-admin-gui
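
A minimal sketch of how I launch it, assuming that file wraps the admon GUI jar and that the beegfs-admon service runs on cottontail with its default port 8000 (both assumptions; check beegfs-admon.conf):

# make sure the admon daemon is up on the management host
[root@cottontail ~]# service beegfs-admon status

# launch the GUI (needs java + X); point it at cottontail:8000 when prompted
[root@cottontail ~]# /usr/local/bin/beegfs-admin-gui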

Mirror Data

Before

[root@cottontail2 ~]# beegfs-df
METADATA SERVERS:
TargetID        Pool        Total         Free    %      ITotal       IFree    %
========        ====        =====         ====    =      ======       =====    =
      48         low      29.5GiB      23.3GiB  79%        1.9M        1.5M  82%
      49         low      29.5GiB      23.1GiB  78%        1.9M        1.5M  82%
     250         low     122.3GiB     116.7GiB  95%        7.8M        7.6M  98%

STORAGE TARGETS:
TargetID        Pool        Total         Free    %      ITotal       IFree    %
========        ====        =====         ====    =      ======       =====    =
   13601         low     291.4GiB      50.6GiB  17%       18.5M       18.4M 100%
   21701         low     291.2GiB      61.8GiB  21%       18.5M       15.8M  85%

Setup

# define buddygroup - these are storage target IDs
[root@n7 ~]# beegfs-ctl --addmirrorgroup --primary=13601 --secondary=21701 --groupid=101
Mirror buddy group successfully set: groupID 101 -> target IDs 13601, 21701

[root@n7 ~]# beegfs-ctl --listmirrorgroups
     BuddyGroupID   PrimaryTargetID SecondaryTargetID
     ============   =============== =================
              101             13601             21701
              
# enable mirroring for data, set per directory; does --numtargets need to be set to the max nr of storage servers?
[root@n7 ~]# beegfs-ctl --setpattern --buddymirror /mnt/beegfs/hmeij-mirror-data --chunksize=512k --numtargets=2
New chunksize: 524288
New number of storage targets: 2
Path: /hmeij-mirror-data
Mount: /mnt/beegfs

# copy some contents in (~hmeij is 10G); --bwlimit needs a rate in KB/s (10000 here is just an example)
[root@n7 ~]# rsync -vac --bwlimit=10000 /home/hmeij /mnt/beegfs/hmeij-mirror-data/

After

[root@n7 ~]# beegfs-df

METADATA SERVERS: (almost no changes...)
STORAGE TARGETS: (each target has roughly 10G less free)
TargetID        Pool        Total         Free    %      ITotal       IFree    %
========        ====        =====         ====    =      ======       =====    =
   13601         low     291.4GiB      40.7GiB  14%       18.5M       18.4M  99%
   21701         low     291.2GiB      51.9GiB  18%       18.5M       15.8M  85%

# let's find an object
[root@n7 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/hmeij-mirror-data/hmeij/xen/bvm1.img
Path: /hmeij-mirror-data/hmeij/xen/bvm1.img
Mount: /mnt/beegfs
EntryID: 178-581797C8-30
Metadata node: n38 [ID: 48]
Stripe pattern details:
+ Type: Buddy Mirror
+ Chunksize: 512K
+ Number of storage targets: desired: 2; actual: 1
+ Storage mirror buddy groups:
  + 101

# original
[root@n7 ~]# ls -lh /mnt/beegfs/hmeij-mirror-data/hmeij/xen/bvm1.img
-rwxr-xr-x 1 hmeij its 4.9G 2014-04-07 13:39 /mnt/beegfs/hmeij-mirror-data/hmeij/xen/bvm1.img

# copy on primary
[root@petaltail chroots]# ls -lh /var/chroots/data/beegfs_storage/buddymir/u2018/5817/9/60-58179513-30/178-581797C8-30
-rw-rw-rw- 1 root root 4.9G Apr  7  2014 /var/chroots/data/beegfs_storage/buddymir/u2018/5817/9/60-58179513-30/178-581797C8-30

# copy on secondary
[root@swallowtail ~]# find /data/beegfs_storage -name 178-581797C8-30
/data/beegfs_storage/buddymir/u2018/5817/9/60-58179513-30/178-581797C8-30
[root@swallowtail ~]# ls -lh /data/beegfs_storage/buddymir/u2018/5817/9/60-58179513-30/178-581797C8-30
-rw-rw-rw- 1 root root 4.9G Apr  7  2014 /data/beegfs_storage/buddymir/u2018/5817/9/60-58179513-30/178-581797C8-30

# seems to work, notice the ''buddymir'' directory on primary/secondary
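
To keep an eye on the buddy group after a failure or resync, something like the following should show the reachability and consistency state of each storage target (a sketch; flags and output format may differ per BeeGFS version):

[root@n7 ~]# beegfs-ctl --listtargets --nodetype=storage --state
[root@n7 ~]# beegfs-ctl --listmirrorgroups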

Quota

  • set up XFS with quota mount options
  • enable beegfs quota on all clients
  • enforce quota
    • set quotas using a text file
    • seems straightforward
  • do this BEFORE populating the XFS file systems (sketch below)
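
A rough sketch of those steps, assuming the storage targets sit on XFS and using the quota options I expect in the configs (option and flag names to be verified against beegfs-ctl --help and the shipped beegfs-*.conf files):

# 1. mount the XFS storage targets with quota enabled, e.g. in /etc/fstab
#    /dev/sdb1  /data  xfs  defaults,uquota,gquota  0 0

# 2. turn quota on, then restart the services
#    /etc/beegfs/beegfs-client.conf :  quotaEnabled = true
#    /etc/beegfs/beegfs-mgmtd.conf  :  quotaEnableEnforcement = true

# 3. set limits per user ...
beegfs-ctl --setquota --uid hmeij --sizelimit=1T --inodelimit=1M

# ... or in bulk from a text file (exact flag name: see beegfs-ctl --setquota --help)

# 4. check usage
beegfs-ctl --getquota --uid hmeij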

Mirror Meta

I definitely want the meta content mirrored; that way I can use the n38-n45 nodes with their local 15K disks, plus maybe cottontail2 (RAID 1 with hot and cold spares).

Content mirroring will require more disk space. Perhaps snapshotting to another node is more useful; that also solves the backup issue.

# enable meta mirroring, directory based

[root@n7 ~]# beegfs-ctl --mirrormd /mnt/beegfs/hmeij-mirror
Mount: '/mnt/beegfs'; Path: '/hmeij-mirror'
Operation succeeded.

# put some new content in 
[root@n7 ~]# rsync -vac /home/hmeij/iozone-tests /mnt/beegfs/hmeij-mirror/

# lookup meta tag
[root@n7 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/hmeij-mirror/iozone-tests/current.tar
Path: /hmeij-mirror/iozone-tests/current.tar
Mount: /mnt/beegfs
EntryID: 3-581392E1-31

# find
[root@sharptail ~]# ssh n38 find /data/beegfs_meta -name 3-581392E1-31
/data/beegfs_meta/mirror/49.dentries/54/6C/0-581392F0-30/#fSiDs#/3-581392E1-31
                  ^^^^^^
# and find
[root@sharptail ~]# ssh n39 find /data/beegfs_meta -name 3-581392E1-31
/data/beegfs_meta/dentries/54/6C/0-581392F0-30/#fSiDs#/3-581392E1-31

# seems to work

After writing some initial content to both storage and meta servers, vanilla out-of-the-box BeeGFS seems to balance the writes equally across both. Here are some stats.

/mnt/beegfs/

  • Source content 110G in XFS with ~100,000 files in ~2,000 dirs
    • /home/hmeij (mix of files, nothing large) plus
    • /home/fstarr/filler (lots of tiny files)
  • File content spread across 2 storage servers
    • petaltail:/var/chroots/data/beegfs_storage
    • swallowtail:/data/beegfs_storage
    • 56G used in beegfs-storage per storage server
    • ~92,400 files per storage server
    • ~1,400 dirs per storage server, mostly in the “chunks” dir
  • Meta content spread across 2 meta servers (n38 and n39)
    • 338MB per beegfs-meta server, so roughly 0.6% space-wise for 2 servers
    • ~105,000 files per metadata server
    • ~35,000 dirs spread almost evenly across “dentries” and “inodes”
  • Clients (n7 and n8) see
    • 110G in /mnt/beegfs
    • ~100,000 files
    • ~2,000 dirs

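Those per-server numbers presumably come from plain counts on the underlying file systems, roughly like this (paths as listed above):

# storage server side
[root@swallowtail ~]# du -sh /data/beegfs_storage
[root@swallowtail ~]# find /data/beegfs_storage -type f | wc -l
[root@swallowtail ~]# find /data/beegfs_storage -type d | wc -l

# metadata server side
[root@sharptail ~]# ssh n38 "du -sh /data/beegfs_meta; find /data/beegfs_meta -type f | wc -l"
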
Looks like:

  • NOTE: a failure to mount /mnt/beegfs is the result of out-of-space storage servers (see the check after the listings below).
# file content

[root@swallowtail ~]# ls -lR /data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31
/data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31:
total 672
-rw-rw-rw- 1 root root 289442 Jun 26  2015 D8-57E42E89-30
-rw-rw-rw- 1 root root   3854 Jun 26  2015 D9-57E42E89-30
-rw-rw-rw- 1 root root  16966 Jun 26  2015 DA-57E42E89-30
-rw-rw-rw- 1 root root  65779 Jun 26  2015 DB-57E42E89-30
-rw-rw-rw- 1 root root  20562 Jun 26  2015 DF-57E42E89-30
-rw-rw-rw- 1 root root 259271 Jun 26  2015 E0-57E42E89-30
-rw-rw-rw- 1 root root    372 Jun 26  2015 E1-57E42E89-30

[root@petaltail ~]# ls -lR /var/chroots/data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31
/var/chroots/data/beegfs_storage/chunks/u0/57E4/2/169-57E42E75-31:
total 144
-rw-rw-rw- 1 root root     40 Jun 26  2015 DC-57E42E89-30
-rw-rw-rw- 1 root root  40948 Jun 26  2015 DD-57E42E89-30
-rw-rw-rw- 1 root root 100077 Jun 26  2015 DE-57E42E89-30

# meta content

[root@sharptail ~]# ssh n38 find /data/beegfs_meta -name 169-57E42E75-31
/data/beegfs_meta/inodes/6A/7E/169-57E42E75-31
/data/beegfs_meta/dentries/6A/7E/169-57E42E75-31

[root@sharptail ~]# ssh n39 find /data/beegfs_meta -name 169-57E42E75-31
(none, no mirror)
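
Related to the NOTE above: when a client refuses to mount /mnt/beegfs because the storage targets ran out of space, the quickest checks are the capacity pools and the client log (a sketch; targets drop from the low pool to the emergency pool as they fill up):

[root@n7 ~]# beegfs-df
[root@n7 ~]# tail -20 /var/log/beegfs-client.log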

Tuning

  • global interfaces file ib0→eth1→eth0 (example in the sketch after this list)
    • connInterfacesFile = /home/tmp/global/beegfs.connInterfacesFile
    • set in /etc/beegfs/beegfs-[storage|client|meta|admon|mgmtd].conf and restart services
  • back up beeGFS EA metadata, see faq
    • attempt a restore
    • or just snapshot
  • storage server tuning
    • set on cottontail on sdb, both values were 128 (seems to help – late summer 2016)
    • echo 4096 > /sys/block/sd?/queue/nr_requests
    • echo 4096 > /sys/block/sd?/queue/read_ahead_kb
    • set on cottontail (was 90112), added to /etc/rc.local
    • echo 262144 > /proc/sys/vm/min_free_kbytes
  • do same on greentail? (done late fall 2016)
    • all original values same as cottontail (all files)
    • set on c1d1 thru c1d6
  • do same on sharptail?
    • no such values for sdb1
    • can only find min_free_kbytes, same value as cottontail
  • stripe and chunk size
[root@n7 ~]# beegfs-ctl --getentryinfo /mnt/beegfs/
Path:
Mount: /mnt/beegfs
EntryID: root
Metadata node: n38 [ID: 48]
Stripe pattern details:
+ Type: RAID0
+ Chunksize: 512K
+ Number of storage targets: desired: 4
  • The cache type can be set in the client config file (/etc/beegfs/beegfs-client.conf).
    • buffered is the default; it caches a few 100kB per file
  • tuneNumWorkers in all the /etc/beegfs/beegfs-*.conf files
    • for meta, storage and clients …
  • metadata server tuning
    • read up on this in more detail
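
A sketch of how I'd persist the block-device settings above and what the interfaces file looks like (the sd? glob and the interface names are specific to this setup):

# /home/tmp/global/beegfs.connInterfacesFile - one interface per line, in order of preference
ib0
eth1
eth0

# appended to /etc/rc.local on the storage servers so the settings survive a reboot
for q in /sys/block/sd?/queue; do
  echo 4096 > $q/nr_requests
  echo 4096 > $q/read_ahead_kb
done
echo 262144 > /proc/sys/vm/min_free_kbytes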

Installation

  • made easy External Link
  • RPMs pulled from the repository via petaltail into greentail:/sanscratch/tmp/beegfs
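
A minimal setup sketch, assuming the stock BeeGFS rpms and the shipped setup helpers (package names, paths and service/target IDs below are illustrative, borrowed from this cluster; verify flags with each script's -h output). Once the services are up, beegfs-net on a client shows the established connections:

# per role, install the needed subset from the local rpm mirror
yum install beegfs-mgmtd beegfs-meta beegfs-storage beegfs-client beegfs-helperd beegfs-utils

# management service on cottontail
/opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs_mgmtd

# metadata service on n38 (service id 48), pointing at the management host
/opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs_meta -s 48 -m cottontail

# storage service on petaltail (service id 217, storage target 21701)
/opt/beegfs/sbin/beegfs-setup-storage -p /var/chroots/data/beegfs_storage -s 217 -i 21701 -m cottontail

# client on n7
/opt/beegfs/sbin/beegfs-setup-client -m cottontail
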
[root@cottontail ~]# ssh n7 beegfs-net

mgmt_nodes
=============
cottontail [ID: 1]
   Connections: TCP: 1 (10.11.103.253:8008);

meta_nodes
=============
n38 [ID: 48]
   Connections: TCP: 1 (10.11.103.48:8005);
n39 [ID: 49]
   Connections: TCP: 1 (10.11.103.49:8005);

storage_nodes
=============
swallowtail [ID: 136]
   Connections: TCP: 1 (192.168.1.136:8003 [fallback route]);
petaltail [ID: 217]
   Connections: TCP: 1 (192.168.1.217:8003 [fallback route]);

