Back

The problem

We are trying to offload heavy read/write traffic from our file server. I also did a deep dive to assess whether we could afford enterprise-level storage; that works out to roughly a $42K outlay at the low end and up to $70K at the high end. I've detailed the results on the Enterprise Storage page. There is a lot to be gained by going that route, but we are implementing the Short Term plan first, as detailed on that page.

Avoiding NFS as much as possible, I'd like to set up rsyncd and use it in combination with rsnapshot to generate daily, weekly and monthly point-in-time snapshots of sharptail:/home on cottontail:/mnt/home/.snapshots (only accessible by root). I'll then remount that area read-only over localhost for self-serve restores by the users themselves.

I'm using version 1.3.1 from rsnapshot.org since that is the version I have seriously tested; the newer versions use arbitrary interval names like alpha, beta and gamma, but my users will understand daily, weekly and monthly much better.

Download the source, untar it, and follow the simple instructions in the INSTALL file on the target server.
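As a rough sketch, assuming the usual rsnapshot.org download location and the defaults from the INSTALL file, that boils down to:

# on the target server (cottontail); adjust the URL/version as needed
wget http://www.rsnapshot.org/downloads/rsnapshot-1.3.1.tar.gz
tar xzf rsnapshot-1.3.1.tar.gz
cd rsnapshot-1.3.1
./configure --sysconfdir=/etc
make install
# start from the template config that the build installs
cp -a /etc/rsnapshot.conf.default /etc/rsnapshot.conf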

Next (and this took some time to dig up), install these two Perl modules on the target server where the snapshots will reside:

cpan YAML
cpan Lchown
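A quick way to confirm both modules are actually visible to Perl before the first rsnapshot run:

perl -MYAML -e 'print "YAML ok\n"'
perl -MLchown -e 'print "Lchown ok\n"'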

On the source server (where /home is local), configure rsync for daemon mode. Create the file /etc/rsyncd.conf with the following content. Notice that we expose /home read-only, which is a nice approach.

syslog facility = local3
read only = yes
list = yes
auth users = root
secrets file = /etc/rsyncd.secrets
hosts allow = 10.11.103.253
uid = 0
gid = 0

[root]
comment = /
path = /

Create that secrets file in user:password format (the user must match the auth users line above), put a long password string in it, like ThisIsVerySecret, and chmod it 600.
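A minimal sketch of that step; the password string is only a placeholder. On the client side the matching password would typically be supplied via --password-file or the RSYNC_PASSWORD environment variable.

# on the source server
echo 'root:ThisIsVerySecret' > /etc/rsyncd.secrets
chmod 600 /etc/rsyncd.secrets

With the secrets file in place, start the daemon on the source server.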

rsync --daemon

Back on the target server where the snapshots will reside, configure the /etc/rsnapshot.conf file. Here are my settings:

[root@cottontail .snapshots]# grep ^[a-z] /etc/rsnapshot.conf
config_version  1.2
snapshot_root   /mnt/home/.snapshots/
cmd_cp          /bin/cp
cmd_rm          /bin/rm
cmd_rsync       /usr/bin/rsync
cmd_ssh /usr/bin/ssh
cmd_logger      /bin/logger
cmd_du          /usr/bin/du
cmd_rsnapshot_diff      /usr/local/bin/rsnapshot-diff
cmd_preexec     /root/start.sh
cmd_postexec    /root/stop.sh
interval        daily   6
interval        weekly  4
interval        monthly 3
interval        yearly  1
verbose         3
loglevel        3
logfile /var/log/rsnapshot
lockfile        /var/run/rsnapshot.pid
rsync_short_args        -vac
rsync_long_args  --delete --numeric-ids --relative --delete-excluded --whole-file --bwlimit=60000 --stats
du_args -csh
one_fs          1
link_dest       1
backup  sharptail-ib0::root/home/       sharptail/
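Note that rsnapshot requires tabs (not spaces) between the fields in rsnapshot.conf, and it ships a built-in syntax checker, so it is worth validating the file before the first run:

/usr/local/bin/rsnapshot configtest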

A couple of notes:

  • cmd_preexec and cmd_postexec just echo the date to the console and start/stop the rsync daemon remotely on the source server (a sketch of these scripts follows this list)
  • I chose --whole-file to speed things up
  • one_fs is important
  • link_dest is important
  • the backup line points at the private IP of the source server on the Infiniband network, and then at the module defined in /etc/rsyncd.conf on the source server
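The contents of /root/start.sh and /root/stop.sh are not shown on this page, so the following is only a hypothetical sketch based on the description above (echo the date, then start or stop the rsync daemon on the source server over ssh); the pkill pattern is just one way to stop the daemon.

#!/bin/bash
# /root/start.sh (hypothetical): log the time, start rsyncd on the source server
date
ssh sharptail-ib0 '/usr/bin/rsync --daemon'

#!/bin/bash
# /root/stop.sh (hypothetical): stop rsyncd on the source server, log the time
ssh sharptail-ib0 'pkill -f "rsync --daemon"'
date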

Start things up

# on target
[root@cottontail ~]# /usr/local/bin/rsnapshot daily &

# on source
[root@sharptail ~]# lsof -i:873
COMMAND   PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
rsync   17814 root    3u  IPv4 261839487      0t0  TCP *:rsync (LISTEN)
rsync   17814 root    5u  IPv6 261839488      0t0  TCP *:rsync (LISTEN)
rsync   29717 root    6u  IPv4 261962741      0t0  TCP sharptail-ib0:rsync->cottontail-ib0:54069 (ESTABLISHED)


# check what rsync is doing
[root@sharptail ~]# strace -p 29717
Process 29717 attached - interrupt to quit
close(5)                                = 0
lstat("/home/hemamy/data-analysis-for-fcc-lattice-tetra/fcc/data-analysis-for-7-b-spacer/NPs-same-mass/old-data/msd/msd-diff-temp/5.0nmNP/115msd-5.0nmNP-t-115-tetras.dat", {st_mode=S_IFREG|0644, st_size=11125, ...}) = 0
open("/home/hemamy/data-analysis-for-fcc-lattice-tetra/fcc/data-analysis-for-7-b-spacer/NPs-same-mass/old-data/msd/msd-diff-temp/5.0nmNP/115msd-5.0nmNP-t-115-tetras.dat", O_RDONLY) = 5
read(5, "1\t43.224064\n2\t79.676036\n3\t111.20"..., 11125) = 11125
close(5)                                = 0
lstat("/home/hemamy/data-analysis-for-fcc-lattice-tetra/fcc/data-analysis-for-7-b-spacer/NPs-same-mass/old-data/msd/msd-diff-temp/5.0nmNP/a115ave-msd-5.0nmNP-t-115-tetras.dat", {st_mode=S_IFREG|0644, st_size=8, ...}) = 0
open("/home/hemamy/data-analysis-for-fcc-lattice-tetra/fcc/data-analysis-for-7-b-spacer/NPs-same-mass/old-data/msd/msd-diff-temp/5.0nmNP/a115ave-msd-5.0nmNP-t-115-tetras.dat", O_RDONLY) = 5
CTRL-C
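Besides strace on the source, the target side can be followed through the logfile defined in rsnapshot.conf, and rsnapshot itself can report how much space each snapshot consumes (slow over 40+ million files):

# on target
tail -f /var/log/rsnapshot
/usr/local/bin/rsnapshot du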

I suggest you debug with a small directory first; /home is 10TB in our case, with 40+ million files.
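Once the manual runs look sane, the intervals are normally driven from cron on the target server. The actual schedule is not shown on this page, so the times below are only illustrative; the larger intervals only rotate existing snapshots, which is why they are usually scheduled just before the daily run.

# /etc/cron.d/rsnapshot (illustrative times)
30 23 * * *     root    /usr/local/bin/rsnapshot daily
15 23 * * 6     root    /usr/local/bin/rsnapshot weekly
00 23 1 * *     root    /usr/local/bin/rsnapshot monthly
45 22 1 1 *     root    /usr/local/bin/rsnapshot yearly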

Then remount the user-inaccessible area /mnt/home/.snapshots for user access on /snapshots.

# /etc/exports content, then exportfs -ra
/mnt/home/.snapshots       localhost.localdomain(sync,rw,no_all_squash,no_root_squash)

# /etc/fstab content, then mount /snapshots/daily.0
/dev/mapper/VG0-lvhomesnaps            /mnt/home/.snapshots   xfs   rw                0 0
localhost:/mnt/home/.snapshots/daily.0 /snapshots/daily.0     nfs   ro,tcp,intr,bg    0 0
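Assuming the NFS services are already running on cottontail, applying and checking the export and the loopback mount looks like this:

mkdir -p /snapshots/daily.0
exportfs -ra
showmount -e localhost
mount /snapshots/daily.0
mount | grep /snapshots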

# test
[root@cottontail .snapshots]# touch /snapshots/daily.0/sharptail/home/hmeij/tmp/foo
touch: cannot touch `/snapshots/daily.0/sharptail/home/hmeij/tmp/foo': Read-only file system
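That is the whole point of the read-only remount: users can pull their own files back without any admin involvement. For example (the file name here is made up):

# restore yesterday's copy of a file, as a regular user
cp -p /snapshots/daily.0/sharptail/home/hmeij/tmp/results.dat ~/tmp/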

Finally, if you get the following error, which is hardly informative: you've set num-tries to 0, like I did, and then fussed over it for some time. Set it to 1, or just leave the default alone.

[2016-09-15T11:37:06] /usr/local/bin/rsnapshot daily: ERROR: /usr/bin/rsync returned 0.00390625 while processing rsync://sharptail-ib00::root/home/hmeij/python/


Back
