Date: Fri, 16 Dec 2016 17:08:28 -0800
From: Aleksandr Miroslav <alexmiroslav@gmail.com>
To: freebsd-questions@freebsd.org
Subject: Re: zfs (zxfer) replication -- "holes" in backups?
Message-ID: <CACcSE1znx_3H=71hT_3TOu-tMhWjkm7_sx-nxLQA83iz4aeR6w@mail.gmail.com>
On Tue, Dec 13, 2016 at 3:44 PM, Aleksandr Miroslav <alexmiroslav@gmail.com> wrote:
> I'm using zxfer to replicate my ZFS snapshots to another host. Occasionally,
> for whatever reason, zxfer can't replicate a particular snapshot.
>
> What I find is that later when zxfer tries again, it skips the snapshot it
> couldn't replicate and sends a newer one. This leaves the backup server
> with a "hole", i.e. a missing snapshot.

So I think I may have found the reason for the problem I'm seeing...

I'm using a pkg called zfstools to take my snapshots. It takes snapshots called frequent, hourly, daily, weekly, and monthly. You can probably guess the frequency of the last four. I take "frequent" snapshots every 15 minutes, but not at the top of the hour, since the hourly snapshot takes care of that. My cron looks like this:

15,30,45 * * * * root zfs-auto-snapshot frequent 4
0 * * * * root zfs-auto-snapshot hourly 24
0 0 * * * root zfs-auto-snapshot daily 31
0 0 * * 7 root zfs-auto-snapshot weekly 4
0 0 1 * * root zfs-auto-snapshot monthly 48

The number after the snapshot name is how many copies of that particular snapshot I keep around.

You can probably see the problem right away: while I take an hourly snapshot every hour and keep 24 copies of it (so each hourly snapshot lives for 24 hours), I keep only 4 copies of the frequent snapshots. This means each frequent snapshot lives only 75 or 90 minutes (depending on which quarter-hour it was taken at) before it is deleted.

My replication runs about every hour to my primary replica and every 4 hours to another replica, and each run takes some time to complete. So a particular frequent snapshot can be marked for transfer at the start of a replication run but deleted before it can be transferred. (To be fair, I believe I had seen some errors from cron to this effect.)
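The 75/90-minute figure can be checked with a small model. This is a minimal Python sketch, assuming that zfs-auto-snapshot keeps only the newest N snapshots of a given type and destroys the oldest as soon as the (N+1)-th is taken; the helper names creation_times and lifetimes are hypothetical and not part of zfstools.

```python
def creation_times(minutes, hours=48):
    # Creation times, in minutes from the start, for a cron job that fires
    # at the given minutes of every hour (15,30,45 for "frequent").
    return [h * 60 + m for h in range(hours) for m in sorted(minutes)]


def lifetimes(times, keep):
    # Assumed pruning model: snapshot i is destroyed when snapshot i + keep
    # is taken, i.e. only the newest `keep` snapshots are retained.
    return [times[i + keep] - times[i] for i in range(len(times) - keep)]


frequent = creation_times([15, 30, 45])
print(sorted(set(lifetimes(frequent, keep=4))))  # prints [75, 90]
```

Snapshots taken at :15 and :30 survive 75 minutes, and the one taken at :45 survives 90, because the top-of-the-hour slot belongs to the hourly job.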
I believe the solution is to increase the number of frequent copies that are kept, so that each frequent snapshot survives long enough for the next replication run to see it and transfer it. I will increase this number and see if that fixes the problem. I will also fix the existing holes manually.
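One way to choose that number: a snapshot taken just after a run starts is not seen until the next run, which may transfer it only near its end, so every frequent snapshot should outlive one full replication interval plus the run time. Below is a minimal Python sketch under the same keep-the-newest-N assumption as above. The 45-minute run time is a hypothetical figure, not a measurement, and the helper names are illustrative only.

```python
def creation_times(minutes, hours=72):
    # Creation times, in minutes, for a cron job firing at the given
    # minutes of every hour (e.g. 15,30,45 for "frequent").
    return [h * 60 + m for h in range(hours) for m in sorted(minutes)]


def min_lifetime(times, keep):
    # Shortest survival time of any snapshot, assuming snapshot i is
    # destroyed when snapshot i + keep is taken.
    return min(times[i + keep] - times[i] for i in range(len(times) - keep))


def min_keep(minutes, interval, runtime):
    # Smallest keep count such that every snapshot outlives one full
    # replication interval plus the time a run needs to finish.
    times = creation_times(minutes)
    required = interval + runtime
    for keep in range(1, len(times)):
        if min_lifetime(times, keep) >= required:
            return keep
    raise ValueError("no keep count fits within the modelled time span")


# 4-hour replica, assumed (hypothetical) 45-minute run time.
print(min_keep([15, 30, 45], interval=240, runtime=45))  # prints 15
```

Under these assumptions the 4-hour replica needs 15 frequent copies (about five hours' worth), while the hourly replica alone would need only 6 (interval=60). The slowest replica sets the requirement.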