From: Aleksandr Miroslav <alexmiroslav@gmail.com>
Date: Fri, 16 Dec 2016 17:08:28 -0800
Subject: Re: zfs (zxfer) replication -- "holes" in backups?
To: freebsd-questions@freebsd.org

On Tue, Dec 13, 2016 at 3:44 PM, Aleksandr Miroslav wrote:
> I'm using zxfer to replicate my ZFS snapshots to another host.
> Occasionally, for whatever reason, zxfer can't replicate a particular
> snapshot.
>
> What I find is that later, when zxfer tries again, it skips the snapshot
> it couldn't replicate and sends a newer one. This leaves the backup
> server with a "hole", i.e. a missing snapshot.

So I think I may have found the reason for the problem I'm seeing.

I'm using a pkg called zfstools to take my snapshots. It takes snapshots
named frequent, hourly, daily, weekly, and monthly; the frequency of the
last four you can probably guess. The "frequent" snapshot is taken every
15 minutes -- but not at the top of the hour, since that is covered by
the hourly snapshot.
My cron looks like this:

    15,30,45 * * * * root zfs-auto-snapshot frequent  4
    0        * * * * root zfs-auto-snapshot hourly   24
    0        0 * * * root zfs-auto-snapshot daily    31
    0        0 * * 7 root zfs-auto-snapshot weekly    4
    0        0 1 * * root zfs-auto-snapshot monthly  48

The number after the snapshot name is how many copies of that particular
snapshot are kept.

You can probably see the problem right away: while I take an hourly
snapshot every hour and keep 24 copies of it (so that each hourly
snapshot lives for 24 hours), I keep only 4 copies of the frequent
snapshots. This means that each frequent snapshot lives at most 75 or
90 minutes before it is deleted.

Since my replication runs about every hour to my primary replica, and
every 4 hours to another replica, and since a replication run takes
some time, it can happen that a particular frequent snapshot is marked
for transfer at the start of a run but deleted before it can actually
be transferred. (To be fair, I believe I had seen some errors from cron
to this effect.)

I believe the solution is to increase the number of frequent copies
that are kept, such that each replication run can transfer all the
frequent snapshots it sees. I will increase this number (see the
adjusted cron line below) and see if that fixes the problem. I will
also fix the already-created holes manually; a sketch of that follows
as well.
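For example (the exact count is a guess on my part, not anything
zfstools requires), bumping the frequent count from 4 to 16 would keep
roughly five hours of frequent snapshots around -- comfortably longer
than the 4-hour replication interval to the second replica:

    # keep 16 frequent snapshots (~5 hours) instead of 4 (75-90 minutes)
    15,30,45 * * * * root zfs-auto-snapshot frequent 16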
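For the existing holes: as I understand it, zfs recv only accepts an
incremental on top of the newest snapshot on the destination, so filling
a hole means rolling the destination back to the snapshot just before
the gap and replaying everything after it. Something like this should
work, assuming the skipped snapshot still exists on the source (pool
and snapshot names below are made up for illustration):

    # on the backup host: roll back to the last snapshot before the hole
    # (-r destroys the destination snapshots that came after it)
    zfs rollback -r backup/tank@snap-before-hole

    # on the source: resend everything after that snapshot; -I (capital i)
    # includes all intermediate snapshots, so the previously skipped one
    # gets transferred too
    zfs send -I tank@snap-before-hole tank@snap-latest \
        | ssh backuphost zfs recv backup/tank

Of course, if the skipped snapshot was a frequent one that zfstools has
already destroyed on the source, there is nothing left to send and that
hole is permanent.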