From: Aleksandr Miroslav <alexmiroslav@gmail.com>
Date: Fri, 16 Dec 2016 17:08:28 -0800
Subject: Re: zfs (zxfer) replication -- "holes" in backups?
To: freebsd-questions@freebsd.org

On Tue, Dec 13, 2016 at 3:44 PM, Aleksandr Miroslav wrote:
> I'm using zxfer to replicate my ZFS snapshots to another host.
> Occasionally, for whatever reason, zxfer can't replicate a particular
> snapshot.
>
> What I find is that later, when zxfer tries again, it skips the snapshot
> it couldn't replicate and sends a newer one. This leaves the backup
> server with a "hole", i.e. a missing snapshot.

So I think I may have found the reason for the problem I'm seeing.

I'm using a pkg called zfstools to take my snapshots. It takes snapshots
named frequent, hourly, daily, weekly, and monthly; the frequency of the
last four you can probably guess. The "frequent" snapshot is taken every
15 minutes -- but not at the top of the hour, since that is covered by
the hourly snapshot.
My cron looks like this:

    15,30,45 * * * * root zfs-auto-snapshot frequent  4
    0        * * * * root zfs-auto-snapshot hourly   24
    0        0 * * * root zfs-auto-snapshot daily    31
    0        0 * * 7 root zfs-auto-snapshot weekly    4
    0        0 1 * * root zfs-auto-snapshot monthly  48

The number after the snapshot name is how many copies of that particular
snapshot are kept.

You can probably see the problem right away: while I take an hourly
snapshot every hour and keep 24 copies of it (so that each hourly
snapshot lives for 24 hours), I keep only 4 copies of the frequent
snapshots. This means that each frequent snapshot lives at most 75 or
90 minutes before it is deleted.

Since my replication runs about every hour to my primary replica, and
every 4 hours to another replica, and since a replication run takes
some time, it can happen that a particular frequent snapshot is marked
for transfer at the start of a run but deleted before it can actually
be transferred. (To be fair, I believe I had seen some errors from cron
to this effect.)

I believe the solution is to increase the number of frequent copies
that are kept, such that each replication run can transfer all the
frequent snapshots it sees. I will increase this number (see the
adjusted cron line below) and see if that fixes the problem. I will
also fix the already-created holes manually; a sketch of that follows
as well.
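For example (the exact count is a guess on my part, not anything
zfstools requires), bumping the frequent count from 4 to 16 would keep
roughly five hours of frequent snapshots around -- comfortably longer
than the 4-hour replication interval to the second replica:

    # keep 16 frequent snapshots (~5 hours) instead of 4 (75-90 minutes)
    15,30,45 * * * * root zfs-auto-snapshot frequent 16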
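For the existing holes: as I understand it, zfs recv only accepts an
incremental on top of the newest snapshot on the destination, so filling
a hole means rolling the destination back to the snapshot just before
the gap and replaying everything after it. Something like this should
work, assuming the skipped snapshot still exists on the source (pool
and snapshot names below are made up for illustration):

    # on the backup host: roll back to the last snapshot before the hole
    # (-r destroys the destination snapshots that came after it)
    zfs rollback -r backup/tank@snap-before-hole

    # on the source: resend everything after that snapshot; -I (capital i)
    # includes all intermediate snapshots, so the previously skipped one
    # gets transferred too
    zfs send -I tank@snap-before-hole tank@snap-latest \
        | ssh backuphost zfs recv backup/tank

Of course, if the skipped snapshot was a frequent one that zfstools has
already destroyed on the source, there is nothing left to send and that
hole is permanent.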