From: Paul Mather <paul@gromit.dlib.vt.edu>
To: freebsd-questions@freebsd.org
Subject: Re: zfs replication tool
Date: Fri, 23 Sep 2022 09:48:31 -0400
Message-Id: <3FD7D1F5-F37E-4B48-A67B-DAE9DBDD5DEA@gromit.dlib.vt.edu>
In-Reply-To: <20220920122029.ufsoyo47qnxtmcqk@x1>
List-Archive: https://lists.freebsd.org/archives/freebsd-questions

On Sep 20, 2022, at 8:20 AM, Julien Cigar <julien@perdition.city> wrote:

> On Tue, Sep 20, 2022 at 11:29:05AM +0200, Julien Cigar wrote:
>> On Fri, Sep 16, 2022 at 04:02:36PM +0200, Julien Cigar wrote:
>>> On Fri, Sep 16, 2022 at 09:56:36AM -0400, mike tancsa wrote:
>>>> On 9/16/2022 9:49 AM, Julien Cigar wrote:
>>>>> sysutils/zrepl works really well for me.
>>>>>> Check out the filter syntax to see if it meets your requirements
>>>>>>
>>>>>> https://zrepl.github.io/configuration/filter_syntax.html
>>>>>>
>>>>>> ---Mike
>>>>> thanks, I used zrepl in the past and I experienced some deadlocks and
>>>>> crashes which is why I switched to sanoid (which doesn't support
>>>>> recursivity without zfs snapshot -r)
>>>>
>>>> Those deadlocks / crashes (if they are the ones I was thinking about) were
>>>> FreeBSD bugs in the end
>>>>
>>>> https://github.com/freebsd/freebsd-src/commit/1820ca2154611d6f27ce5a5fdd561a16ac54fdd8
>>>>
>>>> https://github.com/zrepl/zrepl/issues/411#issuecomment-821878812
>>>>
>>>> It's been rock solid for me since those commits / fixes
>>>
>>> ok, I'll give zrepl another chance :) thanks for pointing this out!
>>
>> it looks like zrepl snapshots aren't atomic across datasets too. I'm
>> testing on a local "test" machine and it gives me
>> https://gist.github.com/silenius/b8aaf68dae5c941397df44184cd33d7b
>
> also the thing I don't like with zrepl is that snapshot management and
> replication are tightly coupled. It looks like replicating a host "A" to
> "B" and "C" (classical local and off-site backup) is not possible
> without dirty hacks and race conditions ...

I like zrepl on the whole, but it currently has some annoying quirks and
limitations. I use it for daily replications, and I wish these issues
could be addressed:

1) Although you can specify a snapshot prefix for pruning purposes, zrepl
selects what to replicate per dataset, not per snapshot. I discovered
that all snapshots on the selected datasets are replicated, not just the
ones you want stewarded by zrepl. In my case, I also use Tivoli TSM (now
Spectrum Protect) to back up a system, and make a snapshot (for
consistency), which is backed up. (The snapshot is deleted after the
backup finishes.)
I found that zrepl runs were picking up this ephemeral snapshot during
the pull job and then getting into a tumult (with PLANNING-ERRORs) when
this snapshot disappeared. My "solution" for now is to run my pull job
hourly via cron instead of zrepl's inbuilt timer and to have cron not run
the job during the time window of the backup (so it won't pick up the TSM
snapshot). My retention is such that zrepl can "catch up" for the period
it misses, replicating before those snapshots would be pruned.

This problem is related to this zrepl issue:
https://github.com/zrepl/zrepl/issues/403, opened in late 2020 and still
not resolved.

2) Related to 1) above, replicated boot environments cause problems when
I delete them (which is usually after I've successfully upgraded).
Deleting one leaves a dangling snapshot hold on the receiver side, which
I need to clean up manually.

Maybe I'm not understanding or configuring zrepl correctly, but it does
seem from Issue #403 that zrepl's promiscuous replication of all
snapshots is indeed a thing and can lead to problems.

Cheers,

Paul.
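
For anyone wanting to try the cron-based workaround from 1), here is a
minimal sketch in Python. It is not the script used in the message above.
It assumes the pull job is set to manual triggering in the zrepl config,
that the job is called "pull_from_source" (a hypothetical name), and that
the backup window runs from 01:00 to 04:00 (also hypothetical). It relies
on the "zrepl signal wakeup JOB" subcommand to start the job.

```python
#!/usr/bin/env python3
"""Cron wrapper: wake a zrepl pull job unless the backup window is active."""
import subprocess
import sys
from datetime import datetime, time

# Hypothetical values: adjust to the local job name and backup schedule.
JOB_NAME = "pull_from_source"
WINDOW_START = time(1, 0)  # backup assumed to start at 01:00
WINDOW_END = time(4, 0)    # and assumed to be finished by 04:00


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if now is in [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def main() -> int:
    if in_window(datetime.now().time(), WINDOW_START, WINDOW_END):
        # Skip this run so the ephemeral backup snapshot is never planned.
        return 0
    # Ask the running zrepl daemon to start the job now.
    return subprocess.run(["zrepl", "signal", "wakeup", JOB_NAME]).returncode


if __name__ == "__main__":
    sys.exit(main())
```

Run hourly from cron, this skips the hours inside the window and lets the
next run outside it catch up, provided retention on the source keeps the
snapshots long enough.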

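The manual hold cleanup mentioned in 2) could be sketched along these
lines. This is an illustration only, assuming the tab-separated output of
"zfs holds -H" (snapshot name, tag, timestamp). It prints "zfs release"
commands for review rather than running them, and it does not know which
holds zrepl still needs, so check each line before executing it. The
dataset name passed on the command line is whatever receiver-side dataset
you want to inspect.

```python
#!/usr/bin/env python3
"""Print `zfs release` commands for every hold found under a dataset (dry run)."""
import shlex
import subprocess
import sys


def parse_holds(output: str) -> list[tuple[str, str]]:
    """Parse `zfs holds -H` output into (snapshot, tag) pairs."""
    pairs = []
    for line in output.splitlines():
        fields = line.split("\t")
        # Expect NAME<TAB>TAG<TAB>TIMESTAMP; ignore anything else.
        if len(fields) >= 2 and "@" in fields[0]:
            pairs.append((fields[0], fields[1]))
    return pairs


def release_commands(pairs: list[tuple[str, str]]) -> list[str]:
    """Build one shell-quoted `zfs release TAG SNAPSHOT` command per hold."""
    return [f"zfs release {shlex.quote(tag)} {shlex.quote(snap)}" for snap, tag in pairs]


def main(dataset: str) -> None:
    # List all snapshots under the dataset, one name per line.
    snaps = subprocess.run(
        ["zfs", "list", "-H", "-r", "-t", "snapshot", "-o", "name", dataset],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    if not snaps:
        return
    # Ask ZFS which of those snapshots carry holds.
    holds = subprocess.run(
        ["zfs", "holds", "-H", *snaps],
        check=True, capture_output=True, text=True,
    ).stdout
    for cmd in release_commands(parse_holds(holds)):
        print(cmd)  # dry run: review before executing


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```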