Date: Fri, 23 Sep 2022 09:48:31 -0400
From: Paul Mather <paul@gromit.dlib.vt.edu>
To: "freebsd-questions@freebsd.org" <freebsd-questions@FreeBSD.org>
Subject: Re: zfs replication tool
Message-ID: <3FD7D1F5-F37E-4B48-A67B-DAE9DBDD5DEA@gromit.dlib.vt.edu>
In-Reply-To: <20220920122029.ufsoyo47qnxtmcqk@x1>
References: <20220916133046.znfelln3fisrjnuz@x1> <d952d824-bcab-cfef-1b95-a8e71388c588@sentex.net> <20220916134918.hz6glg3nfwr3ouu4@x1> <0a0ba81b-88f2-fa75-9abe-6f41da5d2c69@sentex.net> <20220916140236.jeizzganrtnsrhlo@x1> <20220920092905.3k7qzt7lvhywhcfn@x1> <20220920122029.ufsoyo47qnxtmcqk@x1>

On Sep 20, 2022, at 8:20 AM, Julien Cigar <julien@perdition.city> wrote:

> On Tue, Sep 20, 2022 at 11:29:05AM +0200, Julien Cigar wrote:
>> On Fri, Sep 16, 2022 at 04:02:36PM +0200, Julien Cigar wrote:
>>> On Fri, Sep 16, 2022 at 09:56:36AM -0400, mike tancsa wrote:
>>>> On 9/16/2022 9:49 AM, Julien Cigar wrote:
>>>>> sysutils/zrepl works really well for me.
>>>>>> Check out the filter syntax to see if it meets your requirements
>>>>>>
>>>>>> https://zrepl.github.io/configuration/filter_syntax.html
>>>>>>
>>>>>> ---Mike
>>>>> thanks, I used zrepl in the past and I experienced some deadlocks and
>>>>> crashes, which is why I switched to sanoid (which doesn't support
>>>>> recursivity without zfs snapshot -r)
>>>>
>>>> Those deadlocks / crashes (if they are the ones I was thinking about) were
>>>> FreeBSD bugs in the end
>>>>
>>>> https://github.com/freebsd/freebsd-src/commit/1820ca2154611d6f27ce5a5fdd561a16ac54fdd8
>>>>
>>>> https://github.com/zrepl/zrepl/issues/411#issuecomment-821878812
>>>>
>>>> It's been rock solid for me since those commits / fixes
>>>
>>> ok, I'll give zrepl another chance :) thanks for pointing this out!
>>
>> it looks like zrepl snapshots aren't atomic across datasets either. I'm
>> testing on a local "test" machine and it gives me
>> https://gist.github.com/silenius/b8aaf68dae5c941397df44184cd33d7b
>
> also the thing I don't like with zrepl is that snapshot management and
> replication are tightly coupled. It looks like replicating a host "A" to
> "B" and "C" (classical local and off-site backup) is not possible
> without dirty hacks and race conditions ...

I like zrepl on the whole, but it currently has some annoying quirks and
limitations. I use it for daily replications all the same, but I wish
these issues could be addressed:

1) Although you can specify a snapshot prefix for pruning purposes,
zrepl selects what to replicate by dataset. I discovered that all
snapshots on those datasets are replicated, not just the ones you want
stewarded by zrepl. In my case, I also use Tivoli TSM (now Spectrum
Protect) to back up a system, and I make a snapshot (for consistency),
which is what gets backed up. (The snapshot is deleted after the backup
finishes.) I found that zrepl runs were picking up this ephemeral
snapshot during the pull job and then getting into a tumult (with
PLANNING-ERRORs) when this snapshot disappeared. My "solution" for now
is to run my pull job hourly via cron instead of zrepl's inbuilt timer,
and to have cron not run the job during the time window of the backup
(so it won't pick up the TSM snapshot). My retention is such that zrepl
can "catch up" for the period it misses, replicating before those
snapshots would be pruned.

This problem is related to this zrepl issue:
https://github.com/zrepl/zrepl/issues/403, opened in late 2020 and
still not resolved.

2) Related to 1) above, replicated boot environments cause problems when
I delete them (which is usually after I've successfully upgraded).
Deleting one leaves a dangling snapshot hold on the receiver side, which
I need to clean up manually.

Maybe I'm not understanding or configuring zrepl correctly, but it does
seem from Issue #403 that zrepl's promiscuous replication of all
snapshots is indeed a thing and can lead to problems.

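For illustration, the cron workaround in 1) could look something like
the sketch below. It is not the script actually in use here: the window
times and the script name are made up, and it assumes the pull job is
configured with "interval: manual" so that (as I read the zrepl docs)
replication only happens when the job is woken with
"zrepl signal wakeup JOB".

```python
#!/usr/bin/env python3
"""Cron wrapper: wake a zrepl pull job unless the backup window is open."""
import subprocess
import sys
from datetime import datetime, time

# Hypothetical backup window; adjust to when the TSM snapshot exists.
WINDOW_START = time(1, 0)   # backup assumed to start at 01:00
WINDOW_END = time(4, 0)     # and assumed to be finished by 04:00


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if now is in [start, end); the window may cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def run(job_name: str) -> int:
    """Skip silently inside the window, otherwise trigger the pull job."""
    if in_window(datetime.now().time(), WINDOW_START, WINDOW_END):
        return 0
    # "zrepl signal wakeup JOB" asks the running daemon to start the job now.
    return subprocess.call(["zrepl", "signal", "wakeup", job_name])


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(run(sys.argv[1]))
    print("usage: zrepl_wakeup.py JOB_NAME", file=sys.stderr)
```

Called hourly from root's crontab with the pull job's name as the only
argument, this skips the runs that would otherwise see the ephemeral
snapshot. Whether the skipped hours are caught up later depends on
retention, as described above.
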
Cheers,

Paul.