Date:      Fri, 23 Sep 2022 09:48:31 -0400
From:      Paul Mather <paul@gromit.dlib.vt.edu>
To:        "freebsd-questions@freebsd.org" <freebsd-questions@FreeBSD.org>
Subject:   Re: zfs replication tool
Message-ID:  <3FD7D1F5-F37E-4B48-A67B-DAE9DBDD5DEA@gromit.dlib.vt.edu>
In-Reply-To: <20220920122029.ufsoyo47qnxtmcqk@x1>
References:  <20220916133046.znfelln3fisrjnuz@x1> <d952d824-bcab-cfef-1b95-a8e71388c588@sentex.net> <20220916134918.hz6glg3nfwr3ouu4@x1> <0a0ba81b-88f2-fa75-9abe-6f41da5d2c69@sentex.net> <20220916140236.jeizzganrtnsrhlo@x1> <20220920092905.3k7qzt7lvhywhcfn@x1> <20220920122029.ufsoyo47qnxtmcqk@x1>


On Sep 20, 2022, at 8:20 AM, Julien Cigar <julien@perdition.city> wrote:

> On Tue, Sep 20, 2022 at 11:29:05AM +0200, Julien Cigar wrote:
>> On Fri, Sep 16, 2022 at 04:02:36PM +0200, Julien Cigar wrote:
>>> On Fri, Sep 16, 2022 at 09:56:36AM -0400, mike tancsa wrote:
>>>> On 9/16/2022 9:49 AM, Julien Cigar wrote:
>>>>> sysutils/zrepl works really well for me.
>>>>>> Check out the filter syntax to see if it meets your requirements
>>>>>>
>>>>>> https://zrepl.github.io/configuration/filter_syntax.html
>>>>>>
>>>>>>     ---Mike
>>>>> thanks, I used zrepl in the past and I experienced some deadlocks and
>>>>> crashes, which is why I switched to sanoid (which doesn't support
>>>>> recursion without zfs snapshot -r)
>>>>
>>>> Those deadlocks / crashes (if they are the ones I was thinking about) were
>>>> FreeBSD bugs in the end
>>>>
>>>> https://github.com/freebsd/freebsd-src/commit/1820ca2154611d6f27ce5a5fdd561a16ac54fdd8
>>>>
>>>> https://github.com/zrepl/zrepl/issues/411#issuecomment-821878812
>>>>
>>>> It's been rock solid for me since those commits / fixes
>>>
>>> ok, I'll give zrepl another chance :) thanks for pointing this out!
>>
>> it looks like zrepl snapshots aren't atomic across datasets either. I'm
>> testing on a local "test" machine and it gives me
>> https://gist.github.com/silenius/b8aaf68dae5c941397df44184cd33d7b
>
> also the thing I don't like with zrepl is that snapshot management and
> replication are tightly coupled. It looks like replicating a host "A" to
> "B" and "C" (classical local and off-site backup) is not possible
> without dirty hacks and race conditions ...


I like zrepl on the whole and use it for daily replications, but it
currently has some annoying quirks and limitations that I wish could be
addressed:

1) Although you can specify a snapshot prefix for pruning purposes,
zrepl selects datasets, not snapshots, for replication. I discovered
that all snapshots on those datasets are replicated, not just the ones
you want stewarded by zrepl.  In my case, I also use Tivoli TSM (now
Spectrum Protect) to back up a system, and make a snapshot (for
consistency), which is what gets backed up.  (The snapshot is deleted
after the backup finishes.)  I found that zrepl runs were picking up
this ephemeral snapshot during the pull job and then getting into
trouble (with PLANNING-ERRORs) when the snapshot disappeared.  My
"solution" for now is to run my pull job hourly via cron instead of
zrepl's built-in timer, and to have cron not run the job during the
backup window (so it won't pick up the TSM snapshot).  My retention is
such that zrepl can "catch up" on the period it misses, replicating
those snapshots before they would be pruned.

This problem is related to this zrepl issue:
https://github.com/zrepl/zrepl/issues/403, opened in late 2020 and
still not resolved.
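
For anyone wanting to try the same cron workaround from 1), here is a
minimal sketch of a wrapper that cron could run hourly.  It is not my
actual script: the job name and the backup window are made-up
placeholders, and it assumes the pull job is configured so that zrepl
does not start it on its own and that "zrepl signal wakeup JOB" is how
the job gets triggered.

```python
import subprocess
from datetime import datetime, time

# Hypothetical values: adjust to the real backup window and zrepl job name.
BACKUP_START = time(1, 0)
BACKUP_END = time(3, 0)
PULL_JOB = "pull_from_host_a"


def in_window(now, start, end):
    """Return True if `now` is in [start, end); the window may cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def main():
    """Skip the run during the backup window, otherwise wake the pull job."""
    if in_window(datetime.now().time(), BACKUP_START, BACKUP_END):
        return 0  # the ephemeral backup snapshot may exist, so do nothing
    # Assumption: the job is triggered manually with `zrepl signal wakeup`.
    return subprocess.call(["zrepl", "signal", "wakeup", PULL_JOB])

# When installed as a cron script, end the file with: raise SystemExit(main())
```

Keeping the window check in the script rather than in the crontab hour
field means a window that crosses midnight needs no special handling in
cron.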

2) Related to 1) above, replicated boot environments cause problems when
I delete them (which is usually after I've successfully upgraded).
Deleting one leaves a dangling snapshot hold on the receiver side, which
I need to clean up manually.
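
The manual cleanup in 2) could be scripted roughly as follows.  This is
only a sketch: it takes the tab-separated output of "zfs holds -H" (as
produced by something like "zfs holds -rH pool/dataset@snap") and
builds the matching "zfs release" commands.  The "zrepl_" tag prefix is
an assumption about how the holds are named, so check the tags on your
own receiver before releasing anything.

```python
def release_commands(holds_output, tag_prefix="zrepl_"):
    """Build `zfs release` argument lists from `zfs holds -H` output.

    Each input line is expected to be: snapshot<TAB>tag<TAB>timestamp.
    The default tag_prefix is an assumption, not a documented value.
    """
    commands = []
    for line in holds_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        snapshot, tag = fields[0], fields[1]
        if tag.startswith(tag_prefix):
            commands.append(["zfs", "release", tag, snapshot])
    return commands
```

Printing the resulting commands for review before running any of them
seems the safer choice, since releasing a hold that zrepl still relies
on could break the next incremental replication.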

Maybe I'm not understanding or configuring zrepl correctly, but it does
seem from Issue #403 that zrepl's promiscuous replication of all
snapshots is indeed a thing and can lead to problems.

Cheers,

Paul.




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3FD7D1F5-F37E-4B48-A67B-DAE9DBDD5DEA>