Date:      Thu, 17 Nov 2022 18:47:12 -0600
From:      Eric Borisch <eborisch@gmail.com>
To:        andy thomas <andy@time-domain.co.uk>
Cc:        Bob Friesenhahn <bfriesen@simple.dallas.tx.us>, Freddie Cash <fjwcash@gmail.com>,  FreeBSD Filesystems <freebsd-fs@freebsd.org>, Mark Saad <nonesuch@longcount.org>
Subject:   Re: Odd behaviour of two identical ZFS servers mirroring via rsync
Message-ID:  <CAMsT2=ndjTz43bHk6w5hFn7cxnyPpm4gWgfR8f9kayiGWbRhKg@mail.gmail.com>
In-Reply-To: <alpine.BSF.2.22.395.2211171422520.50255@mail0.time-domain.net>
References:  <alpine.BSF.2.22.395.2211111709230.29479@mail0.time-domain.net> <CAOgwaMuoLQ9Er67Y+=q+f9724v2WT3L5v5TZaRVXq+=1vEyJ+A@mail.gmail.com> <alpine.BSF.2.22.395.2211112008220.30520@mail0.time-domain.net> <alpine.GSO.2.20.2211121949060.7126@scrappy.simplesystems.org> <CAMXt9Nbr=7K6PELVGAPZ=-RiAfx=zp9iOoKyWdH=0H2=AiE52Q@mail.gmail.com> <alpine.GSO.2.20.2211131137020.7126@scrappy.simplesystems.org> <alpine.BSF.2.22.395.2211170840040.46246@mail0.time-domain.net> <CAOjFWZ6vxeXonEDndUvLkudRsRsBAd0sJ5ssOf4gLCgkVgSeyQ@mail.gmail.com> <alpine.BSF.2.22.395.2211171422520.50255@mail0.time-domain.net>

Take the time to figure out send/recv; it is a killer app of ZFS. Note that
your initial sync will have to send the entire filesystem; there is no way
to start with an rsync-ed copy due to the nature of send/recv.
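
For reference, a minimal sketch of that initial full transfer (pool, dataset
and host names here are only placeholders, adjust for your layout):

    # on the source server: snapshot everything, then send the whole tree
    zfs snapshot -r tank/data@base
    zfs send -R tank/data@base | ssh backuphost zfs recv -Fu tank/data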

Also note that you cannot modify the receive side and then update the backup;
as such, you should typically set it to be read-only. (Otherwise you will have
to roll back to the last synchronized snapshot before updating.) You can still
recv into a read-only ZFS filesystem, as the read-only property is a statement
of “read only through the POSIX layer”.
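
A sketch of that part, with the same placeholder names:

    # on the backup server: keep the copy read-only at the POSIX layer
    zfs set readonly=on tank/data

    # later, send only the blocks that changed between snapshots
    zfs snapshot -r tank/data@2022-11-17
    zfs send -R -i @base tank/data@2022-11-17 | ssh backuphost zfs recv -Fu tank/data

The -F on the receive rolls the backup back to the last common snapshot before
applying the new one, which is what you would otherwise have to do by hand if
anything had touched the receive side.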

 - Eric

On Thu, Nov 17, 2022 at 8:44 AM andy thomas <andy@time-domain.co.uk> wrote:

> On Thu, 17 Nov 2022, Freddie Cash wrote:
>
> > Now that you have it working with rsync, you should look into using ZFS
> > send/recv as an alternative. You should find it finishes a lot quicker
> than
> > rsync, although it does require a bit more scripting know-how
> (especially if
> > you want to use restartable/interruptible transfers, or use a transport
> > other than SSH for better throughput).
> > ZFS send/recv works "below" the filesystem layer that rsync works at. ZFS
> > knows which individual blocks on disk have changed between snapshots and
> > only transfers those blocks. There are no file comparisons or hash
> > computations to work out between the hosts.
> >
> > Transferring the initial snapshot takes a long time, though, as it has to
> > transfer the entire filesystem across. Transferring individual snapshots
> > after that takes very little time. It's similar to doing a "full" backup,
> > and then "incrementals".
> >
> > When transferring data between ZFS pools with similar filesystem
> > hierarchies, you really should consider send/recv.
>
> Point taken! Three days ago, one of our HPC users who has ~9TB of data
> stored on our server decided to rename a subdirectory containing ~4TB of
> experimental data stored as many millions of relatively small files within
> a lot of subdirectories. As a result, rsync on the destination (mirror)
> server is still deleting his old folder and its contents and hasn't even
> started mirroring the renamed folder.
>
> Since our servers have been up for 5.5 years and are both well overdue for
> an O/S upgrade from FBSD 11.3 to 13.x anyway, I think this would be a good
> opportunity to switch from rsync to ZFS send/recv. I was planning to do
> the O/S update over the upcoming Christmas vacation when HPC demand here
> traditionally falls to a very low level - I will set up a pair of test
> servers in the next day or two, play around with this and get some
> experience of this before upgrading the 'live' servers.
>
> cheers, Andy
>
> > Typos due to smartphone keyboard.
> >
> > On Thu., Nov. 17, 2022, 12:50 a.m. andy thomas, <andy@time-domain.co.uk>
> > wrote:
> >       I thought I would report back that changing my rsync options from
> >       '-Wav --delete' to '-av --inplace --no-whole-file --delete' has made a
> >       significant difference, with mirrored directory sizes on the
> >       slave server
> >       now falling and approaching the original sizes on the master.
> >       The only
> >       downside is that since whole-file replication is obviously a lot
> >       faster
> >       than updating the changed parts of individual files, mirroring
> >       is now
> >       taking longer than 24 hours so this will be changed to every few
> >       days or
> >       even weekly when more is known about user behaviour on the
> >       master server.
> >
> >       Andy
> >
> >       On Sun, 13 Nov 2022, Bob Friesenhahn wrote:
> >
> >       > On Sun, 13 Nov 2022, Mark Saad wrote:
> >       >>>
> >       >> Bob are you saying when the target is zfs --inplace
> >       --no-whole-file helps
> >       >> or just in general when you have
> >       >> large files ?  Also have you tried using --delete-during /
> >       --delete-after
> >       >> ?
> >       >
> >       > The '--inplace --no-whole-file' updates the file blocks if they
> >       have changed
> >       > (comparing the origin blocks with the existing mirror blocks)
> >       rather than
> >       > creating a new copy of the file and moving it into place when
> >       it is complete.
> >       > ZFS does not check if data content has been changed while it
> >       is being written
> >       > so a write of the same data will result in a fresh allocation
> >       based on its
> >       > Copy On Write ("COW") design.  Writing a whole new file
> >       obviously
> >       > significantly increases the number of blocks which are
> >       written.  Requesting
> >       > that rsync only write to the file for the blocks which have
> >       changed reduces
> >       > the total number of blocks which get written.
> >       >
> >       > The above helps quite a lot when using snapshots since then
> >       fewer blocks are
> >       > in the snapshots.
> >       >
> >       > I have never tried --delete-during so I can't comment on that.
> >       >
> >       > Bob
> >       > --
> >       > Bob Friesenhahn
> >       > bfriesen@simple.dallas.tx.us,
> >       http://www.simplesystems.org/users/bfriesen/
> >       > GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> >       > Public Key,
> >        http://www.simplesystems.org/users/bfriesen/public-key.txt
> >       >
> >       >
> >
> >
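
On the restartable/interruptible transfers Freddie mentioned above: newer
OpenZFS (including what ships with FreeBSD 13.x) can resume a partially
received stream. A rough sketch, same placeholder names as before, assuming
both ends support it:

    # ask the receiver to keep partial state if the stream dies
    zfs send -R -i @base tank/data@2022-11-17 | ssh backuphost zfs recv -s -Fu tank/data

    # after an interruption, fetch the resume token and pick up where it stopped
    token=$(ssh backuphost zfs get -H -o value receive_resume_token tank/data)
    zfs send -t "$token" | ssh backuphost zfs recv -s -Fu tank/data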



