Date: Thu, 17 Nov 2022 18:47:12 -0600
From: Eric Borisch <eborisch@gmail.com>
To: andy thomas <andy@time-domain.co.uk>
Cc: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>, Freddie Cash <fjwcash@gmail.com>, FreeBSD Filesystems <freebsd-fs@freebsd.org>, Mark Saad <nonesuch@longcount.org>
Subject: Re: Odd behaviour of two identical ZFS servers mirroring via rsync
Message-ID: <CAMsT2=ndjTz43bHk6w5hFn7cxnyPpm4gWgfR8f9kayiGWbRhKg@mail.gmail.com>
In-Reply-To: <alpine.BSF.2.22.395.2211171422520.50255@mail0.time-domain.net>
References: <alpine.BSF.2.22.395.2211111709230.29479@mail0.time-domain.net> <CAOgwaMuoLQ9Er67Y+=q+f9724v2WT3L5v5TZaRVXq+=1vEyJ+A@mail.gmail.com> <alpine.BSF.2.22.395.2211112008220.30520@mail0.time-domain.net> <alpine.GSO.2.20.2211121949060.7126@scrappy.simplesystems.org> <CAMXt9Nbr=7K6PELVGAPZ=-RiAfx=zp9iOoKyWdH=0H2=AiE52Q@mail.gmail.com> <alpine.GSO.2.20.2211131137020.7126@scrappy.simplesystems.org> <alpine.BSF.2.22.395.2211170840040.46246@mail0.time-domain.net> <CAOjFWZ6vxeXonEDndUvLkudRsRsBAd0sJ5ssOf4gLCgkVgSeyQ@mail.gmail.com> <alpine.BSF.2.22.395.2211171422520.50255@mail0.time-domain.net>
Take the time to figure out send/recv; it is a killer app of ZFS. Note
that your initial sync will have to send the entire filesystem; there
is no way to start with an rsync-ed copy due to the nature of
send/recv.

Also note you cannot modify the receive side and then update the
backup; as such you should typically set it to be read-only. (Otherwise
you will have to roll back to the synchronized snapshot before
updating.) You can still recv into a read-only ZFS filesystem, as the
read-only property is a statement of "read only through the POSIX
layer".
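As a rough sketch of what that cycle looks like (the dataset names
tank/data and backup/data, the snapshot names, and the host "mirror"
are only placeholders here, not anything from your setup):

  # one-time full replication, then mark the copy read-only
  zfs snapshot tank/data@base
  zfs send tank/data@base | ssh mirror zfs recv -F backup/data
  ssh mirror zfs set readonly=on backup/data

  # later updates only send the blocks changed between snapshots
  zfs snapshot tank/data@2022-11-18
  zfs send -i @base tank/data@2022-11-18 | ssh mirror zfs recv backup/data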
 - Eric

On Thu, Nov 17, 2022 at 8:44 AM andy thomas <andy@time-domain.co.uk> wrote:

> On Thu, 17 Nov 2022, Freddie Cash wrote:
>
> > Now that you have it working with rsync, you should look into using
> > ZFS send/recv as an alternative. You should find it finishes a lot
> > quicker than rsync, although it does require a bit more scripting
> > know-how (especially if you want to use restartable/interruptible
> > transfers, or use a transport other than SSH for better throughput).
> >
> > ZFS send/recv works "below" the filesystem layer that rsync works
> > at. ZFS knows which individual blocks on disk have changed between
> > snapshots and only transfers those blocks. There are no file
> > comparisons or hash computations to work out between the hosts.
> >
> > Transferring the initial snapshot takes a long time, though, as it
> > has to transfer the entire filesystem across. Transferring
> > individual snapshots after that takes very little time. It's similar
> > to doing a "full" backup, and then "incrementals".
> >
> > When transferring data between ZFS pools with similar filesystem
> > hierarchies, you really should consider send/recv.
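> >
> > As a rough sketch of the restartable variant (the dataset names are
> > placeholders, and this needs a ZFS with resumable send/recv
> > support): receive with -s so an interrupted stream leaves a resume
> > token behind, then restart the send from that token:
> >
> >   zfs send -i @base tank/data@today | ssh mirror zfs recv -s backup/data
> >   # if the connection drops part-way through:
> >   TOKEN=$(ssh mirror zfs get -H -o value receive_resume_token backup/data)
> >   zfs send -t "$TOKEN" | ssh mirror zfs recv -s backup/data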
>
> Point taken! Three days ago, one of our HPC users who has ~9TB of data
> stored on our server decided to rename a subdirectory containing ~4TB
> of experimental data stored as many millions of relatively small files
> within a lot of subdirectories. As a result, rsync on the destination
> (mirror) server is still deleting his old folder and its contents and
> hasn't even started mirroring the renamed folder.
>
> Since our servers have been up for 5.5 years and are both well overdue
> for an O/S upgrade from FBSD 11.3 to 13.x anyway, I think this would
> be a good opportunity to switch from rsync to ZFS send/recv. I was
> planning to do the O/S update over the upcoming Christmas vacation
> when HPC demand here traditionally falls to a very low level - I will
> set up a pair of test servers in the next day or two, play around with
> this and get some experience of this before upgrading the 'live'
> servers.
>
> cheers, Andy
>
> > Typos due to smartphone keyboard.
> >
> > On Thu., Nov. 17, 2022, 12:50 a.m. andy thomas, <andy@time-domain.co.uk>
> > wrote:
> >
> >       I thought I would report back that changing my rsync options
> >       from '-Wav --delete' to '-av --inplace --no-whole-file
> >       --delete' has made a significant difference, with mirrored
> >       directory sizes on the slave server now falling and
> >       approaching the original sizes on the master. The only
> >       downside is that since whole-file replication is obviously a
> >       lot faster than updating the changed parts of individual
> >       files, mirroring is now taking longer than 24 hours, so this
> >       will be changed to every few days or even weekly when more is
> >       known about user behaviour on the master server.
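> >
> >       As a concrete sketch (the path and hostname below are
> >       placeholders rather than our real layout), the mirroring run
> >       is essentially:
> >
> >         rsync -av --inplace --no-whole-file --delete \
> >             /tank/users/ mirror:/tank/users/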
> >
> >       Andy
> >
> >       On Sun, 13 Nov 2022, Bob Friesenhahn wrote:
> >
> >       > On Sun, 13 Nov 2022, Mark Saad wrote:
> >       >>
> >       >> Bob are you saying when the target is zfs --inplace
> >       >> --no-whole-file helps or just in general when you have
> >       >> large files?  Also have you tried using --delete-during /
> >       >> --delete-after?
> >       >
> >       > The '--inplace --no-whole-file' updates the file blocks if
> >       > they have changed (comparing the origin blocks with the
> >       > existing mirror blocks) rather than creating a new copy of
> >       > the file and moving it into place when it is complete. ZFS
> >       > does not check if data content has been changed while it is
> >       > being written, so a write of the same data will result in a
> >       > fresh allocation based on its Copy On Write ("COW") design.
> >       > Writing a whole new file obviously significantly increases
> >       > the number of blocks which are written. Requesting that
> >       > rsync only write to the file for the blocks which have
> >       > changed reduces the total number of blocks which get
> >       > written.
> >       >
> >       > The above helps quite a lot when using snapshots since then
> >       > fewer blocks are in the snapshots.
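> >       >
> >       > One way to see the effect (the dataset name here is only a
> >       > placeholder) is to watch how much space each snapshot
> >       > uniquely holds after a sync; in-place updates should keep
> >       > the USED column much smaller than whole-file rewrites do:
> >       >
> >       >   zfs list -t snapshot -o name,used,refer -r tank/users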
> >       >
> >       > I have never tried --delete-during so I can't comment on
> >       > that.
> >       >
> >       > Bob
> >       > --
> >       > Bob Friesenhahn
> >       > bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> >       > GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> >       > Public Key,     http://www.simplesystems.org/users/bfriesen/public-key.txt