Date:      Thu, 7 Aug 2025 11:22:41 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        Rick Macklem <rick.macklem@gmail.com>
Cc:        Alexander Motin <mav@freebsd.org>, FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   Re: RFC: Does ZFS block cloning do this?
Message-ID:  <CAOtMX2ha6pR7=zZKs9PttetRORA6OT1ywNFLVXVGvQ1hUH1OgA@mail.gmail.com>
In-Reply-To: <CAM5tNy7JfRry0%2BPkz-xFQuiXGfq0hTxVjBW4SZWc8Fy1=PJhqQ@mail.gmail.com>
References:  <CAM5tNy7V7Btem%2ByWNK7oyn9qsk6TrQwuGo1kxqhCstLM4_uh9g@mail.gmail.com> <CAOtMX2jGcQY_AywWv1tVBbAk%2BrOheya%2BHRQBMRDc7ELGrA7qNA@mail.gmail.com> <CAM5tNy6PJbTnjf24L0Y9j5NicBTZHDBKp%2BaF-VhOLCsaY5Qbnw@mail.gmail.com> <8925b735-8398-4e0f-95f7-8d1115413013@FreeBSD.org> <CAM5tNy5HcgovNrB52zZ4W2p6Ur7VBX9ZZkX74Y4rkef%2B2evt0Q@mail.gmail.com> <CAM5tNy7JfRry0%2BPkz-xFQuiXGfq0hTxVjBW4SZWc8Fy1=PJhqQ@mail.gmail.com>


On Thu, Aug 7, 2025 at 8:32 AM Rick Macklem <rick.macklem@gmail.com> wrote:

> On Wed, Aug 6, 2025 at 9:46 AM Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Wed, Aug 6, 2025 at 9:28 AM Alexander Motin <mav@freebsd.org> wrote:
> > >
> > > Hi Rick,
> > >
> > > On 8/6/25 11:54, Rick Macklem wrote:
> > > > The difference for NFSv4.2 is that CLONE cannot return with partial completion.
> > > > (It assumes that a CLONE of any size will complete quickly enough for an RPC.
> > > > Although there is no fixed limit, most assume an RPC reply should happen in
> > > > 1-2sec at most. For COPY, the server can return with only part of the
> > > > copy done.)
> > > > It also includes alignment restrictions for the byte offsets.
> > > >
> > > > There is also the alignment restriction on CLONE. There doesn't seem to be
> > > > an alignment restriction on zfs_clone_range(), but maybe it is buried inside it?
> > > > I think adding yet another pathconf name to get the alignment requirement and
> > > > whether or not the file system supports it would work without any VOP change.
> > >
> > > The semantics you describe look similar to the Linux FICLONE/FICLONERANGE
> > > calls, which were adopted there before copy_file_range().  IIRC those
> > > effectively mean: clone the file or its range as requested, or fail.  I
> > > am not sure why some people prefer those calls, which explicitly disallow
> > > fallback to copying, but there are some; for example, Veeam backup fails
> > > if ZFS rejects the cloning request for any reason.  On Linux, ZFS has
> > > separate code (see zpl_remap_file_range() and the respective VFS calls)
> > > wrapping block cloning to implement these semantics.  FreeBSD does
> > > not have the equivalent at this point, but it would be trivial to add,
> > > if we really need those VOPs.
> > For NFSv4.2 (which I suspect was modelled after what Linux does) the
> > difference is the ability to complete the entire "copy" within 1-2sec under
> > normal circumstances.
> > --> The NFSv4.2 CLONE operation requires this, whereas
> > --> the NFSv4.2 COPY operation is allowed to return after a partial
> >     completion to adhere to the 1-2sec rule. This probably does not
> >     affect ZFS, but it is needed for the "in general" UFS case.
> >
> > No change may be needed for zfs_copy_file_range(), so long as it never
> > returns after a partial completion. If it can return after a partial
> > completion, a flag would indicate "must complete it".
> >
> > As for FreeBSD syscalls, I don't see a need for a new one, but
> > I'll leave that up to others.
> > pathconf(2) could be used to determine if cloning is supported.
> >
> > Thanks for all the comments. It looks like a new "kernel only" flag for
> > VOP_COPY_FILE_RANGE() and a new name for VOP_PATHCONF()
> > should be all that is needed.
> So, this seems almost too easy?
>
> What I am thinking of (and should be easy to do in the next few days
> for 15.0) is:
> - Define a new pathconf variable _PC_CLONE_BLKSIZE which returns
>   the blksize for cloning or 0 if cloning is not supported.
> - Define a new flag for copy_file_range() called COPY_FILE_RANGE_CLONE
>   which, if set, would require that the entire copy be completed via cloning
>   (no partial copy allowed) or return ENOSYS if the file system does not
>   support this.
>   Expose this flag to userland in case any application really needs cloning.
> The code changes outside of NFS are trivial.
>
> So, how does this sound? ric


Yes, I think that would work.
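The following is a minimal sketch of how a userland caller might use the proposed pathconf name. It is not the committed FreeBSD interface.

- **Names.** `_PC_CLONE_BLKSIZE` is the name proposed in this thread, so the code guards it with `#ifdef` and treats "not defined" like "cloning not supported" (0). `clone_blksize()` and `clone_range_ok()` are hypothetical helper names.
- **Alignment rule.** The rule in `clone_range_ok()` is an assumption modelled on NFSv4.2 CLONE. Both offsets must be multiples of the clone block size. The length must be one too, unless the range ends exactly at the source file's EOF.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: clone block size for the file system holding fd,
 * using the _PC_CLONE_BLKSIZE name proposed in this thread.  Returns 0
 * ("cloning not supported") if the headers do not define the name or
 * if fpathconf(2) fails.
 */
long
clone_blksize(int fd)
{
#ifdef _PC_CLONE_BLKSIZE
	long v = fpathconf(fd, _PC_CLONE_BLKSIZE);

	return (v > 0 ? v : 0);
#else
	(void)fd;
	return (0);
#endif
}

/*
 * Hypothetical helper: check a clone request against an assumed
 * NFSv4.2-style alignment rule.  Both offsets must be multiples of
 * blksize, and so must len, unless the range ends exactly at the
 * source file's EOF.  A blksize of 0 means cloning is not supported.
 */
bool
clone_range_ok(off_t inoff, off_t outoff, off_t len, off_t srcsize,
    long blksize)
{
	if (blksize <= 0)
		return (false);
	if (inoff % blksize != 0 || outoff % blksize != 0)
		return (false);
	if (len % blksize == 0)
		return (true);
	/* An unaligned tail is allowed only when it ends at source EOF. */
	return (inoff + len == srcsize);
}
```

A caller would typically check `clone_range_ok(..., clone_blksize(infd))` before asking for a clone. This avoids a request that would be rejected only for misalignment.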

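The two behaviours discussed above can be contrasted in a short sketch: an all-or-nothing clone versus an ordinary copy that may complete only partially. It assumes the existing `copy_file_range(2)` interface.

- **Clone path.** `COPY_FILE_RANGE_CLONE` is the flag proposed in this thread, and ENOSYS is the error the proposal names for "not supported". Both may differ in a released FreeBSD, so the flag is guarded with `#ifdef`.
- **Short clone.** Reporting a short clone as EIO is an arbitrary choice made only for this sketch.
- **Helper names.** `clone_range()` and `copy_range_all()` are hypothetical.

```c
#define _GNU_SOURCE	/* glibc: needed to declare copy_file_range(2) */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * All-or-nothing clone using the proposed COPY_FILE_RANGE_CLONE flag.
 * Returns 0 only if all len bytes were cloned, else -1 with errno set.
 * Without the flag in the headers, fail with ENOSYS, as the proposal
 * does when the file system cannot clone.
 */
int
clone_range(int infd, off_t inoff, int outfd, off_t outoff, size_t len)
{
#ifdef COPY_FILE_RANGE_CLONE
	ssize_t n = copy_file_range(infd, &inoff, outfd, &outoff, len,
	    COPY_FILE_RANGE_CLONE);

	if (n == -1)
		return (-1);
	if ((size_t)n != len) {
		/* The proposal forbids partial clones; EIO is arbitrary. */
		errno = EIO;
		return (-1);
	}
	return (0);
#else
	(void)infd; (void)inoff; (void)outfd; (void)outoff; (void)len;
	errno = ENOSYS;
	return (-1);
#endif
}

/*
 * Ordinary copy of up to len bytes.  A single copy_file_range(2) call
 * may copy only part of the range (like an NFSv4.2 COPY), so loop
 * until done or until the source reaches EOF.  Returns the number of
 * bytes copied, or -1 with errno set.
 */
ssize_t
copy_range_all(int infd, off_t inoff, int outfd, off_t outoff, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = copy_file_range(infd, &inoff, outfd, &outoff,
		    len - done, 0);

		if (n == -1) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		if (n == 0)		/* EOF on infd */
			break;
		done += (size_t)n;
	}
	return ((ssize_t)done);
}
```

A caller that merely prefers cloning could try `clone_range()` and, on ENOSYS, fall back to `copy_range_all()`. An application that must not silently copy, like the Veeam case mentioned above, would report the error instead.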
