Date:      Tue, 4 Apr 2023 09:18:44 -0700
From:      Cy Schubert <Cy.Schubert@cschubert.com>
To:        Martin Matuska <mm@FreeBSD.org>
Cc:        Rick Macklem <rick.macklem@gmail.com>, Mateusz Guzik <mjguzik@gmail.com>, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject:   Re: git: 8ee579abe09e - main - zfs: fall back if block_cloning feature is disabled
Message-ID:  <20230404091844.639cb1c1@slippy>
In-Reply-To: <98c71e6f-5b48-79f3-e7b0-95d674949624@FreeBSD.org>
References:  <202304041145.334Bjx6l035872@gitrepo.freebsd.org> <20230404141717.B976D31C@slippy.cwsent.com> <CAGudoHEvGDUQkYe8LwUXgTZZa%2B6DAFXVtspCX-Mn2egDO2oc_w@mail.gmail.com> <CAM5tNy6sPx4xE%2BcAeeC_RQG_tba_K6Yh-Cni0%2B-WxJ5SXCuO9A@mail.gmail.com> <98c71e6f-5b48-79f3-e7b0-95d674949624@FreeBSD.org>

--MP_/_7hN69E4dFfh=Qg2zb=GRdp
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On Tue, 4 Apr 2023 17:30:25 +0200
Martin Matuska <mm@FreeBSD.org> wrote:

> So I am now a little bit confused - what is the consensus? :-)

My exmh email client made a mess of that. Let's try this again.

Rick has posted a patch. Your patch should also be incorporated to work
around other EXDEV errors, but a few lines earlier so it is protected by
the lock.

There were a couple of typos in Rick's patch (a missing keystroke;
s/ojset/objset/).

A combination of all three patches (Rick's null pointer dereference fix,
Rick's copy file range patch, and your copy file range patch) is attached.
It builds fine on amd64 and i386. So far it is only compile tested; I am
installing it now and will test it once the install completes.
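
For anyone following along without the source handy, the control flow we
are converging on looks roughly like the sketch below. This is userland C
for illustration only, not the kernel code: struct pool, fast_clone(),
generic_copy() and copy_range() are names I made up, standing in for the
pool feature check, zfs_clone_range() and vn_generic_copy_file_range().
It does no locking at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pool; only the feature flag matters here. */
struct pool {
	int block_cloning_enabled;
};

/* Counters so a caller can see which path actually ran. */
static int clone_calls, generic_calls;

/* Stand-in for zfs_clone_range(): refuses with EXDEV when cloning is off. */
static int
fast_clone(const struct pool *p, const char *src, char *dst, size_t *lenp)
{
	clone_calls++;
	if (!p->block_cloning_enabled)
		return (EXDEV);
	memcpy(dst, src, *lenp);
	return (0);
}

/* Stand-in for vn_generic_copy_file_range(): a plain byte copy. */
static int
generic_copy(const char *src, char *dst, size_t *lenp)
{
	generic_calls++;
	memcpy(dst, src, *lenp);
	return (0);
}

/* Try the fast path; treat EXDEV as "cannot clone here", not as failure. */
static int
copy_range(const struct pool *p, const char *src, char *dst, size_t *lenp)
{
	int error;

	assert(lenp != NULL);
	error = fast_clone(p, src, dst, lenp);
	if (error == EXDEV)
		error = generic_copy(src, dst, lenp);
	return (error);
}
```

With block_cloning_enabled set to 0, copy_range() still returns 0 and the
data arrives via generic_copy(). EXDEV never reaches the caller, which is
the property Mateusz asked for.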

-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0

>
> On 4. 4. 2023 17:26, Rick Macklem wrote:
> > On Tue, Apr 4, 2023 at 7:38 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >> On 4/4/23, Cy Schubert <Cy.Schubert@cschubert.com> wrote:
> >>> In message <202304041145.334Bjx6l035872@gitrepo.freebsd.org>, Martin
> >>> Matuska writes:
> >>>> The branch main has been updated by mm:
> >>>>
> >>>> URL:
> >>>> https://cgit.FreeBSD.org/src/commit/?id=8ee579abe09ec1fe15c588fc9a08370b83b81cd6
> >>>>
> >>>> commit 8ee579abe09ec1fe15c588fc9a08370b83b81cd6
> >>>> Author:     Martin Matuska <mm@FreeBSD.org>
> >>>> AuthorDate: 2023-04-04 11:40:41 +0000
> >>>> Commit:     Martin Matuska <mm@FreeBSD.org>
> >>>> CommitDate: 2023-04-04 11:43:34 +0000
> >>>>
> >>>>      zfs: fall back if block_cloning feature is disabled
> >>>>
> >>>>      If block_cloning is disabled, or other errors from zfs_clone_range()
> >>>>      return an EXDEV we should fall back to vn_generic_copy_file_range().
> >>>>
> >>>>      This fixes issues when copying files on the same dataset with
> >>>>      block_cloning disabled.
> >>>>
> >>>>      Upstreamed as pull request to OpenZFS.
> >>>>
> >>>>      Reviewed by:    Mateusz Guzik <mjguzik@gmail.com>
> >>>>      OpenZFS pull request:   14713
> >>>> ---
> >>>>   .../openzfs/module/os/freebsd/zfs/zfs_vnops_os.c        | 17 ++++++++++-------
> >>>>   1 file changed, 10 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >>>> index 97429b360a36..2cd1d27e37bc 100644
> >>>> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >>>> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >>>> @@ -6243,13 +6243,6 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
> >>>>       int error;
> >>>>       uint64_t len = *ap->a_lenp;
> >>>>
> >>>> -    /*
> >>>> -     * TODO: If offset/length is not aligned to recordsize, use
> >>>> -     * vn_generic_copy_file_range() on this fragment.
> >>>> -     * It would be better to do this after we lock the vnodes, but then we
> >>>> -     * need something else than vn_generic_copy_file_range().
> >>>> -     */
> >>>> -
> >>>>       /* Lock both vnodes, avoiding risk of deadlock. */
> >>>>       do {
> >>>>               mp = NULL;
> >>>> @@ -6300,6 +6293,16 @@ unlock:
> >>>>       if (mp != NULL)
> >>>>               vn_finished_write(mp);
> >>>>
> >>>> +    /*
> >>>> +     * Fall back if block_cloning feature is disabled
> >>>> +     * or other EXDEV failures from zfs_vnops.c
> >>>> +     */
> >>>> +    if (error == EXDEV) {
> >>>> +            error = vn_generic_copy_file_range(ap->a_invp, ap->a_inoffp,
> >>>> +                        ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_flags,
> >>>> +                        ap->a_incred, ap->a_outcred, ap->a_fsizetd);
> >>>> +    }
> >>>> +
> >>>>       return (error);
> >>>>   }
> >>>>
> >>>>
> >>> This is too late to fall back. On Rick's suggestion the following makes the
> >>> determination at zfs_freebsd_copy_file_range() entry much earlier.
> >>>
> >> It's not too late, but I agree it is faster to bail out early.
> >>
> >> The proposed patch adds a condition which *differs* from the one in
> >> zfs_clone_range:
> >>          if (dmu_objset_spa(inos) != dmu_objset_spa(outos)) {
> >>                  zfs_exit_two(inzfsvfs, outzfsvfs, FTAG);
> >>                  return (SET_ERROR(EXDEV));
> >>          }
> >>
> >> ... meaning with the proposed patch the routine can still fail with
> >> EXDEV, making zfs_freebsd_copy_file_range also do it, which must not
> >> happen.
> > Since VOP_COPY_FILE_RANGE() is only called when invp and outvp
> > are on the same mount point, I don't think this can happen now.
> > However, there is a TO DO comment that suggests a call with invp and
> > outvp on different mount points may be in the future.
> >
> > As such, leaving Martin's patch in so that it calls vn_generic_copy_file_range()
> > when zfs_clone_range() returns EXDEV seems like a good idea to me.
> >
> >> That aside the code looks rather suspicious for the case where target
> >> and source vnode are the same. iow more work is needed here.
> > Definitely needs to be tested. I'll do that later to-day.
> >
> > rick
> >
> >> As the vnode is unlocked, you *can't* safely access zfsvfs_t
> >> *outzfsvfs = ZTOZSB(outzp); in that spot in this manner -- a forced
> >> unmount at the same time can free it.
> >>
> >> iow this patch does *NOT* work.
> >>
> >> With the committed variant the situation is damage controlled enough
> >> that there is time to sort it out correctly.
> >>
> >>> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >>> index d41821ff67f1..e18dcca58192 100644
> >>> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >>> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >>> @@ -6243,6 +6243,18 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
> >>>        int error;
> >>>        uint64_t len = *ap->a_lenp;
> >>>
> >>> +     znode_t *outzp = VTOZ(ap->a_outvp);
> >>> +     zfsvfs_t *outzfsvfs = ZTOZSB(outzp);
> >>> +     objset_t *outos = outzfsvfs->z_os;
> >>> +
> >>> +        if (!spa_feature_is_enabled(dmu_objset_spa(outos),
> >>> +            SPA_FEATURE_BLOCK_CLONING)) {
> >>> +             error = vn_generic_copy_file_range(ap->a_invp, ap->a_inoffp,
> >>> +                     ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_flags,
> >>> +                     ap->a_incred, ap->a_outcred, ap->a_fsizetd);
> >>> +                return (error);
> >>> +        }
> >>> +
> >>>        /*
> >>>         * TODO: If offset/length is not aligned to recordsize, use
> >>>         * vn_generic_copy_file_range() on this fragment.
> >>>
> >>>
> >>> Can you revert your commit and commit this, please.
> >>>
> >>>
> >>> --
> >>> Cheers,
> >>> Cy Schubert <Cy.Schubert@cschubert.com>
> >>> FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
> >>> NTP:           <cy@nwtime.org>    Web:  https://nwtime.org
> >>>
> >>>                        e^(i*pi)+1=0
> >>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> Mateusz Guzik <mjguzik gmail.com>


--MP_/_7hN69E4dFfh=Qg2zb=GRdp
Content-Type: text/x-patch
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=zfs-jumbo.patch

diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
index 97429b360a36..16e0176be2ff 100644
--- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
+++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
@@ -6242,6 +6242,30 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 	struct uio io;
 	int error;
 	uint64_t len = *ap->a_lenp;
+	zfsvfs_t *outzfsvfs;
+	objset_t *outos;
+	bool done_outvp;
+
+	mp = NULL;
+	error = vn_start_write(outvp, &mp, V_WAIT);
+	if (error == 0)
+		error = vn_lock(outvp, LK_EXCLUSIVE);
+	done_outvp = true;
+	if (error == 0) {
+		outzfsvfs = ZTOZSB(VTOZ(outvp));
+		outos = outzfsvfs->z_os;
+		if (!spa_feature_is_enabled(dmu_objset_spa(outos),
+		    SPA_FEATURE_BLOCK_CLONING)) {
+			VOP_UNLOCK(outvp);
+			if (mp != NULL)
+				vn_finished_write(mp);
+			error = vn_generic_copy_file_range(ap->a_invp,
+			    ap->a_inoffp, ap->a_outvp, ap->a_outoffp,
+			    ap->a_lenp, ap->a_flags, ap->a_incred,
+			    ap->a_outcred, ap->a_fsizetd);
+			return (error);
+		}
+	}
 
 	/*
 	 * TODO: If offset/length is not aligned to recordsize, use
@@ -6252,27 +6276,29 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 
 	/* Lock both vnodes, avoiding risk of deadlock. */
 	do {
-		mp = NULL;
-		error = vn_start_write(outvp, &mp, V_WAIT);
+		if (!done_outvp) {
+			mp = NULL;
+			error = vn_start_write(outvp, &mp, V_WAIT);
+			if (error == 0)
+				error = vn_lock(outvp, LK_EXCLUSIVE);
+		}
 		if (error == 0) {
-			error = vn_lock(outvp, LK_EXCLUSIVE);
-			if (error == 0) {
-				if (invp == outvp)
-					break;
-				error = vn_lock(invp, LK_SHARED | LK_NOWAIT);
-				if (error == 0)
-					break;
-				VOP_UNLOCK(outvp);
-				if (mp != NULL)
-					vn_finished_write(mp);
-				mp = NULL;
-				error = vn_lock(invp, LK_SHARED);
-				if (error == 0)
-					VOP_UNLOCK(invp);
-			}
+			if (invp == outvp)
+				break;
+			error = vn_lock(invp, LK_SHARED | LK_NOWAIT);
+			if (error == 0)
+				break;
+			VOP_UNLOCK(outvp);
+			if (mp != NULL)
+				vn_finished_write(mp);
+			mp = NULL;
+			error = vn_lock(invp, LK_SHARED);
+			if (error == 0)
+				VOP_UNLOCK(invp);
 		}
 		if (mp != NULL)
 			vn_finished_write(mp);
+		done_outvp = false;
 	} while (error == 0);
 	if (error != 0)
 		return (error);
@@ -6290,7 +6316,12 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 		goto unlock;
 
 	error = zfs_clone_range(VTOZ(invp), ap->a_inoffp, VTOZ(outvp),
-	    ap->a_outoffp, &len, ap->a_fsizetd->td_ucred);
+	    ap->a_outoffp, &len, ap->a_outcred);
+	if (error == EXDEV)
+		error = vn_generic_copy_file_range(ap->a_invp,
+		    ap->a_inoffp, ap->a_outvp, ap->a_outoffp,
+		    ap->a_lenp, ap->a_flags, ap->a_incred,
+		    ap->a_outcred, ap->a_fsizetd);
 	*ap->a_lenp = (size_t)len;
 
 unlock:

--MP_/_7hN69E4dFfh=Qg2zb=GRdp--


