Date:      Tue, 4 Apr 2023 08:33:12 -0700
From:      Rick Macklem <rick.macklem@gmail.com>
To:        Cy Schubert <Cy.Schubert@cschubert.com>
Cc:        Mateusz Guzik <mjguzik@gmail.com>, Martin Matuska <mm@freebsd.org>, src-committers@freebsd.org,  dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject:   Re: git: 8ee579abe09e - main - zfs: fall back if block_cloning feature is disabled
Message-ID:  <CAM5tNy7g2UwXo6dJi7PGM=NTg58UV3VpMFHgdPjVRVsuVCA64w@mail.gmail.com>
In-Reply-To: <20230404150206.0A0512A7@slippy.cwsent.com>
References:  <202304041145.334Bjx6l035872@gitrepo.freebsd.org> <20230404141717.B976D31C@slippy.cwsent.com> <CAGudoHEvGDUQkYe8LwUXgTZZa+6DAFXVtspCX-Mn2egDO2oc_w@mail.gmail.com> <20230404150206.0A0512A7@slippy.cwsent.com>

On Tue, Apr 4, 2023 at 8:02 AM Cy Schubert <Cy.Schubert@cschubert.com> wrote:
>
>
> In message <CAGudoHEvGDUQkYe8LwUXgTZZa+6DAFXVtspCX-Mn2egDO2oc_w@mail.gmail.com>,
> Mateusz Guzik writes:
> > On 4/4/23, Cy Schubert <Cy.Schubert@cschubert.com> wrote:
> > > In message <202304041145.334Bjx6l035872@gitrepo.freebsd.org>, Martin
> > > Matuska writes:
> > >> The branch main has been updated by mm:
> > >>
> > >> URL:
> > >> https://cgit.FreeBSD.org/src/commit/?id=8ee579abe09ec1fe15c588fc9a08370b83b81cd6
> > >>
> > >> commit 8ee579abe09ec1fe15c588fc9a08370b83b81cd6
> > >> Author:     Martin Matuska <mm@FreeBSD.org>
> > >> AuthorDate: 2023-04-04 11:40:41 +0000
> > >> Commit:     Martin Matuska <mm@FreeBSD.org>
> > >> CommitDate: 2023-04-04 11:43:34 +0000
> > >>
> > >>     zfs: fall back if block_cloning feature is disabled
> > >>
> > >>     If block_cloning is disabled, or other errors from zfs_clone_range()
> > >>     return an EXDEV we should fall back to vn_generic_copy_file_range().
> > >>
> > >>     This fixes issues when copying files on the same dataset with
> > >>     block_cloning disabled.
> > >>
> > >>     Upstreamed as pull request to OpenZFS.
> > >>
> > >>     Reviewed by:    Mateusz Guzik <mjguzik@gmail.com>
> > >>     OpenZFS pull request:   14713
> > >> ---
> > >>  .../openzfs/module/os/freebsd/zfs/zfs_vnops_os.c        | 17 ++++++++++-------
> > >>  1 file changed, 10 insertions(+), 7 deletions(-)
> > >>
> > >> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > >> index 97429b360a36..2cd1d27e37bc 100644
> > >> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > >> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > >> @@ -6243,13 +6243,6 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
> > >>    int error;
> > >>    uint64_t len = *ap->a_lenp;
> > >>
> > >> -  /*
> > >> -   * TODO: If offset/length is not aligned to recordsize, use
> > >> -   * vn_generic_copy_file_range() on this fragment.
> > >> -   * It would be better to do this after we lock the vnodes, but then we
> > >> -   * need something else than vn_generic_copy_file_range().
> > >> -   */
> > >> -
> > >>    /* Lock both vnodes, avoiding risk of deadlock. */
> > >>    do {
> > >>            mp = NULL;
> > >> @@ -6300,6 +6293,16 @@ unlock:
> > >>    if (mp != NULL)
> > >>            vn_finished_write(mp);
> > >>
> > >> +  /*
> > >> +   * Fall back if block_cloning feature is disabled
> > >> +   * or other EXDEV failures from zfs_vnops.c
> > >> +   */
> > >> +  if (error == EXDEV) {
> > >> +          error = vn_generic_copy_file_range(ap->a_invp, ap->a_inoffp,
> > >> +                      ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_flags,
> > >> +                      ap->a_incred, ap->a_outcred, ap->a_fsizetd);
> > >> +  }
> > >> +
> > >>    return (error);
> > >>  }
> > >>
> > >>
> > >
> > > This is too late to fall back. On Rick's suggestion the following makes the
> > > determination at zfs_freebsd_copy_file_range() entry much earlier.
> > >
> >
> > It's not too late, but I agree it is faster to bail out early.
> >
> > The proposed patch adds a condition which *differs* from the one in
> > zfs_clone_range:
> >         if (dmu_objset_spa(inos) != dmu_objset_spa(outos)) {
> >                 zfs_exit_two(inzfsvfs, outzfsvfs, FTAG);
> >                 return (SET_ERROR(EXDEV));
> >         }
> >
> > ... meaning with the proposed patch the routine can still fail with
> > EXDEV, making zfs_freebsd_copy_file_range also do it, which must not
> > happen.
> >
> > That aside the code looks rather suspicious for the case where target
> > and source vnode are the same. iow more work is needed here.
> >
> > As the vnode is unlocked, you *can't* safely access zfsvfs_t
> > *outzfsvfs = ZTOZSB(outzp); in that spot in this manner -- a forced
> > unmount at the same time can free it.
> >
> > iow this patch does *NOT* work.
> >
> > With the committed variant the situation is damage controlled enough
> > that there is time to sort it out correctly.
> >
> > > diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > > b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> [...]
>
> Gotcha. What you're suggesting is something more like this. Check for
> block_cloning and also retry should zfs_clone_range() return EXDEV for any
> other reason.
>
> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> index baa2ee5b3824..60916bfcfbc3 100644
> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> @@ -6239,6 +6239,9 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
>         struct vnode *invp = ap->a_invp;
>         struct vnode *outvp = ap->a_outvp;
>         struct mount *mp;
> +       znode_t *outzp;
> +       zfsvfs_t *outzfsvfs;
> +       objset_t *outos;
>         struct uio io;
>         int error;
>         uint64_t len = *ap->a_lenp;
> @@ -6276,6 +6279,19 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
>         } while (error == 0);
>         if (error != 0)
>                 return (error);
> +
> +       outzp = VTOZ(ap->a_outvp);
> +       outzfsvfs = ZTOZSB(outzp);
> +       outos = outzfsvfs->z_os;
> +
> +        if (!spa_feature_is_enabled(dmu_objset_spa(outos),
> +            SPA_FEATURE_BLOCK_CLONING)) {
> +               error = vn_generic_copy_file_range(ap->a_invp, ap->a_inoffp,
> +                       ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_flags,
> +                       ap->a_incred, ap->a_outcred, ap->a_fsizetd);
> +                goto unlock;
> +        }
The trouble with doing it here is that the code has already gone through
all the vnode locking arm waving. It seems that it is just as easy (and avoids
duplicating the test) to call zfs_clone_range() and let it do the test.
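
For illustration only, here is a user-space sketch of that "call it and let it do the test" shape. It is not the kernel code: struct pool, struct copy_stats, clone_range(), generic_copy() and copy_file_range_sketch() are hypothetical stand-ins, with clone_range() loosely playing the role of zfs_clone_range() and generic_copy() that of vn_generic_copy_file_range(). The assumption is simply that the fast path reports EXDEV whenever it cannot clone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pool; not the real spa_t. */
struct pool {
	bool block_cloning_enabled;
};

/* Counts calls so a caller can see which path ran. */
struct copy_stats {
	int clone_calls;
	int generic_calls;
};

/*
 * Hypothetical fast path. It does the feature check and the same-pool
 * check itself and returns EXDEV when cloning is not possible.
 */
static int
clone_range(const struct pool *in, const struct pool *out, size_t *lenp,
    struct copy_stats *st)
{
	st->clone_calls++;
	if (!out->block_cloning_enabled)
		return (EXDEV);
	if (in != out)
		return (EXDEV);
	(void)lenp;	/* pretend the whole range was cloned */
	return (0);
}

/* Hypothetical slow path; always succeeds in this model. */
static int
generic_copy(size_t *lenp, struct copy_stats *st)
{
	st->generic_calls++;
	(void)lenp;
	return (0);
}

/*
 * Try the fast path first and fall back only on EXDEV, so the
 * conditions are tested in one place instead of being duplicated.
 */
static int
copy_file_range_sketch(const struct pool *in, const struct pool *out,
    size_t *lenp, struct copy_stats *st)
{
	int error;

	assert(in != NULL && out != NULL && lenp != NULL && st != NULL);
	error = clone_range(in, out, lenp, st);
	if (error == EXDEV)
		error = generic_copy(lenp, st);
	return (error);
}
```

In this model a caller never sees EXDEV: any reason the fast path has for refusing (feature disabled, different pools) ends up in generic_copy() without the caller repeating the checks.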

I'd rather do this test after only locking outvp, which avoids looping when
invp cannot be locked without waiting. (This loop can burn up a lot of cpu
when invp is exclusively locked by something else.)

The other tests cannot be done until both vnodes are locked, but I'm not
sure duplicating the code to check them here instead of just letting
zfs_clone_range() do them gains much?

rick

> +
>  #ifdef MAC
>         error = mac_vnode_check_write(curthread->td_ucred, ap->a_outcred,
>             outvp);
> @@ -6291,6 +6307,11 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
>
>         error = zfs_clone_range(VTOZ(invp), ap->a_inoffp, VTOZ(outvp),
>             ap->a_outoffp, &len, ap->a_outcred);
> +
> +       if (error == EXDEV)
> +               error = vn_generic_copy_file_range(ap->a_invp, ap->a_inoffp,
> +                       ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_flags,
> +                       ap->a_incred, ap->a_outcred, ap->a_fsizetd);
>         *ap->a_lenp =3D (size_t)len;
>
>  unlock:
>
>
> --
> Cheers,
> Cy Schubert <Cy.Schubert@cschubert.com>
> FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
> NTP:           <cy@nwtime.org>    Web:  https://nwtime.org
>
>                         e^(i*pi)+1=0
>
>
>
>


