Date: Tue, 27 Aug 2013 18:02:57 -0700 From: Xin Li <delphij@delphij.net> To: Davide Italiano <davide@freebsd.org> Cc: src-committers@freebsd.org, Xin LI <d@delphij.net>, svn-src-all@freebsd.org, Andriy Gapon <avg@freebsd.org>, Xin LI <delphij@freebsd.org>, svn-src-head@freebsd.org, Konstantin Belousov <kib@freebsd.org> Subject: Re: svn commit: r254585 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs Message-ID: <521D4C41.3060208@delphij.net> In-Reply-To: <CACYV=-G0i3%2BU3ubWnPSGQpEvKsuJShYJZOMvKQnjBA1aLftJsA@mail.gmail.com> References: <201308202231.r7KMVERi068300@svn.freebsd.org> <20130825221517.GM24767@caravan.chchile.org> <521B75CE.70103@FreeBSD.org> <521BDEAC.9080909@delphij.net> <521C5CAC.2060400@FreeBSD.org> <CACYV=-G0i3%2BU3ubWnPSGQpEvKsuJShYJZOMvKQnjBA1aLftJsA@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 08/27/13 01:59, Davide Italiano wrote: > On Tue, Aug 27, 2013 at 10:00 AM, Andriy Gapon <avg@freebsd.org> > wrote: >> on 27/08/2013 02:03 Xin Li said the following: >>> On 08/26/13 08:35, Andriy Gapon wrote: >>>> on 26/08/2013 01:15 Jeremie Le Hen said the following: >>>>> Hi Xin, >>>>> >>>>> On Tue, Aug 20, 2013 at 10:31:14PM +0000, Xin LI wrote: >> [snip] >>>>>> @@ zfs_rename(vnode_t *sdvp, char *snm, vno if >>>>>> (VOP_REALVP(tdvp, &realvp, ct) == 0) tdvp = realvp; >>>>>> >>>>>> - if (tdvp->v_vfsp != sdvp->v_vfsp || >>>>>> zfsctl_is_node(tdvp)) { + tdzp = VTOZ(tdvp); >>> >>>> The problem with this change, at least on FreeBSD, is that >>>> tdvp may not belong to ZFS. In that case VTOZ(tdvp) does not >>>> make any sense and would produce garbage or trigger an >>>> assert. Previously tdvp->v_vfsp != sdvp->v_vfsp check would >>>> catch that situations. Two side notes: - v_vfsp is actually >>>> v_mount on FreeBSD >>> >>> Ah that's good point. Any objection in putting a change to >>> their _freebsd_ counterpart (zfs_freebsd_rename and >>> zfs_freebsd_link) as attached? >> >> I think that at least the change to zfs_freebsd_rename as it is >> now would break locking and reference counting semantics for the >> vnodes involved -- vreles and vputs have to be done even in the >> error case. >> >> Also, look at the check at the start of ufs_rename, it also >> considers tvp when it's not NULL. I am not sure if it is >> possible to have a situation where fvp and tdvp are from the same >> fs, but tvp is from a different one. >> >> I've been also tempted to place the check in the common code in >> kern_renameat. That should cover the system calls. But could be >> insufficient for direct calls to VOP_RENAME in other places. >> >> diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c >> index a7cb87a..cfa4d93 100644 --- a/sys/kern/vfs_syscalls.c +++ >> b/sys/kern/vfs_syscalls.c @@ -3608,6 +3608,14 @@ >> kern_renameat(struct thread *td, int oldfd, char *old, int newfd, >> char *new, error = EINVAL; goto out; } + + /* Check for >> cross-device rename. */ + if ((fvp->v_mount != >> tdvp->v_mount) || + (tvp && (fvp->v_mount != >> tvp->v_mount))) { + error = EXDEV; + >> goto out; + } + /* * If the source is the same as the >> destination (that is, if they * are links to the same vnode), >> then there is nothing to do. >> >>>> - VOP_REALVP is a glorified nop on FreeBSD >>> >>> It's not clear to me what was the intention for having a macro >>> instead of just ifdef'ing the code out -- maybe to prevent >>> unwanted conflict with upstream? These two VOP's are the only >>> consumers of VOP_REALVP in tree. >> >> Yes. Personally I would just ifdef out the calls. >> >>>> Another unrelated problem that existed before this change and >>>> has been noted by Davide Italiano is that sdvp is not locked >>>> and so it can potentially be recycled before ZFS_ENTER() >>>> enter and for that reason sdzp and zfsvfs could be invalid. >>>> Because sdvp is referenced, this problem can currently occur >>>> only if a forced unmount runs concurrently to zfs_rename. >>>> tdvp should be locked and thus could be used instead of sdvp >>>> as a source for zfsvfs. >>> >>> I think this would need more work, I'll take a look after we >>> have this regression fixed. >> >> Thank you. >> >> -- Andriy Gapon > > I've written a patch that attempts to fix the second problem. > http://people.freebsd.org/~davide/review/zfsrename_recycle.diff The > culprit is that we're dereferencing sdvp without the vnode lock > held, so data-stability is not guaranteed. tdvp, instead, is locked > at the entry point of VOP_RENAME() (at least according to > vop_rename_pre()) so it can be safely used. As pointed out by > Andriy ZFS has some different mechanisms wrt other file systems > that allow the vnode to not be recycled when it's referenced > (ZFS_ENTER), so once we get zfsvfs_t * from tdzp we can call > ZFS_ENTER and dereference sdvp in a safe manner. I'm not sure that this is right. Now we have: tdzp = VTOZ(tdvp); ZFS_VERIFY_ZP(tdzp); zfsvfs = tdzp->z_zfsvfs; ZFS_ENTER(zfsvfs); // tdzp's z_zfsvfs entered zilog = zfsvfs->z_log; sdzp = VTOZ(sdvp); ZFS_VERIFY_ZP(sdzp); // (*) Note that in the (*) step, when sdzp is invalid and sdzp have different z_zfsvfs than tdzp (for instance when the node is in the snapshot directory; the code later would test this), we could end up with ZFS_EXIT()'ing the wrong z_zfsvfs. Cheers, - -- Xin LI <delphij@delphij.net> https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.21 (FreeBSD) iQEcBAEBCgAGBQJSHUxBAAoJEG80Jeu8UPuzkzEIAKXlfGuTodrPcPkDkxBhhaOj QoxoT06jFwMTplysICuCpslQNyyG2Jxq654u9nMh6q5dww370eTtm2FJ0n2QTxk4 JeWLpZVUu7VHWNXYVJQqrmATaMFHO4wVf5AYpSHn+1iCWo0kQn1MPxPw/oSUt0yw 1628jhWTs8n+rxQaYrYN6ewXYeylMwC50ZB0kE/gQpQZ+cKAGmrM/8tur24SQBEo WwrHakrv1DGJ8rv3Q53FB2iUZ+zEAZs6MJ28w32lV3vOI20jfQbEK+RR7i7HvxdV c/7M96wCd0X4iEqZb87VVfJFSRdQsXy/35scXhhh/BSviT7rervS8XxPGhfpXOs= =MzwT -----END PGP SIGNATURE-----
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?521D4C41.3060208>