From owner-svn-src-all@FreeBSD.ORG Wed Aug 28 01:03:04 2013 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id A2F09B1F; Wed, 28 Aug 2013 01:03:04 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [64.62.153.212]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 83C2D2F6F; Wed, 28 Aug 2013 01:03:04 +0000 (UTC) Received: from zeta.ixsystems.com (unknown [69.198.165.132]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id EDCFA5E70; Tue, 27 Aug 2013 18:02:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1377651778; bh=zITuwb5J/96REDbRukR9rw5bEMZUWNa9eLdPtTH3hTo=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=m0ivMNOKaGb1WqoVTdmVwtEsDuCjxRMytszXpguFFQckUApn+3OVy7mm9hq1twA+v wIGtqcb7Zu2Xq6Eh/vs1pY0vOfJ70MKPt42B3GnsXljbDnxLyCcmCRMqfek3uVnWsf bBpYMvuAF8SvHKw+RzQhUMvLm2fUWYCxi0bGDEVY= Message-ID: <521D4C41.3060208@delphij.net> Date: Tue, 27 Aug 2013 18:02:57 -0700 From: Xin Li Organization: The FreeBSD Project MIME-Version: 1.0 To: Davide Italiano Subject: Re: svn commit: r254585 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs References: <201308202231.r7KMVERi068300@svn.freebsd.org> <20130825221517.GM24767@caravan.chchile.org> <521B75CE.70103@FreeBSD.org> <521BDEAC.9080909@delphij.net> <521C5CAC.2060400@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: src-committers@freebsd.org, Xin LI , svn-src-all@freebsd.org, Andriy Gapon , Xin LI , svn-src-head@freebsd.org, Konstantin Belousov X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: d@delphij.net List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Aug 2013 01:03:04 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 08/27/13 01:59, Davide Italiano wrote: > On Tue, Aug 27, 2013 at 10:00 AM, Andriy Gapon > wrote: >> on 27/08/2013 02:03 Xin Li said the following: >>> On 08/26/13 08:35, Andriy Gapon wrote: >>>> on 26/08/2013 01:15 Jeremie Le Hen said the following: >>>>> Hi Xin, >>>>> >>>>> On Tue, Aug 20, 2013 at 10:31:14PM +0000, Xin LI wrote: >> [snip] >>>>>> @@ zfs_rename(vnode_t *sdvp, char *snm, vno if >>>>>> (VOP_REALVP(tdvp, &realvp, ct) == 0) tdvp = realvp; >>>>>> >>>>>> - if (tdvp->v_vfsp != sdvp->v_vfsp || >>>>>> zfsctl_is_node(tdvp)) { + tdzp = VTOZ(tdvp); >>> >>>> The problem with this change, at least on FreeBSD, is that >>>> tdvp may not belong to ZFS. In that case VTOZ(tdvp) does not >>>> make any sense and would produce garbage or trigger an >>>> assert. Previously tdvp->v_vfsp != sdvp->v_vfsp check would >>>> catch that situations. Two side notes: - v_vfsp is actually >>>> v_mount on FreeBSD >>> >>> Ah that's good point. Any objection in putting a change to >>> their _freebsd_ counterpart (zfs_freebsd_rename and >>> zfs_freebsd_link) as attached? >> >> I think that at least the change to zfs_freebsd_rename as it is >> now would break locking and reference counting semantics for the >> vnodes involved -- vreles and vputs have to be done even in the >> error case. >> >> Also, look at the check at the start of ufs_rename, it also >> considers tvp when it's not NULL. I am not sure if it is >> possible to have a situation where fvp and tdvp are from the same >> fs, but tvp is from a different one. >> >> I've been also tempted to place the check in the common code in >> kern_renameat. That should cover the system calls. But could be >> insufficient for direct calls to VOP_RENAME in other places. >> >> diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c >> index a7cb87a..cfa4d93 100644 --- a/sys/kern/vfs_syscalls.c +++ >> b/sys/kern/vfs_syscalls.c @@ -3608,6 +3608,14 @@ >> kern_renameat(struct thread *td, int oldfd, char *old, int newfd, >> char *new, error = EINVAL; goto out; } + + /* Check for >> cross-device rename. */ + if ((fvp->v_mount != >> tdvp->v_mount) || + (tvp && (fvp->v_mount != >> tvp->v_mount))) { + error = EXDEV; + >> goto out; + } + /* * If the source is the same as the >> destination (that is, if they * are links to the same vnode), >> then there is nothing to do. >> >>>> - VOP_REALVP is a glorified nop on FreeBSD >>> >>> It's not clear to me what was the intention for having a macro >>> instead of just ifdef'ing the code out -- maybe to prevent >>> unwanted conflict with upstream? These two VOP's are the only >>> consumers of VOP_REALVP in tree. >> >> Yes. Personally I would just ifdef out the calls. >> >>>> Another unrelated problem that existed before this change and >>>> has been noted by Davide Italiano is that sdvp is not locked >>>> and so it can potentially be recycled before ZFS_ENTER() >>>> enter and for that reason sdzp and zfsvfs could be invalid. >>>> Because sdvp is referenced, this problem can currently occur >>>> only if a forced unmount runs concurrently to zfs_rename. >>>> tdvp should be locked and thus could be used instead of sdvp >>>> as a source for zfsvfs. >>> >>> I think this would need more work, I'll take a look after we >>> have this regression fixed. >> >> Thank you. >> >> -- Andriy Gapon > > I've written a patch that attempts to fix the second problem. > http://people.freebsd.org/~davide/review/zfsrename_recycle.diff The > culprit is that we're dereferencing sdvp without the vnode lock > held, so data-stability is not guaranteed. tdvp, instead, is locked > at the entry point of VOP_RENAME() (at least according to > vop_rename_pre()) so it can be safely used. As pointed out by > Andriy ZFS has some different mechanisms wrt other file systems > that allow the vnode to not be recycled when it's referenced > (ZFS_ENTER), so once we get zfsvfs_t * from tdzp we can call > ZFS_ENTER and dereference sdvp in a safe manner. I'm not sure that this is right. Now we have: tdzp = VTOZ(tdvp); ZFS_VERIFY_ZP(tdzp); zfsvfs = tdzp->z_zfsvfs; ZFS_ENTER(zfsvfs); // tdzp's z_zfsvfs entered zilog = zfsvfs->z_log; sdzp = VTOZ(sdvp); ZFS_VERIFY_ZP(sdzp); // (*) Note that in the (*) step, when sdzp is invalid and sdzp have different z_zfsvfs than tdzp (for instance when the node is in the snapshot directory; the code later would test this), we could end up with ZFS_EXIT()'ing the wrong z_zfsvfs. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.21 (FreeBSD) iQEcBAEBCgAGBQJSHUxBAAoJEG80Jeu8UPuzkzEIAKXlfGuTodrPcPkDkxBhhaOj QoxoT06jFwMTplysICuCpslQNyyG2Jxq654u9nMh6q5dww370eTtm2FJ0n2QTxk4 JeWLpZVUu7VHWNXYVJQqrmATaMFHO4wVf5AYpSHn+1iCWo0kQn1MPxPw/oSUt0yw 1628jhWTs8n+rxQaYrYN6ewXYeylMwC50ZB0kE/gQpQZ+cKAGmrM/8tur24SQBEo WwrHakrv1DGJ8rv3Q53FB2iUZ+zEAZs6MJ28w32lV3vOI20jfQbEK+RR7i7HvxdV c/7M96wCd0X4iEqZb87VVfJFSRdQsXy/35scXhhh/BSviT7rervS8XxPGhfpXOs= =MzwT -----END PGP SIGNATURE-----