From owner-svn-src-all@FreeBSD.ORG Tue Aug 27 08:59:29 2013 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 849C3DFE; Tue, 27 Aug 2013 08:59:29 +0000 (UTC) (envelope-from davide.italiano@gmail.com) Received: from mail-vc0-x22f.google.com (mail-vc0-x22f.google.com [IPv6:2607:f8b0:400c:c03::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DA5B927FD; Tue, 27 Aug 2013 08:59:28 +0000 (UTC) Received: by mail-vc0-f175.google.com with SMTP id ia10so2843348vcb.34 for ; Tue, 27 Aug 2013 01:59:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=UJ5GpDB4TjrL5CRA+/lSU4NOStVfGZpGkd7XPLKH4g0=; b=WK7+riKQeOH2+EuVyGi8U66fsP4fn4TNbKRXpCKDp1gl095xMYUHDRbrF+1O3CoqEn N8FSZpTu9ZvUYyel6Dwk5Zwz+NevJmEpNK5bvKyYDulzW0k8eBSt5OPq1BsV11OkipKZ 66sz4/uEsLwIJd6WDzwNlvdLLgGSN2kYe4NTfiVd2RCI69ZLaKa1H5zP2Mks70k/eVcs 1Nj6Yi18PLYEZ5z/vUjaQv28odtDXf4CrxBEjaSKcZqilnSQhVwEJfdKmm0BC4OtvZWK 5m79dpym1jwxU2fzwSt3/HfDePFAGZQLYv/joKB2za16sTHmpegQOCDzkDJC1VOh1qgh wLYw== MIME-Version: 1.0 X-Received: by 10.58.165.70 with SMTP id yw6mr10193063veb.19.1377593967960; Tue, 27 Aug 2013 01:59:27 -0700 (PDT) Sender: davide.italiano@gmail.com Received: by 10.220.65.132 with HTTP; Tue, 27 Aug 2013 01:59:27 -0700 (PDT) In-Reply-To: <521C5CAC.2060400@FreeBSD.org> References: <201308202231.r7KMVERi068300@svn.freebsd.org> <20130825221517.GM24767@caravan.chchile.org> <521B75CE.70103@FreeBSD.org> <521BDEAC.9080909@delphij.net> <521C5CAC.2060400@FreeBSD.org> Date: Tue, 27 Aug 2013 10:59:27 +0200 X-Google-Sender-Auth: AFj1pSbxIhU9qYStQ51oeNkFwz4 Message-ID: Subject: Re: svn commit: r254585 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs From: Davide Italiano To: Andriy Gapon Content-Type: text/plain; charset=ISO-8859-1 Cc: src-committers@freebsd.org, Xin LI , svn-src-all@freebsd.org, Xin LI , svn-src-head@freebsd.org, Konstantin Belousov , Xin Li X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Aug 2013 08:59:29 -0000 On Tue, Aug 27, 2013 at 10:00 AM, Andriy Gapon wrote: > on 27/08/2013 02:03 Xin Li said the following: >> On 08/26/13 08:35, Andriy Gapon wrote: >>> on 26/08/2013 01:15 Jeremie Le Hen said the following: >>>> Hi Xin, >>>> >>>> On Tue, Aug 20, 2013 at 10:31:14PM +0000, Xin LI wrote: > [snip] >>>>> @@ zfs_rename(vnode_t *sdvp, char *snm, vno if >>>>> (VOP_REALVP(tdvp, &realvp, ct) == 0) tdvp = realvp; >>>>> >>>>> - if (tdvp->v_vfsp != sdvp->v_vfsp || zfsctl_is_node(tdvp)) { + >>>>> tdzp = VTOZ(tdvp); >> >>> The problem with this change, at least on FreeBSD, is that tdvp may >>> not belong to ZFS. In that case VTOZ(tdvp) does not make any sense >>> and would produce garbage or trigger an assert. Previously >>> tdvp->v_vfsp != sdvp->v_vfsp check would catch that situations. Two >>> side notes: - v_vfsp is actually v_mount on FreeBSD >> >> Ah that's good point. Any objection in putting a change to their >> _freebsd_ counterpart (zfs_freebsd_rename and zfs_freebsd_link) as >> attached? > > I think that at least the change to zfs_freebsd_rename as it is now would break > locking and reference counting semantics for the vnodes involved -- vreles and > vputs have to be done even in the error case. > > Also, look at the check at the start of ufs_rename, it also considers tvp when > it's not NULL. I am not sure if it is possible to have a situation where fvp > and tdvp are from the same fs, but tvp is from a different one. > > I've been also tempted to place the check in the common code in kern_renameat. > That should cover the system calls. But could be insufficient for direct calls > to VOP_RENAME in other places. > > diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c > index a7cb87a..cfa4d93 100644 > --- a/sys/kern/vfs_syscalls.c > +++ b/sys/kern/vfs_syscalls.c > @@ -3608,6 +3608,14 @@ kern_renameat(struct thread *td, int oldfd, char *old, > int newfd, char *new, > error = EINVAL; > goto out; > } > + > + /* Check for cross-device rename. */ > + if ((fvp->v_mount != tdvp->v_mount) || > + (tvp && (fvp->v_mount != tvp->v_mount))) { > + error = EXDEV; > + goto out; > + } > + > /* > * If the source is the same as the destination (that is, if they > * are links to the same vnode), then there is nothing to do. > >>> - VOP_REALVP is a glorified nop on FreeBSD >> >> It's not clear to me what was the intention for having a macro instead >> of just ifdef'ing the code out -- maybe to prevent unwanted conflict >> with upstream? These two VOP's are the only consumers of VOP_REALVP >> in tree. > > Yes. Personally I would just ifdef out the calls. > >>> Another unrelated problem that existed before this change and has >>> been noted by Davide Italiano is that sdvp is not locked and so it >>> can potentially be recycled before ZFS_ENTER() enter and for that >>> reason sdzp and zfsvfs could be invalid. Because sdvp is >>> referenced, this problem can currently occur only if a forced >>> unmount runs concurrently to zfs_rename. tdvp should be locked and >>> thus could be used instead of sdvp as a source for zfsvfs. >> >> I think this would need more work, I'll take a look after we have this >> regression fixed. > > Thank you. > > -- > Andriy Gapon I've written a patch that attempts to fix the second problem. http://people.freebsd.org/~davide/review/zfsrename_recycle.diff The culprit is that we're dereferencing sdvp without the vnode lock held, so data-stability is not guaranteed. tdvp, instead, is locked at the entry point of VOP_RENAME() (at least according to vop_rename_pre()) so it can be safely used. As pointed out by Andriy ZFS has some different mechanisms wrt other file systems that allow the vnode to not be recycled when it's referenced (ZFS_ENTER), so once we get zfsvfs_t * from tdzp we can call ZFS_ENTER and dereference sdvp in a safe manner. Thanks, -- Davide "There are no solved problems; there are only problems that are more or less solved" -- Henri Poincare