From owner-freebsd-current@FreeBSD.ORG Thu Mar 1 19:31:05 2012 Return-Path: Delivered-To: current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 11EEC1065676; Thu, 1 Mar 2012 19:31:05 +0000 (UTC) (envelope-from gleb.kurtsou@gmail.com) Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id B6E188FC13; Thu, 1 Mar 2012 19:31:03 +0000 (UTC) Received: by lagv3 with SMTP id v3so1541612lag.13 for ; Thu, 01 Mar 2012 11:31:02 -0800 (PST) Received-SPF: pass (google.com: domain of gleb.kurtsou@gmail.com designates 10.152.136.36 as permitted sender) client-ip=10.152.136.36; Authentication-Results: mr.google.com; spf=pass (google.com: domain of gleb.kurtsou@gmail.com designates 10.152.136.36 as permitted sender) smtp.mail=gleb.kurtsou@gmail.com; dkim=pass header.i=gleb.kurtsou@gmail.com Received: from mr.google.com ([10.152.136.36]) by 10.152.136.36 with SMTP id px4mr5682827lab.47.1330630262409 (num_hops = 1); Thu, 01 Mar 2012 11:31:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=DZEIY65gvwU45OtYMCfmwZKC1kycUn7cMVES8NI5Hrs=; b=pDk0n+oQwnOTO8VofKZqhnig4D0tV99N+QPIgOPbK9g7Y5tdzqhkEkuAdnu6KbRH7P FKx3bqUhmoY/gf25RiLeMkfJ/GLSsOvrkMOKlFO2iLdD32T6WTFRdyP1luqlUduIk35Z mmgikMuMfxB3Ogd17X4sQITtrylNFT3oUD+fV7r3nOMNbMwG56LB755fNwP/ui4Nnh92 C6yenKux3TfPjad5Zq2QPisj+Rrfg4EZOZ2n/HD951N50gDukI1oCVy0BhUI4wYUuMfr mK1HPK736Cz1lGnhoEfa1ReH1hjGTkB5r7emUlfoh0va9+ZHcHCLR2+h0CHVhZLe1Goo Ofeg== Received: by 10.152.136.36 with SMTP id px4mr4653441lab.47.1330630262289; Thu, 01 Mar 2012 11:31:02 -0800 (PST) Received: from localhost ([78.157.92.5]) by mx.google.com with ESMTPS id tt8sm4209772lbb.16.2012.03.01.11.31.01 (version=SSLv3 cipher=OTHER); Thu, 01 Mar 2012 11:31:01 -0800 (PST) Date: Thu, 1 Mar 2012 21:31:03 +0200 From: Gleb Kurtsou To: Florian Smeets Message-ID: <20120301193102.GA35301@reks> References: <4F358F01.1090508@FreeBSD.org> <20120211102006.GA1274@reks> <4F469F94.5080606@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="C7zPtVaVf+AK4Oqc" Content-Disposition: inline In-Reply-To: <4F469F94.5080606@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: decke@FreeBSD.org, Konstantin Belousov , "current@freebsd.org" Subject: Re: Processes getting stuck in state tmpfs X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Mar 2012 19:31:05 -0000 --C7zPtVaVf+AK4Oqc Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Could you test the patch attached. It's also available here as seperate commits: https://github.com/glk/freebsd-head/commits/tmpfs-rename Thanks, Gleb. On (23/02/2012 21:20), Florian Smeets wrote: > On 11.02.12 11:20, Gleb Kurtsou wrote: > > On (10/02/2012 22:41), Florian Smeets wrote: > >> Hi, > >> > >> if you set WRKDIRPREFIX to a tmpfs mountpoint and try to build audio/gsm > >> from ports one of the mv processes gets stuck in state tmpfs quite > >> often. Traces from a kernel with WITTNESS and DEBUG_VFS_LOCKS are > >> available here http://tb.smeets.im/~flo/tmpfs.txt > > > > It's because of incorrect vnode locking order in tmpfs_rename. Issue is > > known and tmpfs is not the only file system suffering from it (e.g. ext2). > > > > There two ways of working around it in tree: > > * UFS: try locking vnode, unlock all vnodes on failure, restart, > > relookup vnodes needed. > > * ZFS: introduce directory entry locks to guarantee fvp won't disappear, > > fdvp can be safely traversed, etc. That won't be easy.. > > > > UFS-way would be a good temporal solution, but I think we should work on > > improving VOP_RENAME() in a long run. > > > > I'll try to prepare a patch in several days. > > > > Hey Gleb, > > did you get anywhere with this? > > Thanks, > Florian > --C7zPtVaVf+AK4Oqc Content-Type: text/plain; charset=utf-8 Content-Disposition: attachment; filename="tmpfs-rename-deadlock.patch.txt" diff --git a/sys/fs/tmpfs/tmpfs_subr.c b/sys/fs/tmpfs/tmpfs_subr.c index d2b2245..fe596aa 100644 --- a/sys/fs/tmpfs/tmpfs_subr.c +++ b/sys/fs/tmpfs/tmpfs_subr.c @@ -42,6 +42,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include @@ -56,6 +57,8 @@ __FBSDID("$FreeBSD$"); #include #include +SYSCTL_NODE(_vfs, OID_AUTO, tmpfs, CTLFLAG_RW, 0, "tmpfs file system"); + /* --------------------------------------------------------------------- */ /* @@ -320,9 +323,11 @@ loop: MPASS((node->tn_vpstate & TMPFS_VNODE_DOOMED) == 0); VI_LOCK(vp); TMPFS_NODE_UNLOCK(node); - vholdl(vp); - (void) vget(vp, lkflag | LK_INTERLOCK | LK_RETRY, curthread); - vdrop(vp); + error = vget(vp, lkflag | LK_INTERLOCK, curthread); + if (error != 0) { + vp = NULL; + goto out; + } /* * Make sure the vnode is still there after @@ -420,11 +425,13 @@ unlock: out: *vpp = vp; - MPASS(IFF(error == 0, *vpp != NULL && VOP_ISLOCKED(*vpp))); #ifdef INVARIANTS - TMPFS_NODE_LOCK(node); - MPASS(*vpp == node->tn_vnode); - TMPFS_NODE_UNLOCK(node); + if (error == 0) { + MPASS(*vpp != NULL && VOP_ISLOCKED(*vpp)); + TMPFS_NODE_LOCK(node); + MPASS(*vpp == node->tn_vnode); + TMPFS_NODE_UNLOCK(node); + } #endif return error; diff --git a/sys/fs/tmpfs/tmpfs_vnops.c b/sys/fs/tmpfs/tmpfs_vnops.c index 93fea8b..dd54679 100644 --- a/sys/fs/tmpfs/tmpfs_vnops.c +++ b/sys/fs/tmpfs/tmpfs_vnops.c @@ -46,6 +46,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include @@ -58,6 +59,13 @@ __FBSDID("$FreeBSD$"); #include #include +SYSCTL_DECL(_vfs_tmpfs); + +static volatile int tmpfs_rename_restarts; +SYSCTL_INT(_vfs_tmpfs, OID_AUTO, rename_restarts, CTLFLAG_RD, + __DEVOLATILE(int *, &tmpfs_rename_restarts), 0, + "Times rename had to restart due to lock contention"); + /* --------------------------------------------------------------------- */ static int @@ -920,6 +928,118 @@ out: /* --------------------------------------------------------------------- */ static int +tmpfs_rename_relock(struct vnode *fdvp, struct vnode **fvpp, + struct vnode *tdvp, struct vnode **tvpp, + struct componentname *fcnp, struct componentname *tcnp) +{ + struct vnode *nvp; + struct mount *mp; + struct tmpfs_dirent *de; + int error; + + VOP_UNLOCK(tdvp, 0); + if (*tvpp != NULL && *tvpp != tdvp) + VOP_UNLOCK(*tvpp, 0); + mp = fdvp->v_mount; + +relock: + atomic_add_int(&tmpfs_rename_restarts, 1); + error = vn_lock(fdvp, LK_EXCLUSIVE); + if (error) + goto releout; + if (vn_lock(tdvp, LK_EXCLUSIVE | LK_NOWAIT) != 0) { + VOP_UNLOCK(fdvp, 0); + error = vn_lock(tdvp, LK_EXCLUSIVE); + if (error) + goto releout; + VOP_UNLOCK(tdvp, 0); + goto relock; + } + /* + * Re-resolve fvp to be certain it still exists and fetch the + * correct vnode. + */ + de = tmpfs_dir_lookup(VP_TO_TMPFS_DIR(fdvp), NULL, fcnp); + if (de == NULL) { + VOP_UNLOCK(fdvp, 0); + VOP_UNLOCK(tdvp, 0); + if ((fcnp->cn_flags & ISDOTDOT) != 0 || + (fcnp->cn_namelen == 1 && fcnp->cn_nameptr[0] == '.')) + error = EINVAL; + else + error = ENOENT; + goto releout; + } + error = tmpfs_alloc_vp(mp, de->td_node, LK_EXCLUSIVE | LK_NOWAIT, &nvp); + if (error != 0) { + VOP_UNLOCK(fdvp, 0); + VOP_UNLOCK(tdvp, 0); + if (error != EBUSY) + goto releout; + error = tmpfs_alloc_vp(mp, de->td_node, LK_EXCLUSIVE, &nvp); + if (error != 0) + goto releout; + VOP_UNLOCK(nvp, 0); + vrele(*fvpp); + *fvpp = nvp; + goto relock; + } + vrele(*fvpp); + *fvpp = nvp; + VOP_UNLOCK(*fvpp, 0); + /* + * Re-resolve tvp and acquire the vnode lock if present. + */ + de = tmpfs_dir_lookup(VP_TO_TMPFS_DIR(tdvp), NULL, tcnp); + /* + * If tvp disappeared we just carry on. + */ + if (de == NULL && *tvpp != NULL) { + vrele(*tvpp); + *tvpp = NULL; + } + /* + * Get the tvp ino if the lookup succeeded. We may have to restart + * if the non-blocking acquire fails. + */ + if (de != NULL) { + nvp = NULL; + error = tmpfs_alloc_vp(mp, de->td_node, + LK_EXCLUSIVE | LK_NOWAIT, &nvp); + if (*tvpp != NULL) + vrele(*tvpp); + *tvpp = nvp; + if (error != 0) { + VOP_UNLOCK(fdvp, 0); + VOP_UNLOCK(tdvp, 0); + if (error != EBUSY) + goto releout; + error = tmpfs_alloc_vp(mp, de->td_node, LK_EXCLUSIVE, + &nvp); + if (error != 0) + goto releout; + VOP_UNLOCK(nvp, 0); + if (*tvpp == fdvp) { + error = ENOTEMPTY; + goto releout; + } + goto relock; + } + } + + return (0); + +releout: + vrele(fdvp); + vrele(*fvpp); + vrele(tdvp); + if (*tvpp != NULL) + vrele(*tvpp); + + return (error); +} + +static int tmpfs_rename(struct vop_rename_args *v) { struct vnode *fdvp = v->a_fdvp; @@ -928,6 +1048,7 @@ tmpfs_rename(struct vop_rename_args *v) struct vnode *tdvp = v->a_tdvp; struct vnode *tvp = v->a_tvp; struct componentname *tcnp = v->a_tcnp; + struct mount *mp = NULL; char *newname; int error; @@ -943,8 +1064,6 @@ tmpfs_rename(struct vop_rename_args *v) MPASS(fcnp->cn_flags & HASBUF); MPASS(tcnp->cn_flags & HASBUF); - tnode = (tvp == NULL) ? NULL : VP_TO_TMPFS_NODE(tvp); - /* Disallow cross-device renames. * XXX Why isn't this done by the caller? */ if (fvp->v_mount != tdvp->v_mount || @@ -953,9 +1072,6 @@ tmpfs_rename(struct vop_rename_args *v) goto out; } - tmp = VFS_TO_TMPFS(tdvp->v_mount); - tdnode = VP_TO_TMPFS_DIR(tdvp); - /* If source and target are the same file, there is nothing to do. */ if (fvp == tvp) { error = 0; @@ -964,8 +1080,30 @@ tmpfs_rename(struct vop_rename_args *v) /* If we need to move the directory between entries, lock the * source so that we can safely operate on it. */ - if (fdvp != tdvp && fdvp != tvp) - vn_lock(fdvp, LK_EXCLUSIVE | LK_RETRY); + if (fdvp != tdvp && fdvp != tvp) { + if (vn_lock(fdvp, LK_EXCLUSIVE | LK_NOWAIT) != 0) { + mp = tdvp->v_mount; + error = vfs_busy(mp, 0); + if (error != 0) { + mp = NULL; + goto out; + } + error = tmpfs_rename_relock(fdvp, &fvp, tdvp, &tvp, + fcnp, tcnp); + if (error != 0) { + vfs_unbusy(mp); + return (error); + } + if (fvp == tvp) { + error = 0; + goto out_locked; + } + } + } + + tmp = VFS_TO_TMPFS(tdvp->v_mount); + tdnode = VP_TO_TMPFS_DIR(tdvp); + tnode = (tvp == NULL) ? NULL : VP_TO_TMPFS_NODE(tvp); fdnode = VP_TO_TMPFS_DIR(fdvp); fnode = VP_TO_TMPFS_NODE(fvp); de = tmpfs_dir_lookup(fdnode, fnode, fcnp); @@ -1157,6 +1295,9 @@ out: vrele(fdvp); vrele(fvp); + if (mp != NULL) + vfs_unbusy(mp); + return error; } --C7zPtVaVf+AK4Oqc--