Date: Thu, 25 Jun 2015 14:31:57 +0200
From: Mateusz Guzik
To: Konstantin Belousov
Cc: freebsd-fs@freebsd.org
Subject: Re: atomic v_usecount and v_holdcnt
Message-ID: <20150625123156.GA29667@dft-labs.eu>
In-Reply-To: <20150318104442.GS2379@kib.kiev.ua>

On Wed, Mar 18, 2015 at 12:44:42PM +0200, Konstantin Belousov wrote:
> On Tue, Mar 17, 2015 at 02:44:12AM +0100, Mateusz Guzik wrote:
> > On Mon, Mar 16, 2015 at 11:46:43AM +0200, Konstantin Belousov wrote:
> > > On Sat, Mar 14, 2015 at 11:52:26PM +0100, Mateusz Guzik wrote:
> > > > On Mon, Nov 24, 2014 at 11:52:52AM +0200, Konstantin Belousov wrote:
> > > > > On Sat, Nov 22, 2014 at 10:11:47PM +0100, Mateusz Guzik wrote:
> > > > > > On Sat, Nov 22, 2014 at 11:25:27AM +0200, Konstantin Belousov wrote:
> > > > > > > On Sat, Nov 22, 2014 at 01:28:12AM +0100, Mateusz Guzik wrote:
> > > > > > > > The idea is that we don't need an interlock as long as we don't
> > > > > > > > transition either counter 1->0 or 0->1.
> > > > > > > I already said that something along the lines of the patch should work.
> > > > > > > In fact, you need the vnode lock when the hold count changes between
> > > > > > > 0 and 1, and probably the same for the use count.
> > > > > >
> > > > > > I don't see why this would be required (not that I'm a VFS expert).
> > > > > > vnode recycling seems to be protected with the interlock.
> > > > > >
> > > > > > In fact I would argue that if this is really needed, the current
> > > > > > code is buggy.
> > > > > Yes, it is already (somewhat) buggy.
> > > > >
> > > > > Most of the need for the lock is for the case of counts coming from
> > > > > 1 to 0.  The reason is the handling of the active vnode list, which
> > > > > is used to limit the amount of vnode list walking in the syncer.
> > > > > When the hold count is decremented to 0, the vnode is removed from
> > > > > the active list.  When the use count is decremented to 0, the vnode
> > > > > is supposedly inactivated, and vinactive() cleans the cached pages
> > > > > belonging to the vnode.  In other words, VI_OWEINACT for a dirty
> > > > > vnode is sort of a bug.
> > > >
> > > > Modified the patch to no longer have the usecount + interlock dropped +
> > > > VI_OWEINACT set window.
> > > >
> > > > The extended 0->1 hold count + vnode not locked window remains.  I can
> > > > fix that, if it is really necessary, by having _vhold return with the
> > > > interlock held if it did such a transition.
> > >
> > > In v_upgrade_usecount(), you call v_incr_devcount() without the interlock
> > > held.  What prevents the devfs vnode from being recycled, in particular,
> > > from invalidation of the v_rdev pointer?
> >
> > Right, that was buggy.  Fixed in the patch below.
>
> Why is the non-atomicity of updates to several counters safe?  This at
> least requires an explanation in the comment; I mean the holdcnt/usecnt
> pair.

The patch below was tested with make -j 40 buildworld in a loop for 7
hours and it survived.

I started a comment above vget(); it is still unfinished.

Further playing around revealed that zfs will vref a vnode with no
usecount (zfs_lookup -> zfs_dirlook -> zfs_dirent_lock -> zfs_zget ->
VN_HOLD) and it is possible that such a vnode will have VI_OWEINACT set
(tested on a kernel without my patch).  VN_HOLD is defined as vref().

The code can sleep, so with some shuffling around vinactive() could be
called when the vnode happens to be exclusively locked (but most of the
time it is locked shared).  However, it seems that vputx deals with such
consumers:

	if (vp->v_usecount > 0)
		vp->v_iflag &= ~VI_OWEINACT;

Given that there are possibly more consumers like zfs, how about the
following: in vputx, assert that the flag is unset if the usecount went
above 0; in vref and vget, clear the flag when transitioning 0->1 and
assert that it is unset otherwise.

The way I read it is that in the stock kernel, with properly timed vref,
the flag would be cleared anyway, with vinactive() only called if the
clearing was done by vget and only with the vnode exclusively locked.
With the aforementioned change the likelihood of vinactive() remains the
same, but now the flag state can be asserted.

> Assume the thread increased the v_usecount, but did not manage to
> acquire dev_mtx.  Another thread performs vrele() and progresses to
> v_decr_devcount().  It decreases the si_usecount, which might allow yet
> another thread to see the si_usecount as too low and start an unwanted
> action.  I think that the tests for VCHR must be done at the very
> start of the functions, and devfs vnodes must hold the vnode interlock
> unconditionally.

Inserted v_type != VCHR checks in the relevant places; the si_usecount
manipulation functions now assert that the interlock is held.
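As an aside, the 0 <-> 1 boundary rule may be easier to see in isolation.
Below is a userspace analogue of the conditional helpers the patch adds to
sys/refcount.h (a sketch only: C11 atomics stand in for atomic_cmpset_int,
and the function names and demo harness are made up for illustration):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Take a reference only if the count is already non-zero; refuse to do
 * the 0->1 transition so the caller falls back to the interlocked path.
 */
static bool
acquire_if_not_zero(atomic_uint *count)
{
	unsigned int old;

	old = atomic_load(count);
	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return (true);
	}
	return (false);
}

/*
 * Drop a reference only if it is not the last one; refuse to do the
 * 1->0 transition so the caller falls back to the interlocked path.
 * The caller must own a reference, so the count is at least 1 on entry.
 */
static bool
release_if_not_last(atomic_uint *count)
{
	unsigned int old;

	old = atomic_load(count);
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return (true);
	}
	return (false);
}

int
main(void)
{
	atomic_uint cnt = 1;

	printf("%d\n", acquire_if_not_zero(&cnt));	/* 1; cnt is now 2 */
	printf("%d\n", release_if_not_last(&cnt));	/* 1; cnt is back to 1 */
	printf("%d\n", release_if_not_last(&cnt));	/* 0; left to the locked path */
	return (0);
}

Either helper failing means the caller is about to cross the 0/1 boundary
and has to take the interlock, which is what serializes against threads
going the other way.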
> > > I think that the refcount_acquire_if_greater() KPI is excessive.  You
> > > always call acquire with val == 0, and release with val == 1.
> >
> > Yeah, I noted in my previous e-mail that it should be changed (see below).
> >
> > I replaced them with refcount_acquire_if_not_zero and
> > refcount_release_if_not_last.
>
> I dislike the length of the names.  Can you propose something shorter?

Unfortunately the original API is already quite verbose and I don't have
anything readable which would retain "refcount_acquire" (as opposed to a
"ref_get" or "ref_acq").  Adding "_nz" as a suffix does not look good
("refcount_acquire_if_nz").

> The type for the local variable old in both functions should be u_int.

Done.

> > > WRT _refcount_release_lock, why can't the lock_object->lc_lock/lc_unlock
> > > KPI be used?  This would allow making refcount_release_lock() a function
> > > instead of a gcc extension macro.  Not to mention that the macro is
> > > unused.
> >
> > These were supposed to be used by other code; I forgot to remove them
> > from the patch I sent here.
> >
> > We can discuss this in another thread.
> >
> > Strictly speaking we could use it here for the vnode interlock, but I
> > did not want to bypass the VI_LOCK macro (which right now is just a
> > mtx_lock, but this may change).
> >
> > Updated patch is below:
>
> Do not introduce ASSERT_VI_LOCK; the name difference between
> ASSERT_VI_LOCKED and ASSERT_VI_LOCK is only in the broken grammar.
> I do not see anything wrong with explicit if() statements where needed,
> in all four places.

Done.

> In vputx(), wrap the long line (if (refcount_release() || VI_DOINGINACT)).

Done.

diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c b/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c
index 83f29c1..b587ebd 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c
@@ -99,6 +99,6 @@ vn_rele_async(vnode_t *vp, taskq_t *taskq)
 	    (task_func_t *)vn_rele_inactive, vp, TQ_SLEEP) != 0);
 		return;
 	}
-	vp->v_usecount--;
+	refcount_release(&vp->v_usecount);
 	vdropl(vp);
 }
diff --git a/sys/kern/vfs_cache.c b/sys/kern/vfs_cache.c
index 19ef783..cb4ea94 100644
--- a/sys/kern/vfs_cache.c
+++ b/sys/kern/vfs_cache.c
@@ -661,12 +661,12 @@ success:
 		ltype = VOP_ISLOCKED(dvp);
 		VOP_UNLOCK(dvp, 0);
 	}
-	VI_LOCK(*vpp);
+	vhold(*vpp);
 	if (wlocked)
 		CACHE_WUNLOCK();
 	else
 		CACHE_RUNLOCK();
-	error = vget(*vpp, cnp->cn_lkflags | LK_INTERLOCK, cnp->cn_thread);
+	error = vget(*vpp, cnp->cn_lkflags | LK_VNHELD, cnp->cn_thread);
 	if (cnp->cn_flags & ISDOTDOT) {
 		vn_lock(dvp, ltype | LK_RETRY);
 		if (dvp->v_iflag & VI_DOOMED) {
@@ -1366,9 +1366,9 @@ vn_dir_dd_ino(struct vnode *vp)
 		if ((ncp->nc_flag & NCF_ISDOTDOT) != 0)
 			continue;
 		ddvp = ncp->nc_dvp;
-		VI_LOCK(ddvp);
+		vhold(ddvp);
 		CACHE_RUNLOCK();
-		if (vget(ddvp, LK_INTERLOCK | LK_SHARED | LK_NOWAIT, curthread))
+		if (vget(ddvp, LK_SHARED | LK_NOWAIT | LK_VNHELD, curthread))
 			return (NULL);
 		return (ddvp);
 	}
diff --git a/sys/kern/vfs_hash.c b/sys/kern/vfs_hash.c
index 930fca1..48601e7 100644
--- a/sys/kern/vfs_hash.c
+++ b/sys/kern/vfs_hash.c
@@ -84,9 +84,9 @@ vfs_hash_get(const struct mount *mp, u_int hash, int flags, struct thread *td, s
 			continue;
 		if (fn != NULL && fn(vp, arg))
 			continue;
-		VI_LOCK(vp);
+		vhold(vp);
 		rw_runlock(&vfs_hash_lock);
-		error = vget(vp, flags | LK_INTERLOCK, td);
+		error = vget(vp, flags | LK_VNHELD, td);
 		if (error == ENOENT && (flags & LK_NOWAIT) == 0)
 			break;
 		if (error)
@@ -128,9 +128,9 @@ vfs_hash_insert(struct vnode *vp, u_int hash, int flags, struct thread *td, stru
 			continue;
 		if (fn != NULL && fn(vp2, arg))
 			continue;
-		VI_LOCK(vp2);
+		vhold(vp2);
 		rw_wunlock(&vfs_hash_lock);
-		error = vget(vp2, flags | LK_INTERLOCK, td);
+		error = vget(vp2, flags | LK_VNHELD, td);
 		if (error == ENOENT && (flags & LK_NOWAIT) == 0)
 			break;
 		rw_wlock(&vfs_hash_lock);
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 1f1a7b6..a8cd2cb 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -68,6 +68,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include <sys/refcount.h>
 #include
 #include
 #include
@@ -102,9 +103,8 @@ static int flushbuflist(struct bufv *bufv, int flags, struct bufobj *bo,
 static void	syncer_shutdown(void *arg, int howto);
 static int	vtryrecycle(struct vnode *vp);
 static void	v_incr_usecount(struct vnode *);
-static void	v_decr_usecount(struct vnode *);
-static void	v_decr_useonly(struct vnode *);
-static void	v_upgrade_usecount(struct vnode *);
+static void	v_incr_devcount(struct vnode *);
+static void	v_decr_devcount(struct vnode *);
 static void	vnlru_free(int);
 static void	vgonel(struct vnode *);
 static void	vfs_knllock(void *arg);
@@ -868,7 +868,7 @@ vnlru_free(int count)
 		 */
 		freevnodes--;
 		vp->v_iflag &= ~VI_FREE;
-		vp->v_holdcnt++;
+		refcount_acquire(&vp->v_holdcnt);
 		mtx_unlock(&vnode_free_list_mtx);
 		VI_UNLOCK(vp);
 
@@ -2079,78 +2079,68 @@ reassignbuf(struct buf *bp)
 
 /*
  * Increment the use and hold counts on the vnode, taking care to reference
- * the driver's usecount if this is a chardev.  The vholdl() will remove
- * the vnode from the free list if it is presently free.  Requires the
- * vnode interlock and returns with it held.
+ * the driver's usecount if this is a chardev.  The _vhold() will remove
+ * the vnode from the free list if it is presently free.
  */
 static void
 v_incr_usecount(struct vnode *vp)
 {
 
+	ASSERT_VI_UNLOCKED(vp, __func__);
 	CTR2(KTR_VFS, "%s: vp %p", __func__, vp);
-	vholdl(vp);
-	vp->v_usecount++;
-	if (vp->v_type == VCHR && vp->v_rdev != NULL) {
-		dev_lock();
-		vp->v_rdev->si_usecount++;
-		dev_unlock();
-	}
-}
-
-/*
- * Turn a holdcnt into a use+holdcnt such that only one call to
- * v_decr_usecount is needed.
- */
-static void
-v_upgrade_usecount(struct vnode *vp)
-{
+	if (vp->v_type == VCHR) {
+		VI_LOCK(vp);
+		_vhold(vp, true);
+		if (vp->v_iflag & VI_OWEINACT) {
+			VNASSERT(vp->v_usecount == 0, vp,
+			    ("vnode with usecount and VI_OWEINACT set"));
+			vp->v_iflag &= ~VI_OWEINACT;
+		}
+		refcount_acquire(&vp->v_usecount);
+		v_incr_devcount(vp);
+		VI_UNLOCK(vp);
+		return;
+	}
 
-	CTR2(KTR_VFS, "%s: vp %p", __func__, vp);
-	vp->v_usecount++;
-	if (vp->v_type == VCHR && vp->v_rdev != NULL) {
-		dev_lock();
-		vp->v_rdev->si_usecount++;
-		dev_unlock();
+	_vhold(vp, false);
+	if (refcount_acquire_if_not_zero(&vp->v_usecount)) {
+		VNASSERT((vp->v_iflag & VI_OWEINACT) == 0, vp,
+		    ("vnode with usecount and VI_OWEINACT set"));
+	} else {
+		VI_LOCK(vp);
+		if (vp->v_iflag & VI_OWEINACT)
+			vp->v_iflag &= ~VI_OWEINACT;
+		refcount_acquire(&vp->v_usecount);
+		VI_UNLOCK(vp);
 	}
 }
 
 /*
- * Decrement the vnode use and hold count along with the driver's usecount
- * if this is a chardev.  The vdropl() below releases the vnode interlock
- * as it may free the vnode.
+ * Increment si_usecount of the associated device, if any.
  */
 static void
-v_decr_usecount(struct vnode *vp)
+v_incr_devcount(struct vnode *vp)
 {
 
-	ASSERT_VI_LOCKED(vp, __FUNCTION__);
-	VNASSERT(vp->v_usecount > 0, vp,
-	    ("v_decr_usecount: negative usecount"));
-	CTR2(KTR_VFS, "%s: vp %p", __func__, vp);
-	vp->v_usecount--;
+	ASSERT_VI_LOCKED(vp, __func__);
+
 	if (vp->v_type == VCHR && vp->v_rdev != NULL) {
 		dev_lock();
-		vp->v_rdev->si_usecount--;
+		vp->v_rdev->si_usecount++;
 		dev_unlock();
 	}
-	vdropl(vp);
 }
 
 /*
- * Decrement only the use count and driver use count.  This is intended to
- * be paired with a follow on vdropl() to release the remaining hold count.
- * In this way we may vgone() a vnode with a 0 usecount without risk of
- * having it end up on a free list because the hold count is kept above 0.
+ * Decrement si_usecount of the associated device, if any.
  */
 static void
-v_decr_useonly(struct vnode *vp)
+v_decr_devcount(struct vnode *vp)
 {
 
-	ASSERT_VI_LOCKED(vp, __FUNCTION__);
-	VNASSERT(vp->v_usecount > 0, vp,
-	    ("v_decr_useonly: negative usecount"));
-	CTR2(KTR_VFS, "%s: vp %p", __func__, vp);
-	vp->v_usecount--;
+	ASSERT_VI_LOCKED(vp, __func__);
+
 	if (vp->v_type == VCHR && vp->v_rdev != NULL) {
 		dev_lock();
 		vp->v_rdev->si_usecount--;
@@ -2164,21 +2154,38 @@ v_decr_useonly(struct vnode *vp)
  * is being destroyed.  Only callers who specify LK_RETRY will
  * see doomed vnodes.  If inactive processing was delayed in
  * vput try to do it here.
+ *
+ * Notes on lockless counter manipulation:
+ * The hold count prevents the vnode from being freed, while the
+ * use count prevents it from being recycled.
+ *
+ * Only 1->0 and 0->1 transitions require atomicity with respect to
+ * other operations (e.g. taking the vnode off of a free list).
+ * In such a case the interlock is taken, which provides mutual
+ * exclusion against threads transitioning the other way.
  */
 int
 vget(struct vnode *vp, int flags, struct thread *td)
 {
-	int error;
+	int error, oweinact;
 
-	error = 0;
 	VNASSERT((flags & LK_TYPE_MASK) != 0, vp,
 	    ("vget: invalid lock operation"));
+
+	if ((flags & LK_INTERLOCK) != 0)
+		ASSERT_VI_LOCKED(vp, __func__);
+	else
+		ASSERT_VI_UNLOCKED(vp, __func__);
+	if ((flags & LK_VNHELD) != 0)
+		VNASSERT((vp->v_holdcnt > 0), vp,
+		    ("vget: LK_VNHELD passed but vnode not held"));
+
 	CTR3(KTR_VFS, "%s: vp %p with flags %d", __func__, vp, flags);
 
-	if ((flags & LK_INTERLOCK) == 0)
-		VI_LOCK(vp);
-	vholdl(vp);
-	if ((error = vn_lock(vp, flags | LK_INTERLOCK)) != 0) {
+	if ((flags & LK_VNHELD) == 0)
+		_vhold(vp, (flags & LK_INTERLOCK) != 0);
+
+	if ((error = vn_lock(vp, flags)) != 0) {
 		vdrop(vp);
 		CTR2(KTR_VFS, "%s: impossible to lock vnode %p", __func__,
 		    vp);
@@ -2186,22 +2193,34 @@ vget(struct vnode *vp, int flags, struct thread *td)
 	}
 	if (vp->v_iflag & VI_DOOMED && (flags & LK_RETRY) == 0)
 		panic("vget: vn_lock failed to return ENOENT\n");
-	VI_LOCK(vp);
-	/* Upgrade our holdcnt to a usecount. */
-	v_upgrade_usecount(vp);
+
 	/*
 	 * We don't guarantee that any particular close will
 	 * trigger inactive processing so just make a best effort
 	 * here at preventing a reference to a removed file.  If
 	 * we don't succeed no harm is done.
+	 *
+	 * Upgrade our holdcnt to a usecount.
 	 */
-	if (vp->v_iflag & VI_OWEINACT) {
-		if (VOP_ISLOCKED(vp) == LK_EXCLUSIVE &&
+	if (vp->v_type != VCHR &&
+	    refcount_acquire_if_not_zero(&vp->v_usecount)) {
+		VNASSERT((vp->v_iflag & VI_OWEINACT) == 0, vp,
+		    ("vnode with usecount and VI_OWEINACT set"));
+	} else {
+		VI_LOCK(vp);
+		if ((vp->v_iflag & VI_OWEINACT) == 0) {
+			oweinact = 0;
+		} else {
+			oweinact = 1;
+			vp->v_iflag &= ~VI_OWEINACT;
+		}
+		refcount_acquire(&vp->v_usecount);
+		v_incr_devcount(vp);
+		if (oweinact && VOP_ISLOCKED(vp) == LK_EXCLUSIVE &&
 		    (flags & LK_NOWAIT) == 0)
 			vinactive(vp, td);
-		vp->v_iflag &= ~VI_OWEINACT;
+		VI_UNLOCK(vp);
 	}
-	VI_UNLOCK(vp);
 	return (0);
 }
 
@@ -2213,36 +2232,34 @@ vref(struct vnode *vp)
 {
 
 	CTR2(KTR_VFS, "%s: vp %p", __func__, vp);
-	VI_LOCK(vp);
 	v_incr_usecount(vp);
-	VI_UNLOCK(vp);
 }
 
 /*
  * Return reference count of a vnode.
  *
- * The results of this call are only guaranteed when some mechanism other
- * than the VI lock is used to stop other processes from gaining references
- * to the vnode.  This may be the case if the caller holds the only reference.
- * This is also useful when stale data is acceptable as race conditions may
- * be accounted for by some other means.
+ * The results of this call are only guaranteed when some mechanism is used to
+ * stop other processes from gaining references to the vnode.  This may be the
+ * case if the caller holds the only reference.  This is also useful when stale
+ * data is acceptable as race conditions may be accounted for by some other
+ * means.
  */
 int
 vrefcnt(struct vnode *vp)
 {
-	int usecnt;
 
-	VI_LOCK(vp);
-	usecnt = vp->v_usecount;
-	VI_UNLOCK(vp);
-
-	return (usecnt);
+	return (vp->v_usecount);
 }
 
 #define	VPUTX_VRELE	1
 #define	VPUTX_VPUT	2
 #define	VPUTX_VUNREF	3
 
+/*
+ * Decrement the use and hold counts for a vnode.
+ *
+ * See an explanation near vget() as to why atomic operation is safe.
+ */
 static void
 vputx(struct vnode *vp, int func)
 {
@@ -2255,33 +2272,44 @@ vputx(struct vnode *vp, int func)
 		ASSERT_VOP_LOCKED(vp, "vput");
 	else
 		KASSERT(func == VPUTX_VRELE, ("vputx: wrong func"));
+	ASSERT_VI_UNLOCKED(vp, __func__);
 	CTR2(KTR_VFS, "%s: vp %p", __func__, vp);
-	VI_LOCK(vp);
-
-	/* Skip this v_writecount check if we're going to panic below. */
-	VNASSERT(vp->v_writecount < vp->v_usecount || vp->v_usecount < 1, vp,
-	    ("vputx: missed vn_close"));
-	error = 0;
 
-	if (vp->v_usecount > 1 || ((vp->v_iflag & VI_DOINGINACT) &&
-	    vp->v_usecount == 1)) {
+	if (vp->v_type != VCHR &&
+	    refcount_release_if_not_last(&vp->v_usecount)) {
 		if (func == VPUTX_VPUT)
 			VOP_UNLOCK(vp, 0);
-		v_decr_usecount(vp);
+		vdrop(vp);
 		return;
 	}
 
-	if (vp->v_usecount != 1) {
-		vprint("vputx: negative ref count", vp);
-		panic("vputx: negative ref cnt");
-	}
-	CTR2(KTR_VFS, "%s: return vnode %p to the freelist", __func__, vp);
+	VI_LOCK(vp);
+
 	/*
 	 * We want to hold the vnode until the inactive finishes to
 	 * prevent vgone() races.  We drop the use count here and the
 	 * hold count below when we're done.
 	 */
-	v_decr_useonly(vp);
+	if (!refcount_release(&vp->v_usecount) ||
+	    (vp->v_iflag & VI_DOINGINACT)) {
+		if (func == VPUTX_VPUT)
+			VOP_UNLOCK(vp, 0);
+		v_decr_devcount(vp);
+		vdropl(vp);
+		return;
+	}
+
+	v_decr_devcount(vp);
+
+	error = 0;
+
+	if (vp->v_usecount != 0) {
+		vprint("vputx: usecount not zero", vp);
+		panic("vputx: usecount not zero");
+	}
+
+	CTR2(KTR_VFS, "%s: return vnode %p to the freelist", __func__, vp);
+
 	/*
 	 * We must call VOP_INACTIVE with the node locked.  Mark
 	 * as VI_DOINGINACT to avoid recursion.
@@ -2307,7 +2335,8 @@ vputx(struct vnode *vp, int func)
 		break;
 	}
 	if (vp->v_usecount > 0)
-		vp->v_iflag &= ~VI_OWEINACT;
+		VNASSERT((vp->v_iflag & VI_OWEINACT) == 0, vp,
+		    ("vnode with usecount and VI_OWEINACT set"));
 	if (error == 0) {
 		if (vp->v_iflag & VI_OWEINACT)
 			vinactive(vp, curthread);
@@ -2351,36 +2380,36 @@ vunref(struct vnode *vp)
 }
 
 /*
- * Somebody doesn't want the vnode recycled.
- */
-void
-vhold(struct vnode *vp)
-{
-
-	VI_LOCK(vp);
-	vholdl(vp);
-	VI_UNLOCK(vp);
-}
-
-/*
  * Increase the hold count and activate if this is the first reference.
  */
 void
-vholdl(struct vnode *vp)
+_vhold(struct vnode *vp, bool locked)
 {
 	struct mount *mp;
 
+	if (locked)
+		ASSERT_VI_LOCKED(vp, __func__);
+	else
+		ASSERT_VI_UNLOCKED(vp, __func__);
 	CTR2(KTR_VFS, "%s: vp %p", __func__, vp);
-#ifdef INVARIANTS
-	/* getnewvnode() calls v_incr_usecount() without holding interlock. */
-	if (vp->v_type != VNON || vp->v_data != NULL)
-		ASSERT_VI_LOCKED(vp, "vholdl");
-#endif
-	vp->v_holdcnt++;
-	if ((vp->v_iflag & VI_FREE) == 0)
+	if (!locked && refcount_acquire_if_not_zero(&vp->v_holdcnt)) {
+		VNASSERT((vp->v_iflag & VI_FREE) == 0, vp,
+		    ("_vhold: vnode with holdcnt is free"));
 		return;
-	VNASSERT(vp->v_holdcnt == 1, vp, ("vholdl: wrong hold count"));
-	VNASSERT(vp->v_op != NULL, vp, ("vholdl: vnode already reclaimed."));
+	}
+
+	if (!locked)
+		VI_LOCK(vp);
+	if ((vp->v_iflag & VI_FREE) == 0) {
+		refcount_acquire(&vp->v_holdcnt);
+		if (!locked)
+			VI_UNLOCK(vp);
+		return;
+	}
+	VNASSERT(vp->v_holdcnt == 0, vp,
+	    ("%s: wrong hold count", __func__));
+	VNASSERT(vp->v_op != NULL, vp,
+	    ("%s: vnode already reclaimed.", __func__));
 	/*
 	 * Remove a vnode from the free list, mark it as in use,
 	 * and put it on the active list.
@@ -2396,18 +2425,9 @@ vholdl(struct vnode *vp)
 	TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfreelist);
 	mp->mnt_activevnodelistsize++;
 	mtx_unlock(&vnode_free_list_mtx);
-}
-
-/*
- * Note that there is one less who cares about this vnode.
- * vdrop() is the opposite of vhold().
- */
-void
-vdrop(struct vnode *vp)
-{
-
-	VI_LOCK(vp);
-	vdropl(vp);
+	refcount_acquire(&vp->v_holdcnt);
+	if (!locked)
+		VI_UNLOCK(vp);
 }
 
 /*
@@ -2416,20 +2436,28 @@ vdrop(struct vnode *vp)
 * (marked VI_DOOMED) in which case we will free it.
 */
 void
-vdropl(struct vnode *vp)
+_vdrop(struct vnode *vp, bool locked)
 {
 	struct bufobj *bo;
 	struct mount *mp;
 	int active;
 
-	ASSERT_VI_LOCKED(vp, "vdropl");
+	if (locked)
+		ASSERT_VI_LOCKED(vp, __func__);
+	else
+		ASSERT_VI_UNLOCKED(vp, __func__);
 	CTR2(KTR_VFS, "%s: vp %p", __func__, vp);
-	if (vp->v_holdcnt <= 0)
+	if ((int)vp->v_holdcnt <= 0)
 		panic("vdrop: holdcnt %d", vp->v_holdcnt);
-	vp->v_holdcnt--;
-	VNASSERT(vp->v_holdcnt >= vp->v_usecount, vp,
-	    ("hold count less than use count"));
-	if (vp->v_holdcnt > 0) {
+	if (refcount_release_if_not_last(&vp->v_holdcnt)) {
+		if (locked)
+			VI_UNLOCK(vp);
+		return;
+	}
+
+	if (!locked)
+		VI_LOCK(vp);
+	if (refcount_release(&vp->v_holdcnt) == 0) {
 		VI_UNLOCK(vp);
 		return;
 	}
diff --git a/sys/sys/lockmgr.h b/sys/sys/lockmgr.h
index ff0473d..a74d5f5 100644
--- a/sys/sys/lockmgr.h
+++ b/sys/sys/lockmgr.h
@@ -159,6 +159,7 @@ _lockmgr_args_rw(struct lock *lk, u_int flags, struct rwlock *ilk,
 #define	LK_SLEEPFAIL	0x000800
 #define	LK_TIMELOCK	0x001000
 #define	LK_NODDLKTREAT	0x002000
+#define	LK_VNHELD	0x004000
 
 /*
  * Operations for lockmgr().
diff --git a/sys/sys/refcount.h b/sys/sys/refcount.h
index 4611664..d3f817c 100644
--- a/sys/sys/refcount.h
+++ b/sys/sys/refcount.h
@@ -64,4 +64,32 @@ refcount_release(volatile u_int *count)
 	return (old == 1);
 }
 
+static __inline int
+refcount_acquire_if_not_zero(volatile u_int *count)
+{
+	u_int old;
+
+	for (;;) {
+		old = *count;
+		if (old == 0)
+			return (0);
+		if (atomic_cmpset_int(count, old, old + 1))
+			return (1);
+	}
+}
+
+static __inline int
+refcount_release_if_not_last(volatile u_int *count)
+{
+	u_int old;
+
+	for (;;) {
+		old = *count;
+		if (old == 1)
+			return (0);
+		if (atomic_cmpset_int(count, old, old - 1))
+			return (1);
+	}
+}
+
 #endif	/* ! __SYS_REFCOUNT_H__ */
diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h
index 36ef8af..9286a4e 100644
--- a/sys/sys/vnode.h
+++ b/sys/sys/vnode.h
@@ -162,8 +162,8 @@ struct vnode {
 	daddr_t	v_lastw;			/* v last write */
 	int	v_clen;				/* v length of cur. cluster */
 
-	int	v_holdcnt;			/* i prevents recycling. */
-	int	v_usecount;			/* i ref count of users */
+	u_int	v_holdcnt;			/* i prevents recycling. */
+	u_int	v_usecount;			/* i ref count of users */
 	u_int	v_iflag;			/* i vnode flags (see below) */
 	u_int	v_vflag;			/* v vnode flags */
 	int	v_writecount;			/* v ref count of writers */
@@ -652,13 +652,15 @@ int	vaccess_acl_posix1e(enum vtype type, uid_t file_uid,
 	    struct ucred *cred, int *privused);
 void	vattr_null(struct vattr *vap);
 int	vcount(struct vnode *vp);
-void	vdrop(struct vnode *);
-void	vdropl(struct vnode *);
+#define	vdrop(vp)	_vdrop((vp), 0)
+#define	vdropl(vp)	_vdrop((vp), 1)
+void	_vdrop(struct vnode *, bool);
 int	vflush(struct mount *mp, int rootrefs, int flags, struct thread *td);
 int	vget(struct vnode *vp, int lockflag, struct thread *td);
 void	vgone(struct vnode *vp);
-void	vhold(struct vnode *);
-void	vholdl(struct vnode *);
+#define	vhold(vp)	_vhold((vp), 0)
+#define	vholdl(vp)	_vhold((vp), 1)
+void	_vhold(struct vnode *, bool);
 void	vinactive(struct vnode *, struct thread *);
 int	vinvalbuf(struct vnode *vp, int save, int slpflag, int slptimeo);
 int	vtruncbuf(struct vnode *vp, struct ucred *cred, off_t length,
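To recap the consumer-visible part of the change, using the pattern from
the vfs_hash.c hunk above: instead of handing the interlock to vget(),
consumers take a hold and pass LK_VNHELD, so the interlock is only touched
on the 0/1 boundary:

	/* before */
	VI_LOCK(vp);
	rw_runlock(&vfs_hash_lock);
	error = vget(vp, flags | LK_INTERLOCK, td);

	/* after */
	vhold(vp);
	rw_runlock(&vfs_hash_lock);
	error = vget(vp, flags | LK_VNHELD, td);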