Date: Tue, 10 Sep 2002 09:45:55 -0400 (EDT) From: Robert Watson <rwatson@FreeBSD.ORG> To: Don Lewis <dl-freebsd@catspoiler.org> Cc: bde@zeta.org.au, current@FreeBSD.ORG, jeff@FreeBSD.ORG Subject: Re: vnode lock assertion problem in nfs_link() Message-ID: <Pine.NEB.3.96L.1020910093934.62812b-100000@fledge.watson.org> In-Reply-To: <200209101154.g8ABsdwr094898@gw.catspoiler.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 10 Sep 2002, Don Lewis wrote: > >>> > The changes are obviously just cleanups for leaf file systems, but I > >>> > wonder why everything wasn't always locked at the top. Could it have > >>> > been because locking all the way down is harmful? > > >> I think this is because some places are still worrying about using > >> link(2) on directories. I think there aren't any significant problems > >> when vp is not a directory. Hard linking of directories was permitted > >> (and perhaps even supported) until relatively recently: > > > > I suspect that you are right about this. If so, it should be able to > > let namei() lock the first vnode since we check its type and bail out if > > it is a directory before the second namei() call. > > I looked at namei() and ufs_lookup() and found a problem with telling > the first namei() to grab the lock. It appears that the *_lookup() > routines always lock the child vnode, and namei() later releases the > lock if LOCKLEAF is not set. If we passed LOCKLEAF to the first > namei(), then link("/foo/bar", "/foo/bar") would attempt to lock "bar" > twice. This also violates the "directory vnodes before file vnodes" lock order. Generally speaking, you must always lock directories before files, since files are always leaves, and directories may not be. The patch you have looks fine to me, since it grabs a reference to the file but doesn't lock it until after the directory look is held (and guarantees it's a leaf node by checking VDIR so that it's safe to do the locking later). > Because we now bail out early if the first vnode turns out to be a > pointer, I believe we can dispense with the vnode inequality test and > always grab the lock. Yeah. vp != VDIR, and dvp == VDIR due to earlier constraints, so we know that vp != dvp. > BTW, is it safe to call ASSERT_VOP_UNLOCKED() in the SMP case after the > reference has been dropped with vput() or vrele()? Well, all the VFS code currently runs under Giant so a certain class of unsafe cases is avoided, but generally no: ASSERT_VOP_UNLOCKED() invokes a vnode operation to determine if the vnode is locked or not, which dereferences fields in the vnode that require a valid reference to use. Because none of the code there currently blocks, it's not an issue generally right now (as neither vnode is likely to have been deleted during the link operation, so releasing the reference is unlikely to result in chances to the vnode). It's worth fixing though. > Here's a replacement for the kern_link() patch in my previous patch > file: Looks good to me. I'm not familiar enough with unionfs and stacking to provide an effective review of that code, however. Maybe Jeff can give us some comments? (unionfs is actually the reason I stalled in making the same changes). > Index: vfs_syscalls.c > =================================================================== > RCS file: /home/ncvs/src/sys/kern/vfs_syscalls.c,v > retrieving revision 1.285 > diff -u -r1.285 vfs_syscalls.c > --- vfs_syscalls.c 1 Sep 2002 20:37:28 -0000 1.285 > +++ vfs_syscalls.c 10 Sep 2002 11:28:01 -0000 > @@ -1027,10 +1027,12 @@ > if (nd.ni_vp != NULL) { > vrele(nd.ni_vp); > error = EEXIST; > - } else { > + } else if ((error = vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td)) > + == 0) { > VOP_LEASE(nd.ni_dvp, td, td->td_ucred, LEASE_WRITE); > VOP_LEASE(vp, td, td->td_ucred, LEASE_WRITE); > error = VOP_LINK(nd.ni_dvp, vp, &nd.ni_cnd); > + VOP_UNLOCK(vp, 0, td); > } > NDFREE(&nd, NDF_ONLY_PNBUF); > vput(nd.ni_dvp); > > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.NEB.3.96L.1020910093934.62812b-100000>