Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 13 Nov 2016 09:40:54 +0100
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        Don Lewis <truckman@FreeBSD.org>
Cc:        freebsd-current@FreeBSD.org, mjg@FreeBSD.org
Subject:   Re: panic: mutex ncnegl not owned at /usr/src/sys/kern/vfs_cache.c:743 in 12.0-CURRENT r308447
Message-ID:  <20161113084054.GB14804@dft-labs.eu>
In-Reply-To: <201611130112.uAD1Bv9V081063@gw.catspoiler.org>
References:  <20161112175347.GA14804@dft-labs.eu> <201611130112.uAD1Bv9V081063@gw.catspoiler.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Nov 12, 2016 at 05:11:57PM -0800, Don Lewis wrote:
> This !neg_locked code in cache_negative_remove() looks suspicious to me:
> 
>         if (!neg_locked) {
>                 if (ncp->nc_flag & NCF_HOTNEGATIVE) {
>                         hot_locked = true;
>                         mtx_lock(&ncneg_hot.nl_lock);
>                         if (!(ncp->nc_flag & NCF_HOTNEGATIVE)) {
>                                 list_locked = true;
>                                 mtx_lock(&neglist->nl_lock);
>                         }
>                 } else {
>                         list_locked = true;
>                         mtx_lock(&neglist->nl_lock);
>                 }
> 
> It looks like you handle NCF_HOTNEGATIVE going away while waiting for
> the ncneg_hot.nl_lock, but don't handle the possible appearance of
> NCF_HOTNEGATIVE while waiting for neglist->nl_lock.
> 

Promotions from regular to hot are only possible on a hit, which is
prevented by the caller holding both the vnode and bucket lock. The only
way to see the entry at this stage is from the shrinker, which can
demote it and then take all the locks to try to remove it. But it checks
after locking if the node is still there.

> What protects nc_flag, the lock for the list that it resides on?
> 

It is supposed to be the hot list lock and I think this uncovers a bug
here.

Consider a NCF_HOTNEGATIVE entry which is being evicted. It sets the
NCV_DVDROP flag without the lock held, but the entry is still not
removed from negative lists. So in principle we can either lose the
newly set flag or the information that hotnegative is unset.

That said, I think the fix would be to remove from negative entries
prior to setting the NCV_DVDROP flag.

Normally the flag is protected by the hotlist lock.

Untested, but should do the trick:

--- vfs_cache.c.old	2016-11-13 09:37:50.096000000 +0100
+++ vfs_cache.c.new	2016-11-13 09:39:45.004000000 +0100
@@ -868,6 +868,13 @@
 		    nc_get_name(ncp), ncp->nc_neghits);
 	}
 	LIST_REMOVE(ncp, nc_hash);
+	if (!(ncp->nc_flag & NCF_NEGATIVE)) {
+		TAILQ_REMOVE(&ncp->nc_vp->v_cache_dst, ncp, nc_dst);
+		if (ncp == ncp->nc_vp->v_cache_dd)
+			ncp->nc_vp->v_cache_dd = NULL;
+	} else {
+		cache_negative_remove(ncp, neg_locked);
+	}
 	if (ncp->nc_flag & NCF_ISDOTDOT) {
 		if (ncp == ncp->nc_dvp->v_cache_dd)
 			ncp->nc_dvp->v_cache_dd = NULL;
@@ -878,13 +885,6 @@
 			atomic_subtract_rel_long(&numcachehv, 1);
 		}
 	}
-	if (!(ncp->nc_flag & NCF_NEGATIVE)) {
-		TAILQ_REMOVE(&ncp->nc_vp->v_cache_dst, ncp, nc_dst);
-		if (ncp == ncp->nc_vp->v_cache_dd)
-			ncp->nc_vp->v_cache_dd = NULL;
-	} else {
-		cache_negative_remove(ncp, neg_locked);
-	}
 	atomic_subtract_rel_long(&numcache, 1);
 }
 
-- 
Mateusz Guzik <mjguzik gmail.com>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20161113084054.GB14804>