Date: Sun, 13 Nov 2016 09:40:54 +0100
From: Mateusz Guzik
To: Don Lewis
Cc: freebsd-current@FreeBSD.org, mjg@FreeBSD.org
Subject: Re: panic: mutex ncnegl not owned at /usr/src/sys/kern/vfs_cache.c:743 in 12.0-CURRENT r308447
Message-ID: <20161113084054.GB14804@dft-labs.eu>

On Sat, Nov 12, 2016 at 05:11:57PM -0800, Don Lewis wrote:
> This !neg_locked code in cache_negative_remove() looks suspicious to me:
>
> 	if (!neg_locked) {
> 		if (ncp->nc_flag & NCF_HOTNEGATIVE) {
> 			hot_locked = true;
> 			mtx_lock(&ncneg_hot.nl_lock);
> 			if (!(ncp->nc_flag & NCF_HOTNEGATIVE)) {
> 				list_locked = true;
> 				mtx_lock(&neglist->nl_lock);
> 			}
> 		} else {
> 			list_locked = true;
> 			mtx_lock(&neglist->nl_lock);
> 		}
>
> It looks like you handle NCF_HOTNEGATIVE going away while waiting for
> the ncneg_hot.nl_lock, but don't handle the
> possible appearance of NCF_HOTNEGATIVE while waiting for
> neglist->nl_lock.
>

Promotions from regular to hot are only possible on a hit, which is
prevented by the caller holding both the vnode and bucket lock.

The only way to see the entry at this stage is from the shrinker, which
can demote it and then take all the locks to try to remove it. But it
checks after locking whether the entry is still there.

> What protects nc_flag, the lock for the list that it resides on?
>

It is supposed to be the hot list lock, and I think this uncovers a bug.

Consider an NCF_HOTNEGATIVE entry which is being evicted. The eviction
path sets the NCF_DVDROP flag without the lock held, while the entry has
not yet been removed from the negative lists. So in principle we can
lose either the newly set flag or the information that hotnegative is
unset.

That said, I think the fix would be to remove the entry from the
negative lists prior to setting the NCF_DVDROP flag. Normally nc_flag is
protected by the hot list lock.

Untested, but should do the trick:

--- vfs_cache.c.old	2016-11-13 09:37:50.096000000 +0100
+++ vfs_cache.c.new	2016-11-13 09:39:45.004000000 +0100
@@ -868,6 +868,13 @@
 		    nc_get_name(ncp), ncp->nc_neghits);
 	}
 	LIST_REMOVE(ncp, nc_hash);
+	if (!(ncp->nc_flag & NCF_NEGATIVE)) {
+		TAILQ_REMOVE(&ncp->nc_vp->v_cache_dst, ncp, nc_dst);
+		if (ncp == ncp->nc_vp->v_cache_dd)
+			ncp->nc_vp->v_cache_dd = NULL;
+	} else {
+		cache_negative_remove(ncp, neg_locked);
+	}
 	if (ncp->nc_flag & NCF_ISDOTDOT) {
 		if (ncp == ncp->nc_dvp->v_cache_dd)
 			ncp->nc_dvp->v_cache_dd = NULL;
@@ -878,13 +885,6 @@
 			atomic_subtract_rel_long(&numcachehv, 1);
 		}
 	}
-	if (!(ncp->nc_flag & NCF_NEGATIVE)) {
-		TAILQ_REMOVE(&ncp->nc_vp->v_cache_dst, ncp, nc_dst);
-		if (ncp == ncp->nc_vp->v_cache_dd)
-			ncp->nc_vp->v_cache_dd = NULL;
-	} else {
-		cache_negative_remove(ncp, neg_locked);
-	}
 	atomic_subtract_rel_long(&numcache, 1);
 }

-- 
Mateusz Guzik
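
To make the "lose either the newly set flag or the information that
hotnegative is unset" scenario concrete, here is a minimal userland
sketch. It is not kernel code: the EX_* values and both function names
are made up for illustration, and the interleaving is replayed by hand in
a single thread so the outcome is deterministic. It only shows why two
unlocked read-modify-write sequences on one flag word can drop an update.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the kernel's definitions. */
#define EX_HOTNEGATIVE	0x01u
#define EX_DVDROP	0x02u

/*
 * Replay one bad interleaving of two unlocked read-modify-write
 * sequences on the same flag word and return the final value.
 */
static uint8_t
lost_update_replay(uint8_t flags)
{
	uint8_t a = flags;	/* A loads, about to clear EX_HOTNEGATIVE */
	uint8_t b = flags;	/* B loads, about to set EX_DVDROP */

	a &= (uint8_t)~EX_HOTNEGATIVE;
	b |= EX_DVDROP;

	flags = a;		/* A stores its result */
	flags = b;		/* B stores, overwriting A's clear */
	return (flags);
}

/*
 * The same two updates applied one after the other, as they would be
 * if both were made under a single lock.
 */
static uint8_t
serialized_updates(uint8_t flags)
{
	flags &= (uint8_t)~EX_HOTNEGATIVE;
	flags |= EX_DVDROP;
	return (flags);
}
```

Starting from a word with only EX_HOTNEGATIVE set, the replayed
interleaving ends with EX_HOTNEGATIVE still set, while the serialized
version ends with only EX_DVDROP. Swapping the two stores loses the
EX_DVDROP bit instead.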