From owner-freebsd-current@freebsd.org Sat Nov 12 17:53:52 2016 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4124BC3D8B4 for ; Sat, 12 Nov 2016 17:53:52 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wm0-x241.google.com (mail-wm0-x241.google.com [IPv6:2a00:1450:400c:c09::241]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C86CA1710; Sat, 12 Nov 2016 17:53:51 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by mail-wm0-x241.google.com with SMTP id g23so4378666wme.1; Sat, 12 Nov 2016 09:53:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:mail-followup-to:references :mime-version:content-disposition:in-reply-to:user-agent; bh=wyPH693sS/gfZ/S/Tno35XaG/bbBpIRNj9sMpVcx+DI=; b=sPE9IR3zv99a7noc/70ykxEWCNKJOFLKKNSl71/f8xstTc+bE/dkcJeVSjXEaufXOQ yLhZOCEkPGoGsAPYq2eZcouxsKVxTXJW/uSbhH2pK7HgH1MIcK3QsgcK1GcVa2huSYbc Y2KxQGFtlF4ZAgWz3VeK+0VI/3xrXsKHCUEncpbGf90+rnHQALLCkgLh5ZTQCdENtxRx tIL/xUWKUosh7WwjHpt66cZt1Ma+CcpQrNHK8RQ2b4uDkoyEwyxmm5QJYwCsvBMiEufh KSgJZVtdEfcsWlyH6v/CGS5vOE5zbBQIX91vkXOtmKfQ2nXPxEMAnircEeErP4GCiXR8 NKXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id :mail-followup-to:references:mime-version:content-disposition :in-reply-to:user-agent; bh=wyPH693sS/gfZ/S/Tno35XaG/bbBpIRNj9sMpVcx+DI=; b=BhYmxJVPlutRtMfBh4mNPrjZxVq3eo5oJs22DyOWGr6j+Yk7w2ctRJx0sESZNZ7AHA TtovE4GlVo+pvwvxdw0im47cSALXxvjeMd7cUR0a+TJEX9WsuDmrIQMvuGy0YED16pE+ 7jxb42MnuiiSYfguL7+yarAu/80E8ovmCsmB6H2P5bUoSX2qKdj77szwJtTK1pjzPkCE fH9A6b1vAbS6I/0DZVxtUUQoZmPPAEpRYNt2yJv6KZ5Z+cXFtrMPFz6CVmzSLGGfX/PS FxI0p4TgevGfEmMx5h+1sf7VnxROx4dMa/QG5DbZIyWRIdF4K6c25kOZ4QvW17KLZGgx JFXA== X-Gm-Message-State: ABUngvfV7FczhGF3Er7Tq/Ix/blDowqTt8unB4d/iKbIIdFCTM5hvKNP8G9RXo5+O1oTnQ== X-Received: by 10.28.158.148 with SMTP id h142mr3571845wme.59.1478973230067; Sat, 12 Nov 2016 09:53:50 -0800 (PST) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id n3sm18089353wjq.34.2016.11.12.09.53.49 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Sat, 12 Nov 2016 09:53:49 -0800 (PST) Date: Sat, 12 Nov 2016 18:53:47 +0100 From: Mateusz Guzik To: Don Lewis Cc: freebsd-current@FreeBSD.org, mjg@FreeBSD.org Subject: Re: panic: mutex ncnegl not owned at /usr/src/sys/kern/vfs_cache.c:743 in 12.0-CURRENT r308447 Message-ID: <20161112175347.GA14804@dft-labs.eu> Mail-Followup-To: Mateusz Guzik , Don Lewis , freebsd-current@FreeBSD.org, mjg@FreeBSD.org References: <201611112356.uABNu3xY069782@gw.catspoiler.org> <201611120729.uAC7Tm9B077551@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <201611120729.uAC7Tm9B077551@gw.catspoiler.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Nov 2016 17:53:52 -0000 On Fri, Nov 11, 2016 at 11:29:48PM -0800, Don Lewis wrote: > On 11 Nov, To: freebsd-current@FreeBSD.org wrote: > > Yesterday I upgraded my ports builder box from r306349 to r308477. It > > crashed overnight during a poudriere run. > > > > panic: mutex ncnegl not owned at /usr/src/sys/kern/vfs_cache.c:743 > > This problem is very reproduceable for me, so I added some assertions to > the code to try to track it down: > > panic: Bad link elm 0xfffff802a59cb690 prev->next != elm > #13 0xffffffff80b0d55a in cache_negative_remove (ncp=0xfffff802a59cb690, > neg_locked=) at /usr/src/sys/kern/vfs_cache.c:747 > #14 0xffffffff80b0d84a in cache_zap_locked (ncp=0xfffff802a59cb690, > neg_locked=false) at /usr/src/sys/kern/vfs_cache.c:890 > #15 0xffffffff80b0ab98 in cache_zap_rlocked_bucket (ncp=0xfffff802a59cb690, > blp=) at /usr/src/sys/kern/vfs_cache.c:1021 > #16 cache_lookup (dvp=, vpp=, > cnp=, tsp=, ticksp=) > at /usr/src/sys/kern/vfs_cache.c:1308 > #17 0xffffffff80b0e97c in vfs_cache_lookup (ap=) > at /usr/src/sys/kern/vfs_cache.c:2024 > #18 0xffffffff8103fcea in VOP_LOOKUP_APV (vop=, > a=) at vnode_if.c:127 > #19 0xffffffff80b17ed2 in VOP_LOOKUP (vpp=0xfffffe085bfa86e8, > cnp=0xfffffe085bfa8710, dvp=) at ./vnode_if.h:54 > #20 lookup (ndp=) at /usr/src/sys/kern/vfs_lookup.c:830 > #21 0xffffffff80b1748e in namei (ndp=) > at /usr/src/sys/kern/vfs_lookup.c:397 > #22 0xffffffff80b30bc4 in kern_renameat (td=, oldfd=10, > old=0x28e72a59 , > newfd=10, > new=0x28e72601 , > pathseg=) at /usr/src/sys/kern/vfs_syscalls.c:3515 > #23 0xffffffff80ff98a0 in syscallenter (td=0xfffff801ebfd8500, > sa=) > at /usr/src/sys/amd64/ia32/../../kern/subr_syscall.c:135 > #24 ia32_syscall (frame=0xfffffe085bfa8a40) > at /usr/src/sys/amd64/ia32/ia32_syscall.c:187 > #25 > > > #13 0xffffffff80b0d55a in cache_negative_remove (ncp=0xfffff802a59cb690, > neg_locked=) at /usr/src/sys/kern/vfs_cache.c:747 > 747 TAILQ_REMOVE(&ncneg_hot.nl_list, ncp, nc_dst); > (kgdb) list > 742 } else { > 743 mtx_assert(&neglist->nl_lock, MA_OWNED); > 744 mtx_assert(&ncneg_hot.nl_lock, MA_OWNED); > 745 } > 746 if (ncp->nc_flag & NCF_HOTNEGATIVE) { > 747 TAILQ_REMOVE(&ncneg_hot.nl_list, ncp, nc_dst); > 748 } else { > 749 TAILQ_REMOVE(&neglist->nl_list, ncp, nc_dst); > 750 } > 751 if (list_locked) > > (kgdb) print ncp > $1 = (struct namecache *) 0xfffff802a59cb690 > > (kgdb) print *ncp > $2 = {nc_hash = {le_next = 0x0, le_prev = 0xfffffe0001814178}, nc_src = { > le_next = 0xfffff80415e22290, le_prev = 0xfffff8048e5597a8}, nc_dst = { > tqe_next = 0xfffff80421744a10, tqe_prev = 0xfffff80421744a30}, > nc_dvp = 0xfffff8048e559760, n_un = {nu_vp = 0x0, nu_neghits = 0}, > nc_flag = 32 ' ', nc_nlen = 25 '\031', > nc_name = 0xfffff802a59cb6d2 "gtk3-icon-browser.desktop"} > (kgdb) print ncp->nc_dst->tqe_prev > $8 = (struct namecache **) 0xfffff80421744a30 > (kgdb) print *(ncp->nc_dst->tqe_prev) > $9 = (struct namecache *) 0x0 > > > The line number for the failed assertion doesn't make sense. The value > of nc_flag is 32 (0x20), but NCF_HOTNEGATIVE is defined as 0x40 and > NCF_NEGATIVE is defined as 0x20. In any case, the list seems to be > messed up. > Can you check if the problem persists after reverting r307650 and r307685? I got another report of trouble with poudriere, but so far can neither reproduce nor diagnose what happens here. For instance with this crash, in principle there could be a race where the entry is demoted and loses the HOTNEGATIVE. But the demotion code has the hot list locked and so does this code if it spots the flag. What's more, the race is handled by re-checking the flag after the lock was taken. That said, there is something seriously fishing going on here. -- Mateusz Guzik