From: John Baldwin <jhb@freebsd.org>
To: mdf@freebsd.org
Cc: freebsd-current@freebsd.org
Date: Thu, 3 Feb 2011 08:05:31 -0500
Subject: Re: Panic with mca trap

On Tuesday, February 01, 2011 11:58:12 am mdf@freebsd.org wrote:
> On a piece of hardware trying to verify basic build tests, we got
> an MCA exception that then panic'd the kernel due to WITNESS/INVARIANTS
> interaction.
>
> panic @ time 1296563157.510, thread 0xffffff0005540000: blockable
> sleep lock (sleep mutex) 128 @ /build/mnt/src/sys/vm/uma_core.c:1872
>
> Stack: --------------------------------------------------
> kernel:witness_checkorder+0x7a2
> kernel:_mtx_lock_flags+0x81
> kernel:uma_zalloc_arg+0x256
> kernel:malloc+0xc5
> kernel:mca_record_entry+0x30
> kernel:mca_scan+0xc9
> kernel:mca_intr+0x79
> kernel:trap+0x30b
> kernel:witness_checkorder+0x66
> kernel:_mtx_lock_spin_flags+0xa4
> kernel:witness_checkorder+0x2a8
> kernel:_mtx_lock_spin_flags+0xa4
> kernel:tdq_idled+0xe8
> kernel:sched_idletd+0x5b
> kernel:fork_exit+0x9b
>
> That's this bit of code in uma_zalloc_arg():
>
> #ifdef INVARIANTS
>         ZONE_LOCK(zone);
>         uma_dbg_alloc(zone, NULL, item);
>         ZONE_UNLOCK(zone);
> #endif
>
> I don't know uma(9) well enough to know the best workaround. Clearly
> there are times we can be in uma_zalloc_arg() and taking a regular
> mutex is not acceptable. But what to do for the uma_dbg_free() call
> so it's happy, and whether to guard taking the ZONE lock with M_NOWAIT
> or td_critnest > 0 or both is outside my current knowledge.
>
> I don't expect we'll see this panic again any time soon, but it would
> be nice to fix the story for WITNESS of when an M_NOWAIT allocation
> can be done.

Actually, this is more my fault. The machine check happened while the
interrupted thread was already in a critical section (hence the WITNESS
complaint).

However, it really isn't correct to be calling malloc() from an arbitrary
exception handler, especially one like MC#, which can fire pretty much
anywhere. I think instead that we should use malloc() when polling the
machine check banks, but keep a pre-allocated pool of structures for use
with MC# exceptions and CMC interrupts, and replenish the pool via an
asynchronous task.

-- 
John Baldwin
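The pre-allocated pool idea can be sketched in userspace C as follows. This is a minimal, single-threaded sketch under stated assumptions, not the change that was eventually committed: the names record_from_trap(), refill_task(), and POOL_TARGET are hypothetical, the refill_needed flag stands in for queueing a task with taskqueue_enqueue(9), and a real kernel version would need a spin mutex around the lists because the trap path can run with interrupts disabled.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's machine-check record. */
struct mca_rec {
	struct mca_rec *next;		/* list linkage */
	int bank;			/* MC bank that reported the error */
	unsigned long long status;	/* raw bank status value */
};

#define	POOL_TARGET	4	/* hypothetical reserve size */

static struct mca_rec *freelist;	/* pre-allocated, unused records */
static int freecount;
static struct mca_rec *pending;		/* records filled in by the trap path */
static int refill_needed;		/* stand-in for taskqueue_enqueue() */
static unsigned long dropped;		/* errors lost because the pool was empty */

/*
 * Trap-context path (MC# exception or CMC interrupt): it only unlinks
 * a record from the free list and never calls malloc().  A kernel
 * version would hold a spin mutex here; this sketch is single-threaded.
 */
static int
record_from_trap(int bank, unsigned long long status)
{
	struct mca_rec *r = freelist;

	refill_needed = 1;	/* ask the task to top the pool back up */
	if (r == NULL) {
		dropped++;
		return (-1);
	}
	freelist = r->next;
	freecount--;
	r->bank = bank;
	r->status = status;
	r->next = pending;
	pending = r;
	return (0);
}

/*
 * Task-context path: it may sleep, so calling malloc() is allowed.
 * It reports and frees pending records, then refills the pool.
 */
static void
refill_task(void)
{
	struct mca_rec *r;

	while ((r = pending) != NULL) {
		pending = r->next;
		printf("MCA: bank %d status %#llx\n", r->bank, r->status);
		free(r);
	}
	while (freecount < POOL_TARGET) {
		r = malloc(sizeof(*r));
		if (r == NULL)
			break;
		r->next = freelist;
		freelist = r;
		freecount++;
	}
	refill_needed = 0;
}
```

The trap path only unlinks a record from a list, so it is safe wherever an MC# can land; when the reserve is exhausted it counts the lost error instead of blocking, and all allocation happens later in the task.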