Date: Thu, 16 Dec 2004 14:51:49 +1100 (EST)
From: Bruce Evans
To: Kris Kennaway
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org,
    John Baldwin, Nate Lawson
Subject: Re: cvs commit: src/sys/i386/i386 vm_machdep.c
In-Reply-To: <20041215151526.GA3462@xor.obsecurity.org>
Message-ID: <20041216144239.T1723@epsplex.bde.org>

On Wed, 15 Dec 2004, Kris Kennaway wrote:

> On Tue, Dec 14, 2004 at 09:48:48PM -0500, John Baldwin wrote:
> > On Tuesday 14 December 2004 07:10 pm, Kris Kennaway wrote:
> > > NB: DDB often isn't usable on SMP machines these days, and will hang
> > > when a panic tries to enter it.
> >
> > Try debug.kdb.stop_cpus=0 (sysctl and tunable) to prevent KDB from trying to
> > stop the other CPUs.  Another possible fix that ups@ has talked about is
> > changing IPI_STOP to use an NMI rather than a vector (you can send NMI IPIs
> > via the local APIC) so that IPI_STOP is more reliable.
>
> This is already set, and it doesn't always fix the problem.

debug.kdb.stop_cpus=0 should be expected to increase problems.  Given
time, the other CPUs are quite likely to enter ddb for whatever reason the
first one did.  Then they stomp on ddb's global state (starting with
ddb_regs).  The NMI would need locking to prevent the CPUs stopping each
other.

> I often
> get overlapping panics from the other CPUs on this machine, and it
> often locks up when trying to enter DDB, or while printing the panic
> string (the other day it only got as far as 'p' before hanging).

panic() needs much the same locking as ddb to prevent concurrent entry.
It must be fairly likely for all CPUs to panic on the same assertion.
This is like all CPUs entering ddb on the same breakpoint.

Bruce
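
The kind of locking meant above (only one CPU at a time allowed into ddb
or panic()) might look roughly like the following.  This is only a
user-space sketch using C11 atomics, not FreeBSD kernel code: the names
entry_owner, entry_try_claim(), entry_release() and NO_OWNER are
hypothetical, and a real kernel version would presumably use the kernel's
own atomic primitives and decide what the losing CPUs should do (spin,
halt, or wait to be stopped).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel: no CPU currently owns the ddb/panic path. */
#define NO_OWNER (-1)

/* Id of the CPU that is inside the single-entry section, if any. */
static atomic_int entry_owner = NO_OWNER;

/*
 * Try to claim the section for `cpu`.
 *
 * Returns true if this CPU is the owner, either because it just won the
 * compare-and-swap or because it already held the section (re-entry on
 * the same CPU).  Returns false if another CPU got there first; the
 * caller should then park instead of touching shared state such as the
 * saved registers.
 */
static bool
entry_try_claim(int cpu)
{
	int expected = NO_OWNER;

	if (atomic_compare_exchange_strong(&entry_owner, &expected, cpu))
		return (true);
	/* On failure, `expected` holds the id of the current owner. */
	return (expected == cpu);
}

/*
 * Release the section.  Only the owning CPU can release it; a call from
 * any other CPU leaves the owner unchanged.  Re-entry is not counted, so
 * a single release gives up ownership entirely.
 */
static void
entry_release(int cpu)
{
	int expected = cpu;

	(void)atomic_compare_exchange_strong(&entry_owner, &expected,
	    NO_OWNER);
}
```

With this shape, the first CPU to hit the assertion or breakpoint wins
the claim and every later CPU sees false, so it never gets as far as
overwriting the first CPU's state or interleaving its own panic string.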