From: John Baldwin
To: Bruce Evans
Cc: cvs-src@FreeBSD.org, Nate Lawson, cvs-all@FreeBSD.org, src-committers@FreeBSD.org, Kris Kennaway
Subject: Re: cvs commit: src/sys/i386/i386 vm_machdep.c
Date: Sun, 03 Jul 2005 00:57:50 -0000
X-Original-Date: Thu, 16 Dec 2004 14:49:23 -0500
Message-Id: <200412161449.23781.jhb@FreeBSD.org>
In-Reply-To: <20041216144239.T1723@epsplex.bde.org>
References: <200411300618.iAU6IkQX065609@repoman.freebsd.org> <20041215151526.GA3462@xor.obsecurity.org> <20041216144239.T1723@epsplex.bde.org>

On Wednesday 15 December 2004 10:51 pm, Bruce Evans wrote:
> On Wed, 15 Dec 2004, Kris Kennaway wrote:
> > On Tue, Dec 14, 2004 at 09:48:48PM -0500, John Baldwin wrote:
> > > On Tuesday 14 December 2004 07:10 pm, Kris Kennaway wrote:
> > > > NB: DDB often isn't usable on SMP machines these days, and will hang
> > > > when a panic tries to enter it.
> > >
> > > Try debug.kdb.stop_cpus=0 (sysctl and tunable) to prevent KDB from
> > > trying to stop the other CPUs.
> > > Another possible fix that ups@ has
> > > talked about is changing IPI_STOP to use an NMI rather than a vector
> > > (you can send NMI IPIs via the local APIC) so that IPI_STOP is more
> > > reliable.
> >
> > This is already set, and it doesn't always fix the problem.
>
> debug.kdb.stop_cpus=0 should be expected to increase problems. Given time,
> the other CPUs are quite likely to enter ddb for whatever reason the first
> one did. Then they stomp on ddb's global state (starting with ddb_regs).
>
> The NMI would need locking to prevent the CPUs stopping each other.
>
> > I often
> > get overlapping panics from the other CPUs on this machine, and it
> > often locks up when trying to enter DDB, or while printing the panic
> > string (the other day it only got as far as 'p' before hanging).
>
> panic() needs much the same locking as ddb to prevent concurrent entry.
> It must be fairly likely for all CPUs to panic on the same assertion.
> This is like all CPUs entering ddb on the same breakpoint.

The thing is, panic does have locking, but it appears to be ineffective:

#ifdef SMP
	/*
	 * We don't want multiple CPU's to panic at the same time, so we
	 * use panic_cpu as a simple spinlock. We have to keep checking
	 * panic_cpu if we are spinning in case the panic on the first
	 * CPU is canceled.
	 */
	if (panic_cpu != PCPU_GET(cpuid))
		while (atomic_cmpset_int(&panic_cpu, NOCPU,
		    PCPU_GET(cpuid)) == 0)
			while (panic_cpu != NOCPU)
				; /* nothing */
#endif

In the smpng branch in p4, I have the lock changed to be based on the thread
rather than the CPU to account for problems coming from migration due to
preemption while in a panic, but I haven't observed any noticeable improvement
from the change:

--- //depot/vendor/freebsd/src/sys/kern/kern_shutdown.c	2004/11/05 19:00:32
+++ //depot/projects/smpng/sys/kern/kern_shutdown.c	2004/11/05 19:22:55
@@ -473,7 +473,7 @@
 }
 
 #ifdef SMP
-static u_int panic_cpu = NOCPU;
+static struct thread *panic_thread = NULL;
 #endif
 
 /*
@@ -494,15 +494,14 @@
 #ifdef SMP
 	/*
 	 * We don't want multiple CPU's to panic at the same time, so we
-	 * use panic_cpu as a simple spinlock. We have to keep checking
-	 * panic_cpu if we are spinning in case the panic on the first
+	 * use panic_thread as a simple spinlock. We have to keep checking
+	 * panic_thread if we are spinning in case the panic on the first
 	 * CPU is canceled.
 	 */
-	if (panic_cpu != PCPU_GET(cpuid))
-		while (atomic_cmpset_int(&panic_cpu, NOCPU,
-		    PCPU_GET(cpuid)) == 0)
-			while (panic_cpu != NOCPU)
-				; /* nothing */
+	if (panic_thread != curthread)
+		while (atomic_cmpset_ptr(&panic_thread, NULL, curthread) == 0)
+			while (panic_thread != NULL)
+				cpu_spinwait();
 #endif
 
 	bootopt = RB_AUTOBOOT | RB_DUMP;
@@ -538,7 +537,7 @@
 	/* See if the user aborted the panic, in which case we continue. */
 	if (panicstr == NULL) {
 #ifdef SMP
-		atomic_store_rel_int(&panic_cpu, NOCPU);
+		atomic_store_rel_ptr(&panic_thread, NULL);
 #endif
 		return;
 	}

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
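
For anyone who wants to poke at the thread-keyed lock outside the kernel, here
is a rough user-space sketch of the same idea. It is only an approximation
under stated assumptions: it uses C11 <stdatomic.h> in place of the kernel's
atomic_cmpset_ptr() and atomic_store_rel_ptr(), the struct thread here is a
hypothetical stand-in, and the helper names panic_try_enter(), panic_enter()
and panic_cancel() are made up for illustration and do not exist in
kern_shutdown.c.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct thread. */
struct thread {
	int td_tid;
};

/* Owner of the panic lock; NULL means no panic is in progress. */
static _Atomic(struct thread *) panic_thread = NULL;

/*
 * Try once to claim the panic lock for td.  Returns true if td owns the
 * lock afterwards, either because it just claimed it or because it already
 * held it (a recursive panic in the same thread).
 */
static bool
panic_try_enter(struct thread *td)
{
	struct thread *expected = NULL;

	if (atomic_load(&panic_thread) == td)
		return (true);
	return (atomic_compare_exchange_strong(&panic_thread, &expected, td));
}

/*
 * Spin until td owns the lock.  The inner loop keeps re-reading the owner
 * so that a waiter notices when the first panic is canceled.
 */
static void
panic_enter(struct thread *td)
{
	while (!panic_try_enter(td))
		while (atomic_load(&panic_thread) != NULL)
			;	/* the kernel diff calls cpu_spinwait() here */
}

/* Drop the lock with release semantics when a panic is canceled. */
static void
panic_cancel(void)
{
	atomic_store_explicit(&panic_thread, NULL, memory_order_release);
}
```

Keying the owner on the thread pointer means a panicking thread that is
preempted and migrated to another CPU is still recognized as the owner, which
is the case the CPU-id version cannot handle.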