Date: Thu, 16 Dec 2004 14:49:23 -0500 From: John Baldwin <jhb@FreeBSD.org> To: Bruce Evans <bde@zeta.org.au> Cc: Kris Kennaway <kris@obsecurity.org> Subject: Re: cvs commit: src/sys/i386/i386 vm_machdep.c Message-ID: <200412161449.23781.jhb@FreeBSD.org> In-Reply-To: <20041216144239.T1723@epsplex.bde.org> References: <200411300618.iAU6IkQX065609@repoman.freebsd.org> <20041215151526.GA3462@xor.obsecurity.org> <20041216144239.T1723@epsplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday 15 December 2004 10:51 pm, Bruce Evans wrote: > On Wed, 15 Dec 2004, Kris Kennaway wrote: > > On Tue, Dec 14, 2004 at 09:48:48PM -0500, John Baldwin wrote: > > > On Tuesday 14 December 2004 07:10 pm, Kris Kennaway wrote: > > > > NB: DDB often isn't usable on SMP machines thesedays, and will hang > > > > when a panic tries to enter it. > > > > > > Try debug.kdb.stop_cpus=0 (sysctl and tunable) to prevent KDB from > > > trying to stop the other CPUs. Another possible fix that ups@ has > > > talked about is changing IPI_STOP to use an NMI rather than a vector > > > (you can send NMI IPIs via the local APIC) so that IPI_STOP is more > > > reliable. > > > > This is already set, and it doesn't always fix the problem. > > debug.kdb.stop_cpus=0 should be expected to increase problems. Given time, > the other CPU are quite likely to enter ddb for whatever reason the first > one did. Then they stomp on ddb's global state (starting with ddb_regs). > > The NMI would need locking to prevent the CPUs stopping each other. > > > I often > > get overlapping panics from the other CPUs on this machine, and it > > often locks up when trying to enter DDB, or while printing the panic > > string (the other day it only got as far as 'p' before hanging). > > panic() needs much the same locking as ddb to prevent concurrent entry. > It must be fairly likely for all CPUs to panic on the same asertion. > This is like all CPUs entering ddb on the same breakpoint. The thing is, panic does have locking, but it appears to be ineffective: #ifdef SMP /* * We don't want multiple CPU's to panic at the same time, so we * use panic_cpu as a simple spinlock. We have to keep checking * panic_cpu if we are spinning in case the panic on the first * CPU is canceled. */ if (panic_cpu != PCPU_GET(cpuid)) while (atomic_cmpset_int(&panic_cpu, NOCPU, PCPU_GET(cpuid)) == 0) while (panic_cpu != NOCPU) ; /* nothing */ #endif In the smpng branch in p4, I have the lock changed to be based on the thread rather than the CPU to account for problems coming from migration due to preemption while in a panic, but I haven't observed any noticeable improvement from the change: --- //depot/vendor/freebsd/src/sys/kern/kern_shutdown.c 2004/11/05 19:00:32 +++ //depot/projects/smpng/sys/kern/kern_shutdown.c 2004/11/05 19:22:55 @@ -473,7 +473,7 @@ } #ifdef SMP -static u_int panic_cpu = NOCPU; +static struct thread *panic_thread = NULL; #endif /* @@ -494,15 +494,14 @@ #ifdef SMP /* * We don't want multiple CPU's to panic at the same time, so we - * use panic_cpu as a simple spinlock. We have to keep checking - * panic_cpu if we are spinning in case the panic on the first + * use panic_thread as a simple spinlock. We have to keep checking + * panic_thread if we are spinning in case the panic on the first * CPU is canceled. */ - if (panic_cpu != PCPU_GET(cpuid)) - while (atomic_cmpset_int(&panic_cpu, NOCPU, - PCPU_GET(cpuid)) == 0) - while (panic_cpu != NOCPU) - ; /* nothing */ + if (panic_thread != curthread) + while (atomic_cmpset_ptr(&panic_thread, NULL, curthread) == 0) + while (panic_thread != NULL) + cpu_spinwait(); #endif bootopt = RB_AUTOBOOT | RB_DUMP; @@ -538,7 +537,7 @@ /* See if the user aborted the panic, in which case we continue. */ if (panicstr == NULL) { #ifdef SMP - atomic_store_rel_int(&panic_cpu, NOCPU); + atomic_store_rel_ptr(&panic_thread, NULL); #endif return; } -- John Baldwin <jhb@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200412161449.23781.jhb>