Date: Thu, 3 Nov 2005 12:29:52 -0700 From: Lonnie VanZandt <lonnie.vanzandt@ngc.com> To: John Baldwin <jhb@freebsd.org> Cc: freebsd-current@freebsd.org, marcel@freebsd.org Subject: Re: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4 STABLE Message-ID: <200511031229.53501.lonnie.vanzandt@ngc.com> In-Reply-To: <200511031327.18011.jhb@freebsd.org> References: <200509220742.10364.lonnie.vanzandt@ngc.com> <200511031327.18011.jhb@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
I think I follow the proposal. Sure, I'll apply your patch and run with it on my SMP box. It may take a while to reach a conclusion on its merits due to the racy nature of the crash. On Thursday 03 November 2005 11:27 am, John Baldwin wrote: > On Sunday 09 October 2005 05:49 pm, Lonnie VanZandt wrote: > > Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4 STABLE. > > (the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v 1.5.2.2.2.1 > > 2005/05/01 05:38:14 dwhite Exp $"); ) > > I've looked at this, but I think t could maybe be done slightly > differently. Here's a suggested patch that would close the race you are > seeing I think while allowing semantics such that if two CPUs try to enter > KDB at the same time, they would serialize and the second CPU would enter > kdb after the first had exited. Could you at least test it to see if it > addresses your race condition? > > --- //depot/projects/smpng/sys/kern/subr_kdb.c 2005/10/27 19:51:50 > +++ //depot/user/jhb/ktrace/kern/subr_kdb.c 2005/11/03 18:24:38 > @@ -39,6 +39,7 @@ > #include <sys/smp.h> > #include <sys/sysctl.h> > > +#include <machine/cpu.h> > #include <machine/kdb.h> > #include <machine/pcb.h> > > @@ -462,12 +463,21 @@ > return (0); > > /* We reenter the debugger through kdb_reenter(). */ > - if (kdb_active) > + if (kdb_active == PCPU_GET(cpuid) + 1) > return (0); > > critical_enter(); > > - kdb_active++; > + /* > + * If more than one CPU tries to enter KDB at the same time > + * then force them to serialize and go one at a time. > + */ > + while (!atomic_cmpset_int(&kdb_active, 0, PCPU_GET(cpuid) + 1)) { > + critical_exit(); > + while (kdb_active) > + cpu_spinwait(); > + critical_enter(); > + } > > #ifdef SMP > if ((did_stop_cpus = kdb_stop_cpus) != 0) > @@ -484,13 +494,17 @@ > > handled = kdb_dbbe->dbbe_trap(type, code); > > + /* > + * We have to exit KDB before resuming the other CPUs so that they > + * may run in a debugger-less context. > + */ > + kdb_active = 0; > + > #ifdef SMP > if (did_stop_cpus) > restart_cpus(stopped_cpus); > #endif > > - kdb_active--; > - > critical_exit(); > > return (handled);
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200511031229.53501.lonnie.vanzandt>