Date: Sun, 9 Oct 2005 15:49:23 -0600 From: "Lonnie VanZandt" <lonnie.vanzandt@ngc.com> To: <freebsd-current@freebsd.org> Subject: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4 STABLE Message-ID: <200509220742.10364.lonnie.vanzandt@ngc.com>
next in thread | raw e-mail | index | archive | help
[-- Attachment #1 --]
Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4 STABLE.
(the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v 1.5.2.2.2.1
2005/05/01 05:38:14 dwhite Exp $"); )
On Wednesday 21 September 2005 03:14 pm, Lonnie VanZandt wrote:
> Yes, the situation is greatly improved with this edit (plus a wee bit
> more). Once the target reboots, I'll send along its revised subr_kdb.c
> file. Feel free to tell me where to officially submit a bug report and
> a proposed patch.
>
> On Wednesday 21 September 2005 01:14 pm, Lonnie VanZandt wrote:
> > No, critical_enter() is _not_ SMP-safe. So, on an SMP system, a CPU
> > outside of the one currently in KDB can resume running and trap on
> > any inserted breakpoint _before_ kdb_active is decremented back to
> > 0. Et voila! mi_switch will panic.
> >
> > I'll move the decrement, rebuild the kernel, try the result, and
> > report back later what I find.
> >
> > On Wednesday 21 September 2005 01:09 pm, Lonnie VanZandt wrote:
> > > In this snippet,
> > >
> > > #ifdef SMP
> > > if (did_stop_cpus)
> > > restart_cpus(stopped_cpus);
> > > #endif
> > >
> > > kdb_active--;
> > >
> > > critical_exit();
> > >
> > > Wouldn't it be better (or even required) to move the decrement of
> > > kdb_active _before_ restarting stopped CPUs? Maybe
> > > critical_enter()/critical_exit() implies an SMP-safe region?
> > >
> > > On Wednesday 21 September 2005 01:03 pm, Lonnie VanZandt wrote:
> > > > A gtags/global search for kdb_active reports that the variable
> > > > is never explicitly set back to 0 and is only set in
> > > > kern/subr_kdb.c in kdb_trap(). There, it is incremented on entry
> > > > and decremented on exit. That seems appropriate. Maybe there is
> > > > an SMP oversight and the second CPU is triggering a trap back
> > > > into the debugger? (That would imply that the other CPU wasn't
> > > > really stopped or that there is a race condition with setting
> > > > kdb_active back to false and CPUs coming out of the stopped
> > > > state.)
> > > >
> > > > Or perhaps something is not quite right with the ddb/kdb/gdb
> > > > interactions for kdb_reenter()?
> > > >
> > > > On Wednesday 21 September 2005 12:32 pm, Lonnie VanZandt wrote:
> > > > > In 5.4 Stable, when attempting to debug kernels remotely, I
> > > > > all too frequently encounter panics within the target kernel
> > > > > as it attempts to return to the stopped thread.
> > > > >
> > > > > The panic report shows that this code is getting triggered:
> > > > >
> > > > > /*
> > > > > * Don't perform context switches from the debugger.
> > > > > */
> > > > > if (kdb_active) {
> > > > > mtx_unlock_spin(&sched_lock);
> > > > > kdb_backtrace();
> > > > > kdb_reenter();
> > > > > panic("%s: did not reenter debugger", __func__);
> > > > > }
> > > > >
> > > > > My initial guess is that somewhere kdb_active is not getting
> > > > > set back to 0/False.
> > > > >
> > > > > Is there a post-5.4 fix for this?
-------------------------------------------------------
[-- Attachment #2 --]
*** subr_kdb.c.orig Wed Sep 21 15:27:16 2005
--- /tmp/subr_kdb.c Wed Sep 21 15:23:23 2005
***************
*** 444,455 ****
makectx(tf, &kdb_pcb);
critical_enter();
! kdb_active++;
! kdb_frame = tf;
! kdb_thr_select(curthread);
!
#ifdef SMP
if ((did_stop_cpus = kdb_stop_cpus) != 0)
{
--- 444,454 ----
makectx(tf, &kdb_pcb);
+ // disable interrupts to our own CPU
critical_enter();
! // halt any other CPUs who might trip over our
! // traps and try to rush into KDB before us
#ifdef SMP
if ((did_stop_cpus = kdb_stop_cpus) != 0)
{
***************
*** 462,479 ****
}
#endif
/* Let MD code do its thing first... */
kdb_cpu_trap(type, code);
handled = kdb_dbbe->dbbe_trap(type, code);
#ifdef SMP
if (did_stop_cpus)
restart_cpus(stopped_cpus);
#endif
! kdb_active--;
!
critical_exit();
return (handled);
--- 461,497 ----
}
#endif
+ // we can't be interrupted now and we must be
+ // the only running CPU. So, if kdb_active remains 0
+ // then we have won the race to enter KDB, otherwise
+ // some other CPU beat us through the gate...
+ if ( 0 != kdb_active )
+ {
+ return 0;
+ }
+
+ // claim the prize
+ kdb_active++;
+
+ // set the kdb context
+ kdb_frame = tf;
+ kdb_thr_select( curthread );
+
/* Let MD code do its thing first... */
kdb_cpu_trap(type, code);
handled = kdb_dbbe->dbbe_trap(type, code);
+ // release the prize
+ kdb_active--;
+
+ // let other CPUs have a chance to run
#ifdef SMP
if (did_stop_cpus)
restart_cpus(stopped_cpus);
#endif
! // let our own ISRs to run
critical_exit();
return (handled);
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200509220742.10364.lonnie.vanzandt>
