From: Lonnie VanZandt
Organization: Northrop Grumman
To: John Baldwin
Cc: freebsd-current@freebsd.org, marcel@freebsd.org
Date: Thu, 3 Nov 2005 12:29:52 -0700
Subject: Re: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4 STABLE
Message-Id: <200511031229.53501.lonnie.vanzandt@ngc.com>
In-Reply-To: <200511031327.18011.jhb@freebsd.org>

I think I follow the proposal. Sure, I'll apply your patch and run with it on
my SMP box. It may take a while to reach a conclusion on its merits due to the
racy nature of the crash.

On Thursday 03 November 2005 11:27 am, John Baldwin wrote:
> On Sunday 09 October 2005 05:49 pm, Lonnie VanZandt wrote:
> > Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4 STABLE.
> > (the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v 1.5.2.2.2.1
> > 2005/05/01 05:38:14 dwhite Exp $"); )
>
> I've looked at this, but I think it could maybe be done slightly
> differently. Here's a suggested patch that would close the race you are
> seeing I think while allowing semantics such that if two CPUs try to enter
> KDB at the same time, they would serialize and the second CPU would enter
> kdb after the first had exited. Could you at least test it to see if it
> addresses your race condition?
>
> --- //depot/projects/smpng/sys/kern/subr_kdb.c 2005/10/27 19:51:50
> +++ //depot/user/jhb/ktrace/kern/subr_kdb.c 2005/11/03 18:24:38
> @@ -39,6 +39,7 @@
>  #include
>  #include
>  
> +#include
>  #include
>  #include
>  
> @@ -462,12 +463,21 @@
>  		return (0);
>  
>  	/* We reenter the debugger through kdb_reenter(). */
> -	if (kdb_active)
> +	if (kdb_active == PCPU_GET(cpuid) + 1)
>  		return (0);
>  
>  	critical_enter();
>  
> -	kdb_active++;
> +	/*
> +	 * If more than one CPU tries to enter KDB at the same time
> +	 * then force them to serialize and go one at a time.
> +	 */
> +	while (!atomic_cmpset_int(&kdb_active, 0, PCPU_GET(cpuid) + 1)) {
> +		critical_exit();
> +		while (kdb_active)
> +			cpu_spinwait();
> +		critical_enter();
> +	}
>  
>  #ifdef SMP
>  	if ((did_stop_cpus = kdb_stop_cpus) != 0)
> @@ -484,13 +494,17 @@
>  
>  	handled = kdb_dbbe->dbbe_trap(type, code);
>  
> +	/*
> +	 * We have to exit KDB before resuming the other CPUs so that they
> +	 * may run in a debugger-less context.
> +	 */
> +	kdb_active = 0;
> +
>  #ifdef SMP
>  	if (did_stop_cpus)
>  		restart_cpus(stopped_cpus);
>  #endif
>  
> -	kdb_active--;
> -
>  	critical_exit();
>  
>  	return (handled);
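
For readers who want to try the serialization idea outside the kernel, here
is a minimal userspace sketch in C11. It is not the kernel code: dbg_active,
dbg_enter() and dbg_exit() are hypothetical names made up for illustration,
atomic_compare_exchange_strong() stands in for atomic_cmpset_int(), and an
empty spin loop stands in for cpu_spinwait(). Critical sections and the
stopping and restarting of other CPUs are left out entirely. As in the patch,
the flag holds 0 when free and (owner id + 1) when held, so an owner can tell
its own re-entry apart from another CPU's entry.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* 0 means nobody is in the debugger; otherwise holds (owner id + 1). */
static atomic_int dbg_active = 0;

/*
 * Hypothetical helper modelled on the patched entry path.
 * Returns false if the caller already owns the debugger (re-entry),
 * true once the caller has become the owner.
 */
static bool
dbg_enter(int cpuid)
{
	int expected;

	/* The current owner does not enter a second time. */
	if (atomic_load(&dbg_active) == cpuid + 1)
		return (false);

	for (;;) {
		/* Claim the flag only if it is currently free. */
		expected = 0;
		if (atomic_compare_exchange_strong(&dbg_active, &expected,
		    cpuid + 1))
			return (true);

		/* Someone else owns it: wait for release, then retry. */
		while (atomic_load(&dbg_active) != 0)
			;	/* stand-in for cpu_spinwait() */
	}
}

/* Release ownership so that a waiting caller can take its turn. */
static void
dbg_exit(void)
{
	atomic_store(&dbg_active, 0);
}
```

Calling dbg_enter(0) on a free flag makes id 0 the owner; a second
dbg_enter(0) returns false straight away, while dbg_enter(1) from another
thread would spin until dbg_exit() is called and then take ownership.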