From: Robert Watson
Date: Thu, 8 Apr 2004 11:51:24 -0400 (EDT)
To: Marcel Moolenaar
cc: current@freebsd.org
In-Reply-To: <20040408154004.GA22500@dhcp01.pn.xcllnt.net>
Subject: Re: panic on one cpu leaves others running...

On Thu, 8 Apr 2004, Marcel Moolenaar wrote:

> On Thu, Apr 08, 2004 at 09:43:06AM -0400, Robert Watson wrote:
> >
> > On Wed, 7 Apr 2004, Marcel Moolenaar wrote:
> >
> > > On Thu, Apr 08, 2004 at 12:13:39AM -0400, Robert Watson wrote:
> > > >
> > > > Funky, eh?  I thought we used to have code to ipi the other cpu's and halt
> > > > them until the cpu in ddb was out again.  I guess I mis-remember, or that
> > > > code is broken...
> > >
> > > You remember correctly.
> >
> > And it's still going this morning:
> *snip*
> > Apr  8 13:39:30 sm-mta[4707]: i3879Tjc003922: SYSERR(root): cannot
> > flock(/etc/mail/aliases, fd=5, type=1, omode=40000, euid=0): Operation not
> > supported
> >
> > Debugger(c07c3990) at Debugger+0x46
> > db>
>
> Do you have SMP and/or made modifications to ?
> What's pcpu->pc_other_cpus and what is stopped_cpus currently?

No changes to smptests.h.  Unfortunately, I don't have access to serial gdb
for this box, and causing a dump might well change all that, so I only
have the value of stopped_cpus:

db> print stopped_cpus
c0950594
db> print *stopped_cpus
d

> > Presumably in large part because I'm in code that doesn't require Giant,
> > so there are no lock conflicts.
>
> I don't think that's the case. I think we're just not stopping the CPUs
> or keep them stopped.

I agree with that interpretation -- I was suggesting that the reason this
problem might not be noticed is that a lot of our code paths require
Giant, and it's only when you panic in code without Giant that

> > This is all a hunch and I have no way to test this myself...

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research