Date:      Mon, 24 Dec 2007 07:26:01 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        amd64@freebsd.org
Subject:   Re: Can't panic from debugger
Message-ID:  <20071224065516.K4239@delplex.bde.org>
In-Reply-To: <20071223125714.K79882@fledge.watson.org>
References:  <20071223125236.GM1616@droso.net> <20071223125714.K79882@fledge.watson.org>

On Sun, 23 Dec 2007, Robert Watson wrote:

> On Sun, 23 Dec 2007, Erwin Lansing wrote:
>
>> The amd64 nodes in the pointyhat cluster are starting to behave quite 
>> interestingly.  They stop responding to ssh, but are still answering ping. 
>> More worrying is that I cannot get a useful dump out of it, as a panic from 
>> the debugger just hangs there, and all I am left with is to pull the plug. 
>> This even happens on a normal working system after entering the debugger, 
>> of which there is a typescript below.
>> ...
> I discovered yesterday that I was seeing the same problem on a dual-cpu, 
> dual-core box in the netperf cluster:

This is as expected.  Debugger context is special, and no non-debugger
functions can be called from it without going through the (unimplemented)
trampoline needed to temporarily leave it.  Some non-debugger functions
may work accidentally or appear to work when called directly.  panic()
is not one of these, since it tends to trip over a lock.  panic()
called from an arbitrary context has the same problem, since the calling
context may hold a lock that is used by panic().  Debugger context
always holds the pseudo-spinlock of masked CPU interrupts and stopped
other CPUs.  panic() (actually boot()) normally begins with a normal
sync() call that is not aware that it may be called in either panic
or debugger context and depends on a large amount of system code working
normally.  It cannot legitimately sync anything when called in debugger
context, since syncing requires I/O and I/O normally requires interrupts.
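
The lock problem described above can be modelled in user space.  This is
only a sketch under stated assumptions: model_lock, model_trylock(),
model_unlock() and model_panic_path() are hypothetical names invented for
illustration, not kernel interfaces, and the spin bound exists only so that
the model can report a hang instead of actually hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a non-recursive spin lock (not a kernel API). */
struct model_lock {
	atomic_flag held;
};

#define MODEL_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns true on success. */
static bool
model_trylock(struct model_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

/* Release the lock. */
static void
model_unlock(struct model_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Stand-in for a panic path that needs the lock.  A real spin lock would
 * spin forever here; the bound lets the model return false instead.
 */
static bool
model_panic_path(struct model_lock *l, int max_spins)
{
	assert(max_spins > 0);
	for (int i = 0; i < max_spins; i++) {
		if (model_trylock(l)) {
			/* ... the work the panic path would do goes here ... */
			model_unlock(l);
			return true;
		}
	}
	return false;	/* the caller already holds the lock: would hang */
}
```

With the lock free, model_panic_path() returns true.  If the calling
context takes the lock with model_trylock() first and then calls
model_panic_path(), the function returns false, which corresponds to the
hang seen when the calling context holds a lock that the panic path needs.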

> I *can* get a coredump if I directly "call doadump" and then "reset", but I 
> can't get one if I just do "panic".

Dumps have some chance of working since they are required to try harder
than sync() to work in any context.  In particular, they are or were
aware that they are not permitted to use interrupts.  Reset has a
better chance of working since it is simpler and reset is a legitimate
debugger command.  I was asleep when the panic debugger command was
added.  Interrupt-driven I/O in sync for panics back then tended to
work bogusly by blowing away both spl*() masks and the hard CPU interrupt
disable mask.
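
The difference between I/O that polls and I/O that waits for an interrupt
can be sketched with a toy device model.  Everything here is an assumption
made for illustration: model_dev, model_dev_done(), model_write_polled()
and model_write_intr() are hypothetical names, and the "hardware" simply
completes after a fixed number of status reads.  It is not how any real
dump or sync code is written.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: completes after a fixed number of status reads. */
struct model_dev {
	int  polls_until_done;	/* hardware progress, counted in status reads */
	bool irq_seen;		/* set only by the modelled interrupt handler */
};

/* Read the status register; the modelled hardware advances on each read. */
static bool
model_dev_done(struct model_dev *d)
{
	if (d->polls_until_done > 0)
		d->polls_until_done--;
	return d->polls_until_done == 0;
}

/* Polled write: busy-wait on the status register, no interrupts needed. */
static bool
model_write_polled(struct model_dev *d, int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++)
		if (model_dev_done(d))
			return true;
	return false;
}

/* Interrupt-driven write: waits for a flag that only the handler sets. */
static bool
model_write_intr(struct model_dev *d, bool intr_enabled, int max_waits)
{
	assert(max_waits > 0);
	for (int i = 0; i < max_waits; i++) {
		/* The handler runs only if the CPU accepts interrupts. */
		if (model_dev_done(d) && intr_enabled)
			d->irq_seen = true;
		if (d->irq_seen)
			return true;
	}
	return false;	/* completion was never observed */
}
```

In this model the polled write completes whether or not interrupts are
enabled, while model_write_intr() called with intr_enabled set to false
never sees completion, even though the device itself has finished.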

Bruce


