From owner-freebsd-current@FreeBSD.ORG  Sat Oct 16 03:55:16 2010
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 743061065696;
	Sat, 16 Oct 2010 03:55:16 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx09.syd.optusnet.com.au
	(fallbackmx09.syd.optusnet.com.au [211.29.132.242])
	by mx1.freebsd.org (Postfix) with ESMTP id 749BB8FC15;
	Sat, 16 Oct 2010 03:55:15 +0000 (UTC)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by fallbackmx09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o9G1X1nn025708; Sat, 16 Oct 2010 12:33:01 +1100
Received: from c122-106-146-165.carlnfd1.nsw.optusnet.com.au
	(c122-106-146-165.carlnfd1.nsw.optusnet.com.au [122.106.146.165])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o9G1Wt8u026565
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 16 Oct 2010 12:32:56 +1100
Date: Sat, 16 Oct 2010 12:32:55 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: "Robert N. M. Watson" <rwatson@freebsd.org>
In-Reply-To: <93AB0F13-5995-4AAD-BEFC-A6F1317E3CA6@freebsd.org>
Message-ID: <20101016114647.E63520@besplex.bde.org>
References: <AANLkTikA5OUYD1A9pqCqVEZ5qk+VECq8x-fnRXnpp0KE@mail.gmail.com>
	<AANLkTikau6omhWrXVM13zonFEPCxXM+8EqJauovDu0OU@mail.gmail.com>
	<alpine.BSF.2.00.1010090121310.1232@fledge.watson.org>
	<AANLkTimisSojDg2z_f1_v71evfooVdPQ44eu2Thhrf3O@mail.gmail.com>
	<C73FFD46-80B0-44F0-9A19-2B047C285134@freebsd.org>
	<AANLkTimLnRsa4v=A3Ui-1hKiVc5YLwkBND4NOmT4t+tB@mail.gmail.com>
	<15387E38-1E6C-4347-BEA1-61AEE31B5544@freebsd.org>
	<AANLkTimusir1uCE_uxS0uRQCa4rgm_+26duep3+o1XUH@mail.gmail.com>
	<alpine.BSF.2.00.1010152019450.83418@fledge.watson.org>
	<AANLkTi=uwBtd5ce5ctQJZwm+xJcNVMQfs9thOUh+uYxG@mail.gmail.com>
	<93AB0F13-5995-4AAD-BEFC-A6F1317E3CA6@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Mon, 18 Oct 2010 02:49:44 +0000
Cc: FreeBSD Current <current@freebsd.org>, freebsd-net@freebsd.org,
	Garrett Cooper <gcooper@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Sergey Kandaurov <pluknet@freebsd.org>,
	Jack F Vogel <jfv@freebsd.org>, Ryan Stone <rstone@sandvine.com>,
	Ryan Stone <rysto32@gmail.com>, Ed Maste <emaste@sandvine.com>
Subject: Re: [PATCH] Netdump for review and testing -- preliminary version
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Sat, 16 Oct 2010 03:55:16 -0000

On Fri, 15 Oct 2010, Robert N. M. Watson wrote:

> On 15 Oct 2010, at 20:39, Garrett Cooper wrote:
>
>>    But there are already some cases in the ddb area dealing with
>> dumping that aren't handled properly today. Take, for instance, the
>> following two scenarios:
>> 1. Call doadump twice from the debugger.
>> 2. Call doadump, exit the debugger, reenter the debugger, and call
>> doadump again.
>>    Both of these scenarios hang reliably for me.
>>    I'm not saying that we should regress things further; I'm just
>> noting that there are most likely a number of edge cases that aren't
>> being handled properly when doing dumps and that could be handled
>> better or fixed.

Even thinking about calling doadump even once from within the debugger is
an error.  I was asleep when the similar error for panic was committed,
and this error has propagated.  Debuggers should use a trampoline to
call the "any" function, not least so that they can be used to debug
the "any" function without the extra complication of making themselves
reentrant.  I think gdb has always used a trampoline for this outside of
the kernel.  I'm not sure what it does within the kernel, but there it
would have even larger problems than in userland finding a place for the
trampoline.
In the kernel, there is the additional problem of keeping control while
the "any" function is run.  Other CPUs must be kept stopped and interrupts
must be kept masked, except when the "any" function really needs other CPUs
or unmasked interrupts.  Single stepping also needs this and doesn't have
it (other CPUs and interrupt handlers can run and execute any number of
instructions while you are trying to execute a single one).  All ddb
"commands" that change the system state are really non-ddb commands that
should use an external function via a trampoline.  Panicking and dumping
are just the largest ones, so they are the most impossible to do correctly
as commands and the most in need of ddb to debug them.
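A minimal userland sketch of the trampoline idea, assuming a single queued call: instead of running the "any" function on its own stack, the debugger only records the request, and a trampoline runs it after the debugger has been left.  All names here (db_queue_call(), db_leave(), db_trampoline(), count_call()) are hypothetical and are not ddb's API, and the sketch does not model keeping other CPUs stopped or interrupts masked while the function runs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: the debugger does not call the "any" function
 * itself; it queues the call, and a trampoline runs it after the
 * debugger has been left.
 */
typedef void (*any_fn_t)(void *arg);

static struct {
	any_fn_t fn;
	void *arg;
} db_pending;                      /* at most one queued call */

static int db_active;              /* nonzero while the debugger owns the CPU */

/* Debugger command side: record the request instead of running it. */
static int
db_queue_call(any_fn_t fn, void *arg)
{
	if (!db_active || fn == NULL || db_pending.fn != NULL)
		return (-1);       /* not in debugger, bad fn, or slot busy */
	db_pending.fn = fn;
	db_pending.arg = arg;
	return (0);
}

/* Leave the debugger; execution then resumes through the trampoline. */
static void
db_leave(void)
{
	db_active = 0;
}

/*
 * Trampoline: runs outside the debugger, so the debugger itself need
 * not be reentrant and can still be entered to debug fn.  Returns 1
 * if a queued call was made, 0 if nothing was pending.
 */
static int
db_trampoline(void)
{
	any_fn_t fn = db_pending.fn;
	void *arg = db_pending.arg;

	assert(!db_active);        /* must not run on the debugger's stack */
	db_pending.fn = NULL;
	if (fn == NULL)
		return (0);
	fn(arg);
	return (1);
}

/* Example "any" function standing in for something like doadump(). */
static void
count_call(void *arg)
{
	++*(int *)arg;
}
```

With this split, a second request made while one is still pending is refused instead of re-entering the called code, and the debugger stays usable for debugging the queued function because that function never runs on the debugger's stack.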

> Right: one of the points I've made to Attilio is that we need to move to a more principled model as to what sorts of things we allow in various kernel environments. The early boot is a special environment -- so is the debugger, but the debugger on panic is not the same as the debugger when you can continue. Likewise, the crash dumping code is special, but also not the same as the debugger. Right now, exceptional behaviour to limit hangs/etc is done inconsistently. We need to develop a set of principles that tell us what is permitted in what contexts, and then use that to drive design decisions, normalizing what's there already.

ENONUNIXEDITOR.  Format not recovered.

panic() from within a debugger (or a fast interrupt handler, or a fast
interrupt handler that has trapped to the debugger by request...) is,
although an error, not too bad, since panic() must be prepared to work
starting from the "any" state anyway and, as you mention, it doesn't need
to be able to return (except for RESTARTABLE_PANICS, which makes things
impossibly difficult).  Continuing from a debugger is feasible mainly
because in the usual case the system state is not changed (except for
time-dependent things).  If you use it to modify memory or i/o or run
one of its unsafe commands then you have to be careful.

> This is not dissimilar to what we do with locking already, BTW: we define a set of kernel environments (fast interrupt handlers, non-sleepable threads, sleepable thread holding non-sleepable locks, etc), and based on those principles prevent significant sources of instability that might otherwise arise in a complex, concurrent kernel. We need to apply the same sort of approach to handling kernel debugging and crashing.

Locking has imposed considerable discipline, which, if followed by panic(),
would show how wrong most of the things done by panic() are -- it will
hit locks, but shouldn't even be calling functions that have locks, since
such functions expect their locks to work.

The rules for fast interrupt handlers are simple and mostly not followed.
They are that a fast interrupt handler may not access any state not
specially locked by its subsystem.  This means that they may not call
any other subsystem or any upper layer except the null set of ones
documented to be safe to call.  In practice, this means not calling the
"any" function, but it is necessary for atomic ops, bus space accesses,
and a couple of scheduling functions to be safe enough.

> BTW, my view is that except in very exceptional cases, it should not be possible to continue after generating a dump. Dumps often cause disk controllers to get reset, which may leave outstanding I/O in nasty situations. Unless the dump device and model is known not to interfere with operation, we should set state indicating that the system is non-continuable once a dump has occurred.

It might be safe if the system reinitialized everything.  That is too hard
to do just for dumping, but it is needed after resume anyway.  So the
following could reasonably work:
- stop system while its state is believed to be good
- dump
- restart/resume.  The order for this is unclear.  Normal resume might want
   the system to have not stopped as much as it needed to for dumping.
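A sketch of that sequence as a small state machine, combining the reinitialize-everything condition with the quoted suggestion to mark the system non-continuable after a dump.  It picks one order (stop, dump, reinitialize devices as a resume would, then continue), which is only an assumption since the right order is unclear, and all names (enum sys_state, struct sys, sys_stop(), sys_dump(), sys_reinit_devices(), sys_continue()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical system states for the stop/dump/resume sequence. */
enum sys_state {
	SYS_RUNNING,
	SYS_STOPPED,      /* other CPUs stopped, state believed good */
	SYS_DUMPED,       /* dump written; devices may have been reset */
};

struct sys {
	enum sys_state state;
	bool devices_reinitialized;  /* set by a resume-style reinit pass */
};

static int
sys_stop(struct sys *s)
{
	if (s->state != SYS_RUNNING)
		return (-1);
	s->state = SYS_STOPPED;
	return (0);
}

/* Dump only from the stopped state; a second dump is refused. */
static int
sys_dump(struct sys *s)
{
	if (s->state != SYS_STOPPED)
		return (-1);
	s->state = SYS_DUMPED;
	/* The dump may have reset the disk controller, so require reinit. */
	s->devices_reinitialized = false;
	return (0);
}

/* Reinitialize devices the way resume would. */
static void
sys_reinit_devices(struct sys *s)
{
	assert(s->state == SYS_DUMPED);
	s->devices_reinitialized = true;
}

/*
 * Continue: allowed from the stopped state, but after a dump only once
 * devices have been reinitialized; otherwise the system stays
 * non-continuable.
 */
static int
sys_continue(struct sys *s)
{
	if (s->state == SYS_RUNNING)
		return (-1);
	if (s->state == SYS_DUMPED && !s->devices_reinitialized)
		return (-1);
	s->state = SYS_RUNNING;
	return (0);
}
```

In this model a plain stop/continue with no dump works as before, while continuing straight after a dump is refused until the resume-style reinitialization has run.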

Bruce