From owner-freebsd-net@FreeBSD.ORG  Fri Oct 15 19:46:00 2010
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E88B9106564A;
	Fri, 15 Oct 2010 19:46:00 +0000 (UTC)
	(envelope-from rwatson@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id B5C948FC08;
	Fri, 15 Oct 2010 19:46:00 +0000 (UTC)
Received: from [192.168.2.105] (host86-161-142-69.range86-161.btcentralplus.com
	[86.161.142.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 7AE5746B32;
	Fri, 15 Oct 2010 15:45:58 -0400 (EDT)
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: text/plain; charset=us-ascii
From: "Robert N. M. Watson" <rwatson@freebsd.org>
In-Reply-To: <AANLkTi=uwBtd5ce5ctQJZwm+xJcNVMQfs9thOUh+uYxG@mail.gmail.com>
Date: Fri, 15 Oct 2010 20:45:55 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <93AB0F13-5995-4AAD-BEFC-A6F1317E3CA6@freebsd.org>
References: <AANLkTikA5OUYD1A9pqCqVEZ5qk+VECq8x-fnRXnpp0KE@mail.gmail.com>
	<AANLkTikau6omhWrXVM13zonFEPCxXM+8EqJauovDu0OU@mail.gmail.com>
	<alpine.BSF.2.00.1010090121310.1232@fledge.watson.org>
	<AANLkTimisSojDg2z_f1_v71evfooVdPQ44eu2Thhrf3O@mail.gmail.com>
	<C73FFD46-80B0-44F0-9A19-2B047C285134@freebsd.org>
	<AANLkTimLnRsa4v=A3Ui-1hKiVc5YLwkBND4NOmT4t+tB@mail.gmail.com>
	<15387E38-1E6C-4347-BEA1-61AEE31B5544@freebsd.org>
	<AANLkTimusir1uCE_uxS0uRQCa4rgm_+26duep3+o1XUH@mail.gmail.com>
	<alpine.BSF.2.00.1010152019450.83418@fledge.watson.org>
	<AANLkTi=uwBtd5ce5ctQJZwm+xJcNVMQfs9thOUh+uYxG@mail.gmail.com>
To: Garrett Cooper <gcooper@FreeBSD.org>
X-Mailer: Apple Mail (2.1081)
Cc: FreeBSD Current <current@freebsd.org>, freebsd-net@freebsd.org,
	Attilio Rao <attilio@freebsd.org>, Sergey Kandaurov <pluknet@freebsd.org>,
	Jack F Vogel <jfv@freebsd.org>, Ryan Stone <rstone@sandvine.com>,
	Ryan Stone <rysto32@gmail.com>, Ed Maste <emaste@sandvine.com>
Subject: Re: [PATCH] Netdump for review and testing -- preliminary version
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 15 Oct 2010 19:46:01 -0000


On 15 Oct 2010, at 20:39, Garrett Cooper wrote:

>    But there are already some cases in the ddb area dealing with
> dumping that aren't handled properly today. Take for instance the
> following two scenarios:
> 1. Call doadump twice from the debugger.
> 2. Call doadump, exit the debugger, reenter the debugger, and call
> doadump again.
>    Both of these scenarios hang reliably for me.
>    I'm not saying that we should regress things further, but I'm just
> noting that there is most likely a chunk of edge cases that aren't
> being handled properly when doing dumps that could be handled better /
> fixed.

Right: one of the points I've made to Attilio is that we need to move to
a more principled model as to what sorts of things we allow in various
kernel environments. The early boot is a special environment -- so is
the debugger, but the debugger on panic is not the same as the debugger
when you can continue. Likewise, the crash dumping code is special, but
also not the same as the debugger. Right now, exceptional behaviour to
limit hangs/etc is done inconsistently. We need to develop a set of
principles that tell us what is permitted in what contexts, and then use
that to drive design decisions, normalizing what's there already.

This is not dissimilar to what we do with locking already, BTW: we
define a set of kernel environments (fast interrupt handlers,
non-sleepable threads, sleepable threads holding non-sleepable locks,
etc), and based on those principles prevent significant sources of
instability that might otherwise arise in a complex, concurrent kernel.
We need to apply the same sort of approach to handling kernel debugging
and crashing.
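
A minimal userland sketch of what such a context/permission model could
look like. Every identifier below (kctx, kop, kctx_allowed,
kctx_permits) is hypothetical and invented for illustration; none is an
existing kernel interface, and the table entries are placeholder
assumptions rather than a statement of actual or proposed policy.

```c
#include <stdbool.h>

/* Hypothetical execution contexts, loosely following the ones named above. */
enum kctx {
	KCTX_EARLY_BOOT,
	KCTX_DEBUGGER_CONTINUABLE,	/* debugger entered, system can resume */
	KCTX_DEBUGGER_PANIC,		/* debugger entered on panic */
	KCTX_DUMPING,			/* crash dump in progress */
	KCTX_COUNT
};

/* Hypothetical operations a piece of code might want to perform. */
#define	KOP_SLEEP	(1u << 0)
#define	KOP_POLLED_IO	(1u << 1)
#define	KOP_RESUME	(1u << 2)	/* continue normal operation */

/*
 * One row per context listing what is permitted there.  The values are
 * assumptions chosen only to make the example concrete.
 */
static const unsigned kctx_allowed[KCTX_COUNT] = {
	[KCTX_EARLY_BOOT]		= KOP_POLLED_IO,
	[KCTX_DEBUGGER_CONTINUABLE]	= KOP_POLLED_IO | KOP_RESUME,
	[KCTX_DEBUGGER_PANIC]		= KOP_POLLED_IO,
	[KCTX_DUMPING]			= KOP_POLLED_IO,
};

/* Return true only if every operation in 'ops' is permitted in 'ctx'. */
bool
kctx_permits(enum kctx ctx, unsigned ops)
{
	if ((unsigned)ctx >= KCTX_COUNT)
		return (false);
	return ((kctx_allowed[ctx] & ops) == ops);
}
```

The point of a single table is that the rules live in one place and can
be checked consistently, instead of each code path carrying its own
ad hoc exception.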

BTW, my view is that except in very exceptional cases, it should not be
possible to continue after generating a dump. Dumps often cause disk
controllers to get reset, which may leave outstanding I/O in nasty
situations. Unless the dump device and model are known not to interfere
with operation, we should set state indicating that the system is
non-continuable once a dump has occurred.
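
A minimal userland sketch of that state, assuming a single global flag.
The names (system_continuable, sketch_dump, sketch_debugger_continue)
are hypothetical and are not existing kernel symbols; the real dump
path and debugger are only represented by comments.

```c
#include <stdbool.h>

/* Hypothetical global: may the system resume normal operation? */
static bool system_continuable = true;

/*
 * Stand-in for the dump path.  'dump_known_safe' is an assumed input
 * saying the dump device and model are known not to interfere with
 * normal operation (for example, no disk controller reset).
 */
void
sketch_dump(bool dump_known_safe)
{
	/* ... the actual dump would be written here ... */
	if (!dump_known_safe)
		system_continuable = false;	/* never cleared again */
}

/*
 * Stand-in for the debugger's "continue" command: returns true if the
 * system would be allowed to resume, false if it must stay stopped.
 */
bool
sketch_debugger_continue(void)
{
	return (system_continuable);
}
```

Once the flag is cleared it stays cleared, so leaving the debugger after
a disruptive dump is refused instead of resuming on top of outstanding
I/O in an unknown state.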

Robert