From: "Robert N. M. Watson"
Date: Fri, 15 Oct 2010 20:45:55 +0100
To: Garrett Cooper
Cc: FreeBSD Current, freebsd-net@freebsd.org, Attilio Rao, Sergey Kandaurov, Jack F Vogel, Ryan Stone, Ed Maste
Subject: Re: [PATCH] Netdump for review and testing -- preliminary version
Message-Id: <93AB0F13-5995-4AAD-BEFC-A6F1317E3CA6@freebsd.org>

On 15 Oct 2010, at 20:39, Garrett Cooper wrote:

> But there are already some cases that aren't properly handled
> today in the ddb area dealing with dumping that aren't handled
> properly. Take for instance the following two scenarios:
> 1. Call doadump twice from the debugger.
> 2. Call doadump, exit the debugger, reenter the debugger, and call
> doadump again.
> Both of these scenarios hang reliably for me.
>
> I'm not saying that we should regress things further, but I'm just
> noting that there are most likely a chunk of edgecases that aren't
> being handled properly when doing dumps that could be handled better /
> fixed.

Right: one of the points I've made to Attilio is that we need to move to a more principled model of what sorts of things we allow in various kernel environments. The early boot is a special environment, and so is the debugger, but the debugger on panic is not the same as the debugger when you can continue. Likewise, the crash dumping code is special, but also not the same as the debugger. Right now, the exceptional behaviour used to limit hangs etc. is applied inconsistently. We need to develop a set of principles that tell us what is permitted in which contexts, and then use them to drive design decisions, normalizing what's there already.

This is not dissimilar to what we already do with locking, BTW: we define a set of kernel environments (fast interrupt handlers, non-sleepable threads, sleepable threads holding non-sleepable locks, etc.), and based on those principles we prevent significant sources of instability that might otherwise arise in a complex, concurrent kernel. We need to apply the same sort of approach to kernel debugging and crash handling.

BTW, my view is that except in very exceptional cases, it should not be possible to continue after generating a dump. Dumps often cause disk controllers to be reset, which may leave outstanding I/O in a nasty state. Unless the dump device and model are known not to interfere with normal operation, we should set state indicating that the system is non-continuable once a dump has occurred.

Robert