Date: Sat, 16 Oct 2010 12:32:55 +1100 (EST)
From: Bruce Evans
To: "Robert N. M. Watson"
Cc: FreeBSD Current, freebsd-net@freebsd.org, Garrett Cooper,
    Attilio Rao, Sergey Kandaurov, Jack F Vogel, Ryan Stone, Ed Maste
Subject: Re: [PATCH] Netdump for review and testing -- preliminary version

On Fri, 15 Oct 2010, Robert N. M. Watson wrote:

> On 15 Oct 2010, at 20:39, Garrett Cooper wrote:
>
>> But there are already some cases that aren't properly handled
>> today in the ddb area dealing with dumping that aren't handled
>> properly. Take for instance the following two scenarios:
>> 1. Call doadump twice from the debugger.
>> 2. Call doadump, exit the debugger, reenter the debugger, and call
>> doadump again.
>> Both of these scenarios hang reliably for me.
>> I'm not saying that we should regress things further, but I'm just
>> noting that there are most likely a chunk of edgecases that aren't
>> being handled properly when doing dumps that could be handled better /
>> fixed.

Even thinking about calling doadump even once from within the debugger
is an error. I was asleep when the similar error for panic was
committed, and this error has propagated. Debuggers should use a
trampoline to call the "any" function, not least so that they can be
used to debug the "any" function without the extra complications to
make themselves reentrant. I think gdb has always used a trampoline for
this outside of the kernel.
Not sure what it does within the kernel, but it would have even larger
problems than in userland finding a place for the trampoline.

In the kernel, there is the additional problem of keeping control while
the "any" function is run. Other CPUs must be kept stopped and
interrupts must be kept masked, except when the "any" function really
needs other CPUs or unmasked interrupts. Single stepping also needs
this and doesn't have it (other CPUs and interrupt handlers can run and
execute any number of instructions while you are trying to execute a
single one).

All ddb "commands" that change the system state are really non-ddb
commands that should use an external function via a trampoline.
Panicking and dumping are just the largest ones, so they are the most
impossible to do correctly as commands and the most in need of ddb to
debug them.

> Right: one of the points I've made to Attilio is that we need to move to a more principled model as to what sorts of things we allow in various kernel environments. The early boot is a special environment -- so is the debugger, but the debugger on panic is not the same as the debugger when you can continue. Likewise, the crash dumping code is special, but also not the same as the debugger. Right now, exceptional behaviour to limit hangs/etc is done inconsistently. We need to develop a set of principles that tell us what is permitted in what contexts, and then use that to drive design decisions, normalizing what's there already.

ENONUNIXEDITOR. Format not recovered.

panic() from within a debugger (or a fast interrupt handler, or a fast
interrupt handler that has trapped to the debugger by request...) is,
although an error, not too bad since panic() must be prepared to work
starting from the "any" state anyway, and as you mention it doesn't
need to be able to return (except for RESTARTABLE_PANICS, which makes
things impossibly difficult).

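
The trampoline idea raised earlier in this reply can be illustrated
with a small userland model. This is only a sketch under stated
assumptions: every name in it (dbg_model, dbg_request_call,
dbg_trampoline, sample_target) is hypothetical and is neither ddb nor
gdb code. The point it shows is that the debugger command merely
records the request, and the "any" function runs only after debugger
context has been left, with control returning to the debugger
afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names are ddb or gdb APIs. */
typedef int (*any_fn)(void *arg);

struct dbg_model {
	bool	in_debugger;	/* true while debugger code is running */
	any_fn	pending_fn;	/* call recorded by a debugger command */
	void	*pending_arg;
	int	last_result;
};

/* Debugger command: records the request but runs nothing itself. */
bool
dbg_request_call(struct dbg_model *d, any_fn fn, void *arg)
{
	assert(d != NULL);
	if (!d->in_debugger || fn == NULL || d->pending_fn != NULL)
		return (false);
	d->pending_fn = fn;
	d->pending_arg = arg;
	return (true);
}

/* Trampoline: leave debugger context, run the call, then re-enter. */
bool
dbg_trampoline(struct dbg_model *d)
{
	any_fn fn;

	assert(d != NULL);
	if (!d->in_debugger || d->pending_fn == NULL)
		return (false);
	fn = d->pending_fn;
	d->pending_fn = NULL;
	d->in_debugger = false;		/* target runs outside the debugger */
	d->last_result = fn(d->pending_arg);
	d->in_debugger = true;		/* models the breakpoint at return */
	return (true);
}

/* Example target: reports whether it was run outside debugger context. */
int
sample_target(void *arg)
{
	struct dbg_model *d = arg;

	return (d->in_debugger ? -1 : 42);
}
```

The model leaves out the hard parts discussed above (finding a place
for the trampoline, keeping other CPUs stopped and interrupts masked);
it only separates "request the call" from "run the call".
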
Continuing from a debugger is feasible mainly because in the usual case
the system state is not changed (except for time-dependent things). If
you use it to modify memory or i/o or run one of its unsafe commands
then you have to be careful.

> This is not dissimilar to what we do with locking already, BTW: we define a set of kernel environments (fast interrupt handlers, non-sleepable threads, sleepable thread holding non-sleepable locks, etc), and based on those principles prevent significant sources of instability that might otherwise arise in a complex, concurrent kernel. We need to apply the same sort of approach to handling kernel debugging and crashing.

Locking has imposed considerable discipline, which if followed by
panic() would show how wrong most of the things done by panic() are --
it will hit locks, but shouldn't even be calling functions that have
locks, since such functions expect their locks to work.

The rules for fast interrupt handlers are simple and mostly not
followed. They are that a fast interrupt handler may not access any
state not specially locked by its subsystem. This means that they may
not call any other subsystem or any upper layer except the null set of
ones documented to be safe to call. In practice, this means not calling
the "any" function, but it is necessary for atomic ops, bus space
accesses, and a couple of scheduling functions to be safe enough.

> BTW, my view is that except in very exceptional cases, it should not be possible to continue after generating a dump. Dumps often cause disk controllers to get reset, which may leave outstanding I/O in nasty situations. Unless the dump device and model is known not to interfere with operation, we should set state indicating that the system is non-continuable once a dump has occurred.

It might be safe if the system reinitialized everything. Too hard for
just dumping, but it is needed after resume anyway.
So the following could reasonably work:
- stop system while its state is believed to be good
- dump
- restart/resume.
The order for this is unclear. Normal resume might want the system to
have not stopped as much as it needed to for dumping.

Bruce
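
The stop / dump / restart-resume sequence, combined with the suggestion
to mark the system non-continuable once a dump has occurred, can be
modelled roughly as a small state machine. This is a sketch under
assumptions, not FreeBSD code: the names sys_model, sys_stop, sys_dump,
sys_reinit_devices and sys_resume are hypothetical, and the single
devices_clean flag stands in for "the system reinitialized everything".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; these names are not FreeBSD kernel interfaces. */
enum sys_state { SYS_RUNNING, SYS_STOPPED, SYS_DUMPED };

struct sys_model {
	enum sys_state state;
	bool devices_clean;	/* false once a dump may have reset hardware */
};

/* Step 1: stop the system while its state is believed to be good. */
bool
sys_stop(struct sys_model *s)
{
	assert(s != NULL);
	if (s->state != SYS_RUNNING)
		return (false);
	s->state = SYS_STOPPED;
	return (true);
}

/* Step 2: dump.  Refused unless stopped, so a second dump is rejected. */
bool
sys_dump(struct sys_model *s)
{
	assert(s != NULL);
	if (s->state != SYS_STOPPED)
		return (false);
	s->state = SYS_DUMPED;
	s->devices_clean = false;	/* assume the dump disturbed devices */
	return (true);
}

/* Stand-in for the full reinitialization that resume needs anyway. */
void
sys_reinit_devices(struct sys_model *s)
{
	assert(s != NULL);
	s->devices_clean = true;
}

/* Step 3: restart/resume, refused while the system is non-continuable. */
bool
sys_resume(struct sys_model *s)
{
	assert(s != NULL);
	if (s->state == SYS_RUNNING || !s->devices_clean)
		return (false);
	s->state = SYS_RUNNING;
	return (true);
}
```

In this model a dump attempted twice in a row is rejected rather than
hanging, and continuing after a dump is only possible after the
reinitialization step. How much needs to be stopped and reinitialized
in a real kernel is exactly the open question above.
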