From: Robert Watson
To: arch@FreeBSD.org
Cc: current@FreeBSD.org
Date: Tue, 18 Dec 2007 12:10:46 +0000 (GMT)
Subject: DDB scripting, output capture, and textdumps

Dear all:

I've been hacking on and off for a while on a side project to improve our kernel debugging facilities. Primarily, my concern has been to address three problems:

- The complications of using kernel core dumps for debugging. Dumps are large, which makes them unwieldy to distribute or to store for any extended period (even with minidumps). Using them requires relatively synchronized kernel source and a kernel with debugging symbols. And fsck can use enough swap to overwrite a dump before it can be extracted.

- The decreasing likelihood that notebooks will ship with serial ports that can be used for interactive debugging with DDB.
  Making end-users type in stack traces is cruel, photos are a pain, and X11 rules out both.

- The fact that a great many problems are most easily diagnosed with the utility routines built into DDB, and much less easily with kgdb during offline analysis. For many of the bugs I analyze, simply looking at the DDB output is enough to identify the source of the problem.

An idea I floated at BSDCan earlier this year (or perhaps it was at EuroBSDCon the year before) was a "textdump": a new type of kernel dump consisting of debugging information that DDB generates and captures automatically. The result would be an ASCII text file that could be filed as a bug report, perhaps even automatically.

To this end, I have implemented three new facilities for use with DDB:

(1) DDB output capture. DDB output is copied into a memory buffer, from which it can be extracted using a sysctl or via a textdump (see below). Capture can be turned on and off, both by hand ("I'll want this later, but not that") and from scripts (see below).

(2) DDB scripting. A limited number of named scripts can be defined, each running a series of DDB commands. There are no loops or other control structures, just simple command lists. Scripts can run automatically when DDB is entered in various situations, including WITNESS violations and kernel panics. They can also be run by hand to save a bit of typing if you use DDB repetitively (as I do).

(3) Textdumps. A new dump type that stores a series of files containing various pieces of information, including the DDB capture buffer, the kernel message buffer, the kernel configuration (if compiled into the kernel), the panic message, and the kernel version string. These files are stored in ustar format in the dump partition, aligned to its end, so savecore(8) needs almost no new logic to deal with them: it simply drops numbered tar files in /var/crash.
    Storing them as tar files makes it straightforward to extend the textdump format with new types of information, and avoids the problem of safely representing information in many different formats within a single file.

These are pretty flexible tools, and you can imagine doing the following sorts of things:

- Setting the kdb.enter.panic script to automatically turn on output capture, print full backtraces of all threads, show open file information, dump UMA statistics, save it all to a textdump, and then reboot.

- Setting the kdb.enter.witness script to show lock information, generate a core dump, and reboot; or simply to run "show allocks" automatically and drop to the DDB prompt.

- Adding an rc.conf flag to automatically e-mail textdumps to a specific address, such as GNATS or an automated bug system. These could be unpacked and analyzed automatically and, thanks to their compact size, kept for long-term trend analysis or to identify when a problem started occurring.

I've produced an initial snapshot of the above, which can be found here:

http://www.watson.org/~robert/freebsd/20071218-ddb.tgz

The snapshot adds three files to DDB; patches quite a few kernel files (to pass more information into KDB about why it is being entered, so that the right script is triggered); teaches savecore(8) how to extract textdumps; adds a ddb(8) command-line tool so that DDB scripts can be managed from userspace, outside the debugger; extends the ddb(4) man page; and adds a new textdump(4) man page.

There are a number of known limitations; where I'm aware of them, I've tried to document them at the top of the relevant files. I also regret to say that so far I've been able to test only on i386, not on other platforms.

I'd welcome any feedback; I'd like to get these changes into CVS in the next week or two.

Robert N M Watson
Computer Laboratory
University of Cambridge
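The output capture facility described in (1) can be illustrated with a small C sketch. This is not the code from the snapshot: the structure and function names (struct capture, capture_set(), capture_write()), the tiny 64-byte buffer, and the policy of dropping output once the buffer is full are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size; a real kernel buffer would be far larger and tunable. */
#define CAPTURE_BUFSIZE 64

/* Hypothetical in-memory copy of debugger output. */
struct capture {
	char   buf[CAPTURE_BUFSIZE];
	size_t len;      /* bytes captured so far */
	bool   enabled;  /* toggled by hand or from a script */
};

static void
capture_init(struct capture *c)
{
	c->len = 0;
	c->enabled = false;
}

/* Turn capture on or off without discarding what is already stored. */
static void
capture_set(struct capture *c, bool on)
{
	c->enabled = on;
}

/*
 * Record a copy of a chunk of console output.  The output itself would still
 * go to the console; this only keeps a copy for later extraction.  Once the
 * buffer is full, further output is dropped (not wrapped), so the start of
 * the session is preserved.  Returns the number of bytes actually stored.
 */
static size_t
capture_write(struct capture *c, const char *s, size_t n)
{
	size_t room;

	if (!c->enabled)
		return (0);
	room = sizeof(c->buf) - c->len;
	if (n > room)
		n = room;
	memcpy(c->buf + c->len, s, n);
	c->len += n;
	return (n);
}
```

Dropping rather than wrapping keeps the beginning of a session, which in a panic script usually holds the most useful output; a ring buffer that keeps only the most recent output is the obvious alternative.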
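The scripting model in (2), named lists of commands with no control flow, run either by hand or on debugger entry, might look roughly like the following C sketch. The table sizes, the use of ';' to separate commands, the function names, and the fallback to a script named kdb.enter.default when no reason-specific script exists are hypothetical; echo_cmd() merely stands in for DDB's real command dispatcher.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical limits for "a limited number of named scripts". */
#define MAX_SCRIPTS 8
#define NAME_LEN    32
#define BODY_LEN    128

/* A script is a name plus a flat command list: no loops or branches. */
struct script {
	char name[NAME_LEN];   /* e.g. "kdb.enter.panic" */
	char body[BODY_LEN];   /* e.g. "show allocks; bt" */
};

static struct script scripts[MAX_SCRIPTS];
static int nscripts;

/* Stand-in for DDB's command dispatcher: just echo the command. */
static void
echo_cmd(const char *cmd)
{
	printf("db> %s\n", cmd);
}

static struct script *
script_find(const char *name)
{
	int i;

	for (i = 0; i < nscripts; i++)
		if (strcmp(scripts[i].name, name) == 0)
			return (&scripts[i]);
	return (NULL);
}

/* Define or replace a script.  Returns 0, or -1 if full or too long. */
static int
script_set(const char *name, const char *body)
{
	struct script *s;

	if (strlen(name) >= NAME_LEN || strlen(body) >= BODY_LEN)
		return (-1);
	if ((s = script_find(name)) == NULL) {
		if (nscripts == MAX_SCRIPTS)
			return (-1);
		s = &scripts[nscripts++];
	}
	strcpy(s->name, name);
	strcpy(s->body, body);
	return (0);
}

/*
 * Run each ';'-separated command in order, trimming surrounding spaces and
 * skipping empty commands.  Returns the number of commands run, or -1 if
 * no such script exists.
 */
static int
script_run(const char *name, void (*exec)(const char *))
{
	struct script *s = script_find(name);
	char copy[BODY_LEN], *cmd, *next, *end;
	int n = 0;

	if (s == NULL)
		return (-1);
	strcpy(copy, s->body);
	for (cmd = copy; cmd != NULL; cmd = next) {
		if ((next = strchr(cmd, ';')) != NULL)
			*next++ = '\0';
		while (*cmd == ' ')
			cmd++;
		end = cmd + strlen(cmd);
		while (end > cmd && end[-1] == ' ')
			*--end = '\0';
		if (*cmd != '\0') {
			exec(cmd);
			n++;
		}
	}
	return (n);
}

/* On debugger entry, prefer "kdb.enter.<reason>", else the catch-all. */
static int
script_on_enter(const char *reason, void (*exec)(const char *))
{
	char name[NAME_LEN];

	snprintf(name, sizeof(name), "kdb.enter.%s", reason);
	if (script_find(name) != NULL)
		return (script_run(name, exec));
	return (script_run("kdb.enter.default", exec));
}
```

For example, after script_set("kdb.enter.panic", "show allocks; bt"), entering the debugger because of a panic would run those two commands in order, while any other entry reason would fall through to the catch-all script, if one is defined.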
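The textdump layout in (3) relies on the standard POSIX ustar format: each file is a 512-byte header block followed by its data padded to a multiple of 512 bytes, and two zero blocks end the archive. The C sketch below builds such a header and computes where an archive would start if it were placed flush against the end of the dump partition. The member name "ddb.txt", the permission bits, and the placement arithmetic are assumptions for illustration; the post does not describe the snapshot's actual on-disk bookkeeping.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512

/* POSIX ustar header: exactly one 512-byte block; every field is a char. */
struct ustar_hdr {
	char name[100];
	char mode[8];
	char uid[8];
	char gid[8];
	char size[12];      /* octal, NUL-terminated */
	char mtime[12];     /* octal seconds since the epoch */
	char chksum[8];
	char typeflag;      /* '0' = regular file */
	char linkname[100];
	char magic[6];      /* "ustar\0" */
	char version[2];    /* "00" */
	char uname[32];
	char gname[32];
	char devmajor[8];
	char devminor[8];
	char prefix[155];
	char pad[12];
};
_Static_assert(sizeof(struct ustar_hdr) == TAR_BLOCK, "header must be one block");

/* Header checksum: byte sum with the chksum field itself counted as spaces. */
static unsigned
ustar_checksum(const struct ustar_hdr *h)
{
	const unsigned char *p = (const unsigned char *)h;
	size_t ck = offsetof(struct ustar_hdr, chksum);
	unsigned sum = 0;
	size_t i;

	for (i = 0; i < sizeof(*h); i++)
		sum += (i >= ck && i < ck + sizeof(h->chksum)) ? ' ' : p[i];
	return (sum);
}

/* Fill in a header for one textdump member (hypothetical name and mode). */
static void
ustar_fill(struct ustar_hdr *h, const char *name, size_t size, long long mtime)
{
	memset(h, 0, sizeof(*h));
	strncpy(h->name, name, sizeof(h->name));
	snprintf(h->mode, sizeof(h->mode), "%07o", 0600u);
	snprintf(h->uid, sizeof(h->uid), "%07o", 0u);
	snprintf(h->gid, sizeof(h->gid), "%07o", 0u);
	snprintf(h->size, sizeof(h->size), "%011llo", (unsigned long long)size);
	snprintf(h->mtime, sizeof(h->mtime), "%011llo", (unsigned long long)mtime);
	h->typeflag = '0';
	memcpy(h->magic, "ustar", 6);       /* copies the trailing NUL too */
	memcpy(h->version, "00", 2);
	snprintf(h->chksum, sizeof(h->chksum), "%06o", ustar_checksum(h));
	h->chksum[7] = ' ';                 /* conventional: 6 digits, NUL, space */
}

/* Archive size: header + padded data per member, plus two zero end blocks. */
static long long
textdump_tar_size(const size_t *sizes, int n)
{
	long long total = 2 * TAR_BLOCK;
	int i;

	for (i = 0; i < n; i++)
		total += TAR_BLOCK +
		    (long long)((sizes[i] + TAR_BLOCK - 1) / TAR_BLOCK) * TAR_BLOCK;
	return (total);
}

/*
 * Hypothetical placement: the offset at which to write the archive so that
 * it ends on the last full block of the partition; -1 if it does not fit.
 */
static long long
textdump_start(long long mediasize, long long tarsize)
{
	long long end = mediasize - mediasize % TAR_BLOCK;

	if (tarsize > end)
		return (-1);
	return (end - tarsize);
}
```

Because every member carries its own self-describing header, adding a new kind of information to a textdump only means appending another header and data run, and the result can be unpacked with an ordinary tar(1).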