Date: Wed, 21 Jan 2004 13:43:05 +1100 (EST)
From: Bruce Evans
To: Bill Paul
cc: cvs-src@FreeBSD.ORG, src-committers@FreeBSD.ORG, phk@FreeBSD.ORG,
    cvs-all@FreeBSD.ORG
Subject: Re: cvs commit: src/sys/alpha/alpha support.s src/sys/i386/i386
    swtch.s src/sys/kern kern_shutdown.c src/sys/sys systm.h
Message-ID: <20040121123903.W6693@gamplex.bde.org>
In-Reply-To: <20040120182929.4573C16A4CF@hub.freebsd.org>
References: <20040120182929.4573C16A4CF@hub.freebsd.org>

On Tue, 20 Jan 2004, Bill Paul wrote:

> [abuse of __FILE__ and __LINE__ in panic()]
> > > Ideally a traceback should be printed too, any takers ?
> >
> > This can be obtained by running a debugger on the panic dump. Line
> > numbers and source file names can also be printed by the debugger
> > if the executable has at least line numbers in its debugging info.
>
> That's fine for developers who always run their systems with crash
> dumps enabled and possess sufficient debugger fu to analyze them. What
> about ordinary users who encounter a kernel panic due to a trap
> and need to report it?

This is close to the best possible method for ordinary users. I rarely
use it. (I prefer to debug dynamic kernels). But to the extent that
automatic crash dumps work, ordinary users always have them and you just
have to extract the info from them (the users).

> It would save wear and tear on everyone's
> nerves (_ESPECIALLY_ mine) if they could just send in a traceback
> rather than developers being forced to do the usual "go back and
> do nm ${KERNEL} and tell us what you see" exchange.

This is much more to ask of ordinary users. The traceback is only easy
to provide if it is printed to a serial or other external console, or
recorded on disk, or someone fixes backtrace() and the message buffer
manages to live across panic reboots. Otherwise the user has to watch
the panic and transcribe the data. This is too much to ask for,
especially for verbose panic messages with lots of normally-useless data
like the trap_fatal() register dump and traces with stack frames for
recursive panics (and now even line numbers for recursive panics). If
the data can be recorded on disk, then panic dumps should work and
already contain the data (except the implementation of backtrace() is of
such quality that it doesn't write the data to the message buffer).

> [...]
>
> > Verbose panic messages, and lots of code to print out values of variables
> > just before panicking, are another mistake. Short panic messages were
> > good enough when debuggers were primitive or nonexistent. Verbose panic
> > messages are even less needed now.
>
> Ok, hang on just a minute.
>
> Let's be clear here. Adding linenum/sourcefile info to all panic()
> calls is kind of pointless, I agree.
> (If you're too lazy to grep
> the source for the panic message, you're just no damn good.) However,
> there are times when printing the contents of some variables in
> the panic string is _INCREDIBLY_ _USEFUL_.

Those times are mostly when you have narrowed down the cause of a new
panic and know what to print.

> Terse panic messages
> are annoying -- for that matter so are terse error messages in general.
> Error messages are meant to a) inform the user that they need to
> perform some action to correct a problem and b) alert a software
> developer to a potential bug and, hopefully, how to fix it. It is
> not enough to note that an error occurred: you have to explain
> _WHY_ it occurred.

Panic messages are special because the user is not expected to
understand or debug them.

> Really bad panic message:
>
> panic("mbuf is corrupt (but I'm not telling you exactly "
>     "what's corrupt about it because god forbid I should "
>     "make life easier for you)");

Not really bad panic message:

	/*
	 * This message intentionally kept short so that it can be
	 * reported easily (especially if it must be transcribed
	 * manually).
	 */
	panic("mbuf_frobnicate: mbuf is corrupt");

> Good panic message:
>
> panic("mbuf is corrupt: mbuf %p has m_len of %d and m_flags of 0x%x "
>     "but m_data is NULL", m, m->m_len, m->m_flags);
>
> By showing the contents of the suspicious variable/structure, you give
> a developer some clue as to what was going on in the system when the
> variable/structure was trashed. Maybe m_len is the size of a protocol
> header that points to a particular protocol module being suspect.
> Maybe m_flags has M_DONGS set which points to code that Alfred wrote.
> It may not make sense to dump out every single piece of available state,
> but not providing any state at all is just damn rude.

You could get lucky, but especially for memory corruption problems it is
unlikely that the memory contents point to the cause of the corruption
that easily.

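To make the trade-off between the two message styles concrete, here is
a minimal user-space sketch, not kernel code. Everything in it is an
assumption made for illustration: struct fake_mbuf stands in for the
three mbuf fields named above, format_panic() only builds the string
that a real panic() would print before halting, and the helper names
terse_panic_msg() and verbose_panic_msg() are hypothetical. The %p
field is left out so that the output does not depend on the platform.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the few mbuf fields named in the thread. */
struct fake_mbuf {
	void	*m_data;
	int	 m_len;
	int	 m_flags;
};

/*
 * User-space stand-in for the formatting half of panic(): it only
 * builds the message and returns its length.  A real panic() would
 * go on to halt the system.
 */
static int
format_panic(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	return (n);
}

/* Terse style: function name plus a short, easily transcribed reason. */
static int
terse_panic_msg(char *buf, size_t size)
{
	return (format_panic(buf, size, "mbuf_frobnicate: mbuf is corrupt"));
}

/* Verbose style: the same reason, followed by the suspect fields. */
static int
verbose_panic_msg(char *buf, size_t size, const struct fake_mbuf *m)
{
	return (format_panic(buf, size,
	    "mbuf_frobnicate: mbuf is corrupt: m_len %d, m_flags 0x%x, "
	    "m_data %s", m->m_len, (unsigned)m->m_flags,
	    m->m_data == NULL ? "NULL" : "set"));
}
```

With m_len 20, m_flags 0x2 and a NULL m_data, the terse helper produces
a 32-character message and the verbose one produces
"mbuf_frobnicate: mbuf is corrupt: m_len 20, m_flags 0x2, m_data NULL",
which is 68 characters. That is roughly double what a user without a
serial console would have to copy down by hand, in exchange for three
values that may or may not point at the cause.
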
> Could you determine this from the crash dump? Sure -- but do you have
> any idea how much of a pain in the ass that can be, especially when
> the crash occurs on a system that isn't yours? By default, crash
> dumps are as large as physical memory, and most systems now have at
> least 128MB of RAM. The average user does not know how to analyze
> a crash dump, which means even if they get one, they won't be able to
> do anything with it, except send it to me. That means I'll have people
> throwing 128MB files at me, which I can't even analyze unless I happen
> to have a system handy running _exactly_ the same kernel as they are.

I have a defense against the large files (a 56K connection :-). You can
write a script to run gdb on their crash dump (if they have one) much
more easily than you can tell them how to run gdb.

> This is an amazing amount of trouble to go through to determine a
> couple of pieces of info that could just as easily have been printed
> on the console, where the user could copy them down and include them
> in an e-mail.

... if the couple of pieces of info are relevant and aren't hidden in
many irrelevant lines of debugging info.

> Anyway, to reiterate: I don't think the lineno/sourcefile additions
> to panic() are worth the bloat. Tracebacks, on the other hand, would
> be worth their weight in gold bikesheds.

I agree about tracebacks. We already have them, but not in a form useful
to almost anyone without a serial console.

Bruce