From owner-cvs-src@FreeBSD.ORG  Wed Jan 21 01:45:39 2004
Return-Path: <owner-cvs-src@FreeBSD.ORG>
Delivered-To: cvs-src@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2F89716A4CE; Wed, 21 Jan 2004 01:45:39 -0800 (PST)
Received: from mx.nsu.ru (mx.nsu.ru [212.192.164.5])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 0AE6443D2D; Wed, 21 Jan 2004 01:45:36 -0800 (PST)
	(envelope-from danfe@regency.nsu.ru)
Received: from mail by mx.nsu.ru with drweb-scanned (Exim 3.35 #1 (Debian))
	id 1AjF5n-0007hm-00; Wed, 21 Jan 2004 15:56:11 +0600
Received: from regency.nsu.ru ([193.124.210.26])
	by mx.nsu.ru with esmtp (Exim 3.35 #1 (Debian))
	id 1AjF5k-0007gS-00; Wed, 21 Jan 2004 15:56:08 +0600
Received: from regency.nsu.ru (localhost [127.0.0.1])
	by regency.nsu.ru (8.12.9p2/8.12.9) with ESMTP id i0L9kVDO080313;
	Wed, 21 Jan 2004 15:46:31 +0600 (NOVT)
	(envelope-from danfe@regency.nsu.ru)
Received: (from danfe@localhost)
	by regency.nsu.ru (8.12.9p2/8.12.9/Submit) id i0L9kRk1080135;
	Wed, 21 Jan 2004 15:46:27 +0600 (NOVT)
	(envelope-from danfe)
Date: Wed, 21 Jan 2004 15:46:27 +0600
From: Alexey Dokuchaev <danfe@nsu.ru>
To: Bill Paul <wpaul@freebsd.org>
Message-ID: <20040121094627.GA44278@regency.nsu.ru>
References: <20040120175334.W3279@gamplex.bde.org>
	<20040120182929.4573C16A4CF@hub.freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040120182929.4573C16A4CF@hub.freebsd.org>
User-Agent: Mutt/1.4.1i
X-Envelope-To: wpaul@freebsd.org,
 bde@zeta.org.au,
 cvs-src@freebsd.org,
 src-committers@freebsd.org,
 phk@freebsd.org,
 cvs-all@freebsd.org
cc: cvs-src@freebsd.org
cc: cvs-all@freebsd.org
cc: src-committers@freebsd.org
cc: phk@freebsd.org
cc: Bruce Evans <bde@zeta.org.au>
Subject: Re: cvs commit: src/sys/alpha/alpha support.s src/sys/i386/i386
	swtch.s src/sys/kern kern_shutdown.c src/sys/sys systm.h
X-BeenThere: cvs-src@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: CVS commit messages for the src tree <cvs-src.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-src>
List-Post: <mailto:cvs-src@freebsd.org>
List-Help: <mailto:cvs-src-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 21 Jan 2004 09:45:39 -0000

On Tue, Jan 20, 2004 at 10:29:29AM -0800, Bill Paul wrote:
> 
> [abuse of __FILE__ and __LINE__ in panic()]
>  
> > This was rejected in all reviews.  It gives less information than
> > grepping the sources, at some cost (grep at least gives correct line
> > numbers when the sources don't quite match the binary).
> > 
> > Please back this out.
> 
> I have to second this.
>  
> > >   Ideally a traceback should be printed too, any takers ?
> > 
> > This can be obtained by running a debugger on the panic dump.  Line
> > numbers and source file names can also be printed by the debugger
> > if the executable has at least line numbers in its debugging info.
> 
> That's fine for developers who always run their systems with crash
> dumps enabled and posess sufficient debugger fu to analyze them. What
> about ordinary users who encounter a kernel panic due to a trap
> and need to report it? It would save wear and tear on everyone's
> nerves (_ESPECIALLY_ mine) if they could just send in a traceback
> rather than developers being forced to do the usual "go back and
> do nm ${KERNEL} and tell us what you see" exchange.

It seems that instead of _TRYING_ to simplify everyone's life and being
smarter than our users, we need to improve on our users' general
bug/panic reporting "culture".  Probably some in-depth section in the
Handbook would be useful.

> 
> [...]
> 
> > Verbose panic messages, and lots of code to print out values of variables
> > just before panicing, are another mistake.  Short panic messages were
> > good enough when debuggers were primitive or nonexistent.  Verbose panic
> > messages are even less needed now.
> 
> Ok, hang on just a minute.
> 
> Let's be clear here. Adding linenum/sourcefile info to all panic()
> calls is kind of pointless, I agree. (If you're too lazy to grep
> the source for the panic message, you're just no damn good.) However,

Agreed.

> there are times when printing the contents of some variables in
> the panic string is _INCREDIBLY_ _USEFUL_. Terse panic messages
> are annoying -- for that matter so are terse error messages in general.
> Error messages are meant to a) inform the user that they need to
> perform some action to correct a problem and b) alert a software
> developer to a potential bug and, hopefully, how to fix it. It is
> not enough to note that an error occured: you have to explain
> _WHY_ it occured.
> 
> Really bad panic message:
> 
> 	panic("mbuf is corrupt (but I'm not telling you exactly "
> 	    "what's corrupt about it because god forbid I should "
> 	    "make life easier for you)");
> 
> Good panic message:
> 
> 	panic("mbuf is corrupt: mbuf %p has m_len of %d and m_flags of 0x%x "
> 	    "but m_data is NULL", m, m->m_len, m->m_flags);
> 
> By showing the contents of the suspicious variable/structure, you give
> a developer some clue as to what was going on in the system when the 
> variable/structure was trashed. Maybe m_len is the size of a protocol
> header that points to a particular protocol module being suspect.
> Maybe m_flags has M_DONGS set which points to code that Alfred wrote.
> It may not make sense to dump out every single piece of available state,
> but not providing any state at all is just damn rude.
> 
> Could you determine this from the crash dump? Sure -- but do you have
> any idea how much of a pain in the ass that can be, especially when
> the crash occurs on a system that isn't yours? By default, crash
> dumps are as large as physical memory, and most systems now have at
> least 128MB of RAM. The average user does not know how to analyze
> a crash dump, which means even if they get one, they won't be able to
> do anything with it, except send it to me. That means I'll have people
> throwing 128MB files at me, which I can't even analyze unless I happen
> to have a system handy running _exactly_ the same kernel as they are.
> This is an amazing amount of trouble to go through to determine a
> couple of pieces of info that could just as easily have been printed
> on the console, where the user could copy them down and include them
> in an e-mail.
> 
> Note that I am not saying we should immediately go back throug the
> kernel source and verbosify every single panic() call we come across.
> I _am_ saying that developers should not use existing panic() messages
> as models for their own. I am also not going to let you lead them
> down that path, for that way lies Linux^Wmadness.
> 
> Anyway, to reiterate: I don't think the lineno/sourcefile additions
> to panic() are worth the bloat. Tracebacks on the other hand, would
> be worth their weight in gold bikesheds.

Exactly right.  Methinks that lineno/sourcefile is rather easily
obtainable, but the state of a disaster is a lot harder nut to crack,
unaided.

./danfe