Date:      Thu, 15 Oct 1998 23:02:23 -0700
From:      Mike Smith <mike@smith.net.au>
To:        cgd@netbsd.org (Chris G. Demetriou)
Cc:        dg@root.com, Jason Thorpe <thorpej@nas.nasa.gov>, Andrew Gallatin <gallatin@cs.duke.edu>, Chris Csanady <ccsanady@friley-185-114.res.iastate.edu>, freebsd-alpha@FreeBSD.ORG
Subject:   Re: kernel traps on boot.. 
Message-ID:  <199810160602.XAA00878@dingo.cdrom.com>
In-Reply-To: Your message of "14 Oct 1998 18:59:18 PDT." <8767dmoaa1.fsf@netbsd1.cygnus.com> 

It would probably be fair to say that this neatly encapsulates the
philosophical differences between FreeBSD-current and NetBSD-current, and
it's not surprising that there's some confusion between the two groups.

> David Greenman <dg@root.com> writes:
> > >Just doing printfs for broken kernel code only encourages laziness.
> > 
> >    Well, that might be fine for a developer, but it sure doesn't help end
> > users. We *are* trying to provide a production system after all. :-)
> 
> If code is sufficiently untested that it randomly runs into unaligned
> accesses, then by definition, it isn't a production-quality system and
> you don't need to worry about panic()ing.
> 
> However, if it _is_ well tested, "production quality," and still runs
> into that unaligned access, then that unaligned access is probably
> indicative of a somewhat-serious bug.  It means either that code is
> getting a bogus value because of specification/implementation "issue,"
> or that something, somewhere got corrupted, and therefore the system
> lost.
> 
> To have such bugs fixed properly, in many cases, a developer will need
> to know more about the context in which it occurred than just the fact
> that it occurred, the PC, and a few registers.  That means panic,
> followed by kernel core dump (or invocation of kernel debugger, or
> whatever), which then gets handed by the user of the production system
> to a developer, who debugs it.

FreeBSD policy for new code is to commit early and fix fast.  Because
most developers track -current very aggressively, committing code which
causes "diagnostic panics" is not a popular option.  If the code was on
a reasonably common path, it would prevent developers working on
unrelated issues from doing anything useful until the problem was
resolved (and possibly slow the adoption of the resolution).  This
places the development cycle somewhat in lockstep, where only one
misfeature can be resolved at a time.

Instead, FreeBSD developers tend to be a talkative bunch, and the 
existence of a "diagnostic printf" will cause those seeing it to pipe 
up and identify themselves to the owner of the code in question, 
allowing said developer to immediately interact with users having 
suitable test environments for reproducing the problem without locking 
everyone else out.

> In my opinion, it's not only bad, but _irresponsible_ to let the
> system bumble on in the face of such a bug.  High uptime is nice, but
> if it comes at the cost of ignoring serious system errors or
> corrupting data, it's worthless.

I don't think anyone would disagree with you here.  However, an unaligned
access doesn't fit into this case, as you can handle it cleanly (while
tagging the problem as an error) without crying wolf.
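
To make "handle it cleanly while tagging the problem" concrete, here is
a rough userland sketch in C.  It is not the actual Alpha trap handler;
the function name, the counter and the "pc" argument are all made up
for illustration.  The idea is just: log the event, then emulate the
load byte-wise so the caller can carry on.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counter of fixed-up accesses, e.g. for later reporting. */
static unsigned long unaligned_fixups;

/*
 * Hypothetical fix-up path: emulate a 64-bit load from an address that
 * need not be 8-byte aligned, and tag the event with a diagnostic
 * instead of panicking.  "pc" stands in for the faulting program
 * counter that a real trap handler would have available.
 */
static uint64_t
fixup_unaligned_load64(const void *addr, uintptr_t pc)
{
	uint64_t value;

	unaligned_fixups++;
	fprintf(stderr, "unaligned access: addr=%p pc=%#lx (fixed up)\n",
	    addr, (unsigned long)pc);

	/* memcpy copies byte-wise, so the source alignment does not matter. */
	memcpy(&value, addr, sizeof(value));
	return value;
}
```

The diagnostic still gets the problem in front of whoever owns the code,
but nobody else's work stops while it is being fixed.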

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com



