Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 15 Dec 2012 17:04:17 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Alfred Perlstein <bright@mu.org>
Cc:        Adrian Chadd <adrian@freebsd.org>, src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, svn-src-all@freebsd.org, Alfred Perlstein <alfred@freebsd.org>, Andriy Gapon <avg@freebsd.org>, Gleb Smirnoff <glebius@freebsd.org>, Robert Watson <rwatson@freebsd.org>, Navdeep Parhar <np@freebsd.org>, svn-src-head@freebsd.org
Subject:   Re: svn commit: r244112 - head/sys/kern
Message-ID:  <20121215161414.V1029@besplex.bde.org>
In-Reply-To: <50CBC285.7060307@mu.org>
References:  <201212110708.qBB78EWx025288@svn.freebsd.org> <50C9271C.70803@mu.org> <20121213090215.GP97487@FreeBSD.org> <201212141149.07671.jhb@freebsd.org> <alpine.BSF.2.00.1212150010160.54345@fledge.watson.org> <50CBC285.7060307@mu.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 14 Dec 2012, Alfred Perlstein wrote:

> On 12/14/12 4:12 PM, Robert Watson wrote:
>> On Fri, 14 Dec 2012, John Baldwin wrote:
>> 
>>> On Thursday, December 13, 2012 4:02:15 am Gleb Smirnoff wrote:
>>>> On Wed, Dec 12, 2012 at 04:53:48PM -0800, Alfred Perlstein wrote: A> The 
>>>> problem again is that not all the KASSERTS are inviolable, if you A> want 
>>>> to do a project to split them, then please do, it would really be A> 
>>>> helpful, as for now, they are a mis-mash of death/warnings and there are 
>>>> A> at least three vendors who approve of this as well as 3 long term A> 
>>>> committers that approved my change (not including Adrian).
>>>> 
>>>> Can you show examples of not inviolable KASSERTs?
>>> 
>>> There are none.  They are all assertions for a reason.  However, in my

Not even one whose existence is a bug? :-)

>>> experience at several large consumers of FreeBSD, no one wants to run with 
>>> INVARIANTS in production.  Not because we don't want panics (believe me, 
>>> Yahoo! gets plenty of panics even with INVARIANTS disabled), but because 
>>> the performance overhead, and redefining INVARIANTS into printf doesn't 
>>> address that at all.

I mostly don't use INVARIANTS, but once used Debugger() a lot.  Now I
only want to debug my own code and sometimes turn on INVARIANTS to try
to do this, but it usually exposes more bugs not in my own code so I
turn it off again :-).

>> In the past, FYI, the two major INVARIANTS hits were un-inlining of 
>> locking, and UMA using additional global locking to perform heavier-weight 
>> sanity checking.  For me, at least, the former wasn't such a problem, but 
>> the latter was extremely measurable.  I have occasionally wondered if we 
>> should have another option name for heavier-duty sanity checking, as is the 
>> case with WITNESS, SOCKBUF_DEBUG, etc, so that INVARIANTS overhead can be 
>> made tolerable for more real-world installations.  As you observe, our 
>> invariants tests are generally things where you catch critical problems in 
>> much more debuggable states, rather than sources of additional fragility, 
>> and it would be great if we could leave more in so that bug reports were 
>> able to contain more information.

I think that KASSERT() is used too much:
- the heavyweight uses that you mention
- many lightweight unconditional panics in old code were turned into
   conditional panics using KASSERT().  So production code that turns off
   INVARIANTS no longer gets the benefit of the sanity checking for these.
- newer code tends to use KASSERT() even for things that could reasonably
   be lightweight unconditional panics.  So production code that turns off
   INVARIANTS never got the benefit of the sanity checking for these.
- not many unconditional panics (with non-lightweight checking for them)
   can't be left in production code, so there were a limited number of them,
   with a few pushed under DIAGNOSTIC.  In particular, ones for debugging
   didn't grow unboundedly.  Now, KASSERT()s grow unboundedly and the
   cost of enabling INVARIANTS grows unboundedly.
- the source code becomes unreadable, with half of it tending to consist
   ot misformatted KASSERT()s SHOUTING IN UPPER CASE.  The latter bug is
   missing in assert(3).  assert(3) has a better designed API in other
   respects too.  For example it supplies the bloat for __func__, __FILE__
   and __LINE__ whether you want this or not.  This bloat was intentionally
   left out for KASSERT().  But many uses of KASSERT() supply this bloat
   in the caller, where it also bloat the source code.

> The KTR system offers an interesting reference for a model that allows us to 
> make a compile time decision about which areas to trace.
>
> There used to be a DIAGNOSTIC option for the more heavy checks, this is 
> either removed or just not used these days.  Again, using a mask like KTR 
> might be a win if we bring back the equivalent of heavy weight DIAGNOSTIC 
> option.

DIGANOSTIC still exists.  It is rarely used, and tends to be only used for
heavyweight checks that are too buggy or too heavyweight or just too old
to put in KASSERT().

> Often I've been guilty of putting KASSERT(ptr != NULL) checks into the code 
> too.  Those are really just to make it less painful when hitting that bug,

You mean to make it _more_ painful when hitting that bug.

> so 
> instead of having to do a lot of homework to figure out the fault address and 
> line number, I can just get a pretty printed message under INVARIANTS.  Maybe 
> those "pretty null checks" need to go under another option, perhaps 
> DIAGNOSTIC, or another .. ??PRETTY_PANIC.

Null pointer panics give restartable page faults, with the fault address
printed by trap_fatal() and the page fault often easy to restart using a
debugger.  trap_fatal() calls the debugger, so the page fault isn't
completely transparent.  This corresponds to gdb in userland stopping
on the signal for the page fault.  You might have to use "handle SIGFOO
Stop/Print/Pass" to get back to the faulting instruction.  In the kernel,
continuing from trap_fatal() gives this behaviour.  The page fault will
repeat endlessly unless you fix it up.  You can try various fixes until
you find one that doesn't fault, or let it fault a few times to look
at the context in different ways.

If panic() is used for null pointers, then you get an unrestartable
panic(), with the relevant context several frames back where it is
hard to see and much more difficult to restart from (practically
impossible now that panic() stops CPUs, etc., though it used to be
possible to unwind the frame in some cases).

The null pointer check doesn't even detect the common case where the
pointer is &p->bar where p is null.  Now the bad pointer is a small
offset from a null pointer (normally at a small address).  It would
take an earlier check of p to detect this.

> Anyhow, I've always been a huge fan of FreeBSD due the additional debugging 
> checks it offered.  I was the one that suggested splassert() back in the day 
> when we would continually find issues with spl calls.  Additionally I found

SPLASSERT() was almost never used.  The only uses that I can find of it
are all in netipsec.

> doing filesystem work a ton easier with DEBUG_VFS_LOCKS under FreeBSD than 
> under Darwin which at the time did not have such a feature.

This was more useful I think :-).  The main reason that SPLASSERT()
was useless was because spl was a stable API that went away at about
the same time that SPLASSERT() was added (SPLASSERT() was implemented
about 6 months before SMPng killed spl).

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121215161414.V1029>