Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 24 Apr 2018 13:24:30 -0400
From:      "Jonathan T. Looney" <jtl@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Mark Johnston <markj@freebsd.org>, cem@freebsd.org,  src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r332860 - head/sys/kern
Message-ID:  <CADrOrmunCxzSpBZe35XwX7ZSFyPuSEpvEraKmQP2MdSP8ZMEGw@mail.gmail.com>
In-Reply-To: <1739228.8pyHcvzasL@ralph.baldwin.cx>
References:  <201804211705.w3LH50Dk056339@repo.freebsd.org> <CADrOrmvAxuoadBM==1EEbJc4PAPwtd-vPE4Tg-pM86CvwQnnwA@mail.gmail.com> <20180423180024.GC84833@raichu> <1739228.8pyHcvzasL@ralph.baldwin.cx>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, Apr 23, 2018 at 6:04 PM, John Baldwin <jhb@freebsd.org> wrote:
>
> I think this is actually a key question.  In my experience to date I have
not
> encountered a large number of post-panic assertion failures.  Given that
> we already break all locks and disable assertions for locks I'd be curious
> which assertions are actually failing.  My inclination given my
experiences
> to date would be to explicitly ignore those as we do for locking if it is
> constrained set rather than blacklisting all of them.  However, I would be
> most interested in seeing some examples of assertions that are failing.

The latest example (the one that prompted me to finally commit this) is in
lockmgr_sunlock_try(): 'panic: Assertion (*xp & ~LK_EXCLUSIVE_SPINNERS) ==
LK_SHARERS_LOCK(1) failed at /usr/src/sys/kern/kern_lock.c:541'

I don't see any obvious recent changes that would have caused this, so this
is probably a case where a change to another file suddenly made us trip
over this assert.

And, that really illustrates my overall point: most assertions in
general-use code have limited value after a panic.

We expect developers to write high-quality assertions so we can catch bugs.
This requires that they understand how their code will be used. However,
once we've panic'd, many assumptions behind code change and the assertions
are no longer valid. (And, sometimes, it is difficult for a developer to
predict how these things will change in a panic situation.) We can either
play whack-a-mole to modify assertions as we trip over them in our
post-panic work, or we can switch to an opt-in model where we only check
assertions which the developer actually intends to run post-panic.

Playing whack-a-mole seems like a Sisyphean task which will burn out
developers and/or frustrate people who run INVARIANTS kernels. Switching to
an opt-in model seems like the better long-term strategy.

Having said all of that, I am cognizant of at least two things:
1) Mark Johnston has done a lot of work in coredumps and thinks there are
post-panic assertions that have value.
2) Until we have both agreement to switch our post-panic assertion paradigm
and also infrastructure to allow developers to opt in, it probably is not
wise to disable all assertions by default.

So, I will follow Mark's suggestions: I will change the default. I will
also change the code so we print a limited number of failed assertions.

However, I think that changing the post-panic assertion paradigm is an
important conversation to have. We want people to run our INVARIANTS
kernels. And, we want to get high-quality reports from those. I think we
could better serve those goals by changing the post-panic assertion
paradigm.

Jonathan



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CADrOrmunCxzSpBZe35XwX7ZSFyPuSEpvEraKmQP2MdSP8ZMEGw>