Date: Tue, 24 Apr 2018 13:40:02 -0400 From: Mark Johnston <markj@freebsd.org> To: "Jonathan T. Looney" <jtl@freebsd.org> Cc: John Baldwin <jhb@freebsd.org>, cem@freebsd.org, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r332860 - head/sys/kern Message-ID: <20180424174002.GB27358@raichu> In-Reply-To: <CADrOrmunCxzSpBZe35XwX7ZSFyPuSEpvEraKmQP2MdSP8ZMEGw@mail.gmail.com> References: <201804211705.w3LH50Dk056339@repo.freebsd.org> <CADrOrmvAxuoadBM==1EEbJc4PAPwtd-vPE4Tg-pM86CvwQnnwA@mail.gmail.com> <20180423180024.GC84833@raichu> <1739228.8pyHcvzasL@ralph.baldwin.cx> <CADrOrmunCxzSpBZe35XwX7ZSFyPuSEpvEraKmQP2MdSP8ZMEGw@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Apr 24, 2018 at 01:24:30PM -0400, Jonathan T. Looney wrote: > On Mon, Apr 23, 2018 at 6:04 PM, John Baldwin <jhb@freebsd.org> wrote: > > > > I think this is actually a key question. In my experience to date I have > not > > encountered a large number of post-panic assertion failures. Given that > > we already break all locks and disable assertions for locks I'd be curious > > which assertions are actually failing. My inclination given my > experiences > > to date would be to explicitly ignore those as we do for locking if it is > > constrained set rather than blacklisting all of them. However, I would be > > most interested in seeing some examples of assertions that are failing. > > The latest example (the one that prompted me to finally commit this) is in > lockmgr_sunlock_try(): 'panic: Assertion (*xp & ~LK_EXCLUSIVE_SPINNERS) == > LK_SHARERS_LOCK(1) failed at /usr/src/sys/kern/kern_lock.c:541' > > I don't see any obvious recent changes that would have caused this, so this > is probably a case where a change to another file suddenly made us trip > over this assert. > > And, that really illustrates my overall point: Mine too. :) Why is anything trying to acquire a lockmgr lock after a panic? What is the stack? I suspect that CAM is completing non-dump CCBs after a panic, which can cause deadlocks if the completion handler needs to perform a TLB shootdown after destroying a mapping, for example. In fact, I had forgotten that Isilon has some CAM patches which attempt to address this because of the problems that such deadlocks had caused. I will work on getting these reviewed and upstreamed. > most assertions in > general-use code have limited value after a panic. > > We expect developers to write high-quality assertions so we can catch bugs. > This requires that they understand how their code will be used. However, > once we've panic'd, many assumptions behind code change and the assertions > are no longer valid. (And, sometimes, it is difficult for a developer to > predict how these things will change in a panic situation.) We can either > play whack-a-mole to modify assertions as we trip over them in our > post-panic work, or we can switch to an opt-in model where we only check > assertions which the developer actually intends to run post-panic. > > Playing whack-a-mole seems like a Sisyphean task which will burn out > developers and/or frustrate people who run INVARIANTS kernels. Switching to > an opt-in model seems like the better long-term strategy. > > Having said all of that, I am cognizant of at least two things: > 1) Mark Johnston has done a lot of work in coredumps and thinks there are > post-panic assertions that have value. > 2) Until we have both agreement to switch our post-panic assertion paradigm > and also infrastructure to allow developers to opt in, it probably is not > wise to disable all assertions by default. > > So, I will follow Mark's suggestions: I will change the default. I will > also change the code so we print a limited number of failed assertions. Thanks. > However, I think that changing the post-panic assertion paradigm is an > important conversation to have. We want people to run our INVARIANTS > kernels. And, we want to get high-quality reports from those. I think we > could better serve those goals by changing the post-panic assertion > paradigm. > > Jonathan
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180424174002.GB27358>