From owner-svn-src-all@freebsd.org  Mon Apr 23 22:16:59 2018
From: John Baldwin <jhb@freebsd.org>
To: Mark Johnston <markj@freebsd.org>
Cc: "Jonathan T. Looney" <jtl@freebsd.org>, cem@freebsd.org,
 src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org,
 svn-src-head@freebsd.org
Subject: Re: svn commit: r332860 - head/sys/kern
Date: Mon, 23 Apr 2018 15:04:31 -0700
Message-ID: <1739228.8pyHcvzasL@ralph.baldwin.cx>
User-Agent: KMail/4.14.10 (FreeBSD/11.1-STABLE; KDE/4.14.30; amd64; ; )
In-Reply-To: <20180423180024.GC84833@raichu>
References: <201804211705.w3LH50Dk056339@repo.freebsd.org>
 <CADrOrmvAxuoadBM==1EEbJc4PAPwtd-vPE4Tg-pM86CvwQnnwA@mail.gmail.com>
 <20180423180024.GC84833@raichu>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"

On Monday, April 23, 2018 02:00:24 PM Mark Johnston wrote:
> On Mon, Apr 23, 2018 at 11:12:32AM -0400, Jonathan T. Looney wrote:
> > Hi Mark,
> > 
> > Let me start by saying that I appreciate your well-reasoned response. (I
> > think) I understand your reasoning, I appreciate your well-explained
> > argument, and I respect your opinion. I just wanted to make that clear up
> > front.
> > 
> > On Sun, Apr 22, 2018 at 1:11 PM, Mark Johnston <markj@freebsd.org> wrote:
> > >
> > > > All too often, my ability to debug assertion violations is hindered
> > > > because the system trips over yet another assertion while dumping the
> > > > core. If we skip the assertion, nothing bad happens. (The post-panic
> > > > debugging code already needs to deal with systems that are
> > > > inconsistent, and it does a pretty good job at it.)
> > >
> > > I think we make a decent effort to fix such problems as they arise, but
> > > you are declaring defeat on behalf of everyone. Did you make some effort
> > > to fix or report these issues before resorting to the more drastic
> > > measure taken here?
> > 
> > We try to report or fix them as they arise. However, you don't know there
> > is a problem until you actually run into it. And, you don't run into the
> > problem until you can't get a core dump due to the assertion.
> > 
> > (And, with elusive problems, it isn't always easy to duplicate them. So,
> > fixing the assertion is sometimes "too late".)
> 
> Sure, this is true. But unless it's a problem in practice, it's obviously
> preferable to keep assertions enabled. Kernel dumping itself is a
> fundamentally unreliable mechanism, but it works well enough to be
> useful. I basically never see problems with post-panic assertion
> failures, and I test the kernel dump code a fair bit. Isilon exercises
> that code quite a lot as well without any problems that I'm aware of,
> and I can't think of any reports of such assertion failures that weren't
> quickly fixed. So I'm wondering what problems exist in your specific
> environment that we might instead address surgically.
> 
> (I could very well be wrong about how widespread post-panic assertion
> failures are. We've had problems of this sort before, e.g., with the
> updated DRM graphics drivers, where the code to grab the console after a
> panic didn't work properly. There, the band-aid was to just disable that
> specific mechanism.)

I think this is actually a key question.  In my experience to date, I have not
encountered a large number of post-panic assertion failures.  Given that
we already break all locks and disable assertions for locks, I'd be curious
which assertions are actually failing.  My inclination, given my experiences
to date, would be to explicitly ignore those, as we do for locking, if they
form a constrained set rather than blacklisting all of them.  However, I would
be most interested in seeing some examples of assertions that are failing.

-- 
John Baldwin