From: John Baldwin
To: Mark Johnston
Cc: "Jonathan T. Looney", cem@freebsd.org, src-committers,
    svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r332860 - head/sys/kern
Date: Mon, 23 Apr 2018 15:04:31 -0700
Message-ID: <1739228.8pyHcvzasL@ralph.baldwin.cx>
In-Reply-To: <20180423180024.GC84833@raichu>
References: <201804211705.w3LH50Dk056339@repo.freebsd.org>
    <20180423180024.GC84833@raichu>

On Monday, April 23, 2018 02:00:24 PM Mark Johnston wrote:
> On Mon, Apr 23, 2018 at 11:12:32AM -0400, Jonathan T. Looney wrote:
> > Hi Mark,
> >
> > Let me start by saying that I appreciate your well-reasoned response. (I
> > think) I understand your reasoning, I appreciate your well-explained
> > argument, and I respect your opinion. I just wanted to make that clear up
> > front.
> >
> > On Sun, Apr 22, 2018 at 1:11 PM, Mark Johnston wrote:
> > >
> > > > All too often, my ability to debug assertion violations is hindered
> > > > because the system trips over yet another assertion while dumping the
> > > > core. If we skip the assertion, nothing bad happens. (The post-panic
> > > > debugging code already needs to deal with systems that are
> > > > inconsistent, and it does a pretty good job at it.)
> > >
> > > I think we make a decent effort to fix such problems as they arise, but
> > > you are declaring defeat on behalf of everyone. Did you make some effort
> > > to fix or report these issues before resorting to the more drastic
> > > measure taken here?
> >
> > We try to report or fix them as they arise. However, you don't know there
> > is a problem until you actually run into it. And, you don't run into the
> > problem until you can't get a core dump due to the assertion.
> >
> > (And, with elusive problems, it isn't always easy to duplicate them. So,
> > fixing the assertion is sometimes "too late".)
>
> Sure, this is true. But unless it's a problem in practice it's obviously
> preferable to keep assertions enabled. Kernel dumping itself is a
> fundamentally unreliable mechanism, but it works well enough to be
> useful. I basically never see problems with post-panic assertion
> failures, and I test the kernel dump code a fair bit. Isilon exercises
> that code quite a lot as well without any problems that I'm aware of,
> and I can't think of any reports of such assertion failures that weren't
> quickly fixed. So I'm wondering what problems exist in your specific
> environment that we might instead address surgically.
>
> (I could very well be wrong about how widespread post-panic assertion
> failures are. We've had problems of this sort before, e.g., with the
> updated DRM graphics drivers, where the code to grab the console after a
> panic didn't work properly. There, the bandaid was to just disable that
> specific mechanism.)

I think this is actually a key question. In my experience to date I have
not encountered a large number of post-panic assertion failures. Given that
we already break all locks and disable assertions for locks, I'd be curious
which assertions are actually failing. My inclination, given my experiences
to date, would be to explicitly ignore those, as we do for locking, if it is
a constrained set, rather than blacklisting all of them. However, I would be
most interested in seeing some examples of assertions that are failing.

--
John Baldwin