From owner-svn-src-all@freebsd.org Mon Apr 23 18:00:29 2018 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 95FE8FA811B; Mon, 23 Apr 2018 18:00:29 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-it0-x232.google.com (mail-it0-x232.google.com [IPv6:2607:f8b0:4001:c0b::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 27E4379ED7; Mon, 23 Apr 2018 18:00:29 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by mail-it0-x232.google.com with SMTP id x144-v6so10077514itc.0; Mon, 23 Apr 2018 11:00:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=zZT8MgWChhmxDSxmLi/sM6NaY155dT3q38e0D9fLA/U=; b=OUjHIa9SwVIgmWvGgpKwddpr7lfsSe93IyZ2CIoMjlEeXscVwgPF0nLpCfAWRswDKn +f4yEZuRUXDlPcjLpzyRXwKyLCDyul0p5hedZ8UovKF9pzql7nYWvFLoECZ3LV2tQeq4 /ju1PyP12vHSXbb6qQh3h1SvwQM4QfSAyIeFmGPGtdXfCjntuseAepfoBM75pm7ywthR WNphPBB5n2t8HdqWwuZ+F8nsN/7qHF3BrREV+yOOilBV2Jvyx5g4OSadG7AzGeqLZv2O q5Y6hDcOuS7219j89Nsb6mJT0uIMJ3bQ25ShTZqh8ShN/sCpWkxo0+P844xFQGvKx4x7 o0iA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to:user-agent; bh=zZT8MgWChhmxDSxmLi/sM6NaY155dT3q38e0D9fLA/U=; b=HggF7touDbGCied51heTMhA2HyFbslIs/rFp75yDZ4ndCusyB/avfO2EvLQlOUwi76 ccMwDrjn+qG13APsCz4/U24lBbc7jKCSF9Ng+40qekjEJObNNEHnNtUlQT0iOjk0MMqV +tdsd1kP5nWnjX9/EXsf1f2EBVDN9/ba4zQ8rPc9mYs3nQhpq4ywkdO29xpXvMad+fCh L2dqLia39cBiXaqNcGcvtgQpxbDVOMX5kLNXc+EaOxeesRmPJrL9OXQPD/aq6zQLerie c8/j+8D3KQJ/fLFkyxnAw2bH3NoFZGV9FN7E3YSNRupeAy0e9MmfLWaAPZQbHY0JD+y0 VsQg== X-Gm-Message-State: ALQs6tAusQFYBZLK23qPg7qc4iw/G3m7ix1yT2nj2OwtXT/hz8a+YywT 6mhppZw559X9BFD97w1Dxetocw== X-Google-Smtp-Source: AB8JxZqig/hapimEzuS3YBZbzUotArJt2IkGgbr0WvRjO7wU49W70n+9WbOzifXWz3F+FQdQA/t53Q== X-Received: by 2002:a24:740a:: with SMTP id o10-v6mr15613878itc.82.1524506428051; Mon, 23 Apr 2018 11:00:28 -0700 (PDT) Received: from raichu (toroon0560w-lp130-04-184-145-252-74.dsl.bell.ca. [184.145.252.74]) by smtp.gmail.com with ESMTPSA id z88-v6sm2650595ioi.25.2018.04.23.11.00.26 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 23 Apr 2018 11:00:26 -0700 (PDT) Sender: Mark Johnston Date: Mon, 23 Apr 2018 14:00:24 -0400 From: Mark Johnston To: "Jonathan T. Looney" Cc: cem@freebsd.org, src-committers , svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r332860 - head/sys/kern Message-ID: <20180423180024.GC84833@raichu> References: <201804211705.w3LH50Dk056339@repo.freebsd.org> <20180422171106.GB84833@raichu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Apr 2018 18:00:29 -0000 On Mon, Apr 23, 2018 at 11:12:32AM -0400, Jonathan T. Looney wrote: > Hi Mark, > > Let me start by saying that I appreciate your well-reasoned response. (I > think) I understand your reasoning, I appreciate your well-explained > argument, and I respect your opinion. I just wanted to make that clear up > front. > > On Sun, Apr 22, 2018 at 1:11 PM, Mark Johnston wrote: > > > > > All too often, my ability to debug assertion violations is hindered > because > > > the system trips over yet another assertion while dumping the core. If > we > > > skip the assertion, nothing bad happens. (The post-panic debugging code > > > already needs to deal with systems that are inconsistent, and it does a > > > pretty good job at it.) > > > > I think we make a decent effort to fix such problems as they arise, but > > you are declaring defeat on behalf of everyone. Did you make some effort > > to fix or report these issues before resorting to the more drastic > > measure taken here? > > We try to report or fix them as they arise. However, you don't know there > is a problem until you actually run into it. And, you don't run into the > problem until you can't get a core dump due to the assertion. > > (And, with elusive problems, it isn't always easy to duplicate them. So, > fixing the assertion is sometimes "too late".) Sure, this is true. But unless it's a problem in practice it's obviously preferable to keep assertions enabled. Kernel dumping itself is a fundamentally unreliable mechanism, but it works well enough to be useful. I basically never see problems with post-panic assertion failures, and I test the kernel dump code a fair bit. Isilon exercises that code quite a lot as well without any problems that I'm aware of, and I can't think of any reports of such assertion failures that weren't quickly fixed. So I'm wondering what problems exist in your specific environment that we might instead address surgically. (I could very well be wrong about how widespread post-panic assertion failures are. We've had problems of this sort before, e.g., with the updated DRM graphics drivers, where the code to grab the console after a panic didn't work properly. There, the bandaid was to just disable that specific mechanism.) > > > On the other hand, I really am not sure what you are worried might > happen > > > if we skip checking assertions after we've already panic'd. As far as I > can > > > tell, the likely worst case is that we hit a true panic of some kind. In > > > that case, we're no worse off than before. > > > > > > I think the one obvious exception is when we're purposely trying to > > > validate the post-panic debugging code. In that case, you can change the > > > sysctl/tunable to enable troubleshooting. > > > > What about a user whose test system panics and fails to dump? With > > assertions enabled, a developer has a better chance of spotting the > > problem. Now we need at least one extra round trip to the user to > > diagnose the problem, which may not be readily reproducible in the first > > place. > > That's true. However, this is equally true in the other direction: Prior to > this change, when a user tripped over an assertion and was unable to get a > coredump due to a second post-panic assertion, it took (at least) another > round-trip to get a coredump. > > First, without the capability to ignore assertions after a panic > (introduced by this commit), you would need to fix the actual assertion to > enable the user to get a coredump. At minimum, I think this change has > value in that case. This change gives you a mechanism to get a coredump > without requiring that you fix the assertion and get the user to recompile > with the patch. > > But, moreover, if we change the default back to panic'ing on a second > assertion, we will hamper our ability to get usable reports about elusive > bugs. If we leave the default "as is", it won't take an extra round-trip to > tell the user how to get a coredump. If we change the default (or, perhaps > more correctly, "restore the prior default"), we will still need a second > round-trip to get coredumps. That makes it tough to chase elusive bugs. I agree with what you're saying. I'm thinking more of the long-term effects of this change and am concerned that it increases the potential for actual bugs to appear in the kernel dump code paths. Those types of bugs are often quite tricky to track down from a single instance, and can cause dumps to fail. If that starts to happen, we're basically back to where we started.