From owner-svn-src-all@freebsd.org Mon Apr 23 15:18:26 2018 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B6B37FA40DF; Mon, 23 Apr 2018 15:18:26 +0000 (UTC) (envelope-from jonlooney@gmail.com) Received: from mail-wr0-f180.google.com (mail-wr0-f180.google.com [209.85.128.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3221D75994; Mon, 23 Apr 2018 15:18:25 +0000 (UTC) (envelope-from jonlooney@gmail.com) Received: by mail-wr0-f180.google.com with SMTP id w3-v6so42356283wrg.2; Mon, 23 Apr 2018 08:18:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=jWmlCEJhd8ABkez4PY1knsndZr/lglqnSEhBO8nZbE0=; b=dI8AUs8X5mc6VrK4TqrwazAMqe6RS4em7p76sZtHpyUH3ZEMJTGB/g8lrquAmFmlcO H0fNMB3fLv4eCbeG1GaNxOZpko7Q8NwtsJr3h82J8f1utXkgIHlBp8Wsq4yyxz4tLNj4 r6KAGGVYxYwm5TP5d4Bukx9o7KSCgGAK0qzkuHZFb1siKdqxdKw2hRWyRck3/2C7LA9G RtTqJjCHjLRvmE8QJH/oQ1cmbR+XrRkuvIQbOlidDFySa7BoNXv+L7hkB/qIE2MYs6E5 ec1BkOMtXCQhYfjdgoZ6JOvhGqaw1fQerzhw1Bm95bVzfWXFkRkRyvrQ/8SjyqjFJjYW /qYA== X-Gm-Message-State: ALQs6tBi8dGYbrvLTNB6Mb7dpBZ1ftFxwkvc8GNKXVQBXiAU+8lbjR/3 qtE1XodmNL67sWSor9uL7gQWr62w X-Google-Smtp-Source: AIpwx49jqxToTJoh54/ElNv3aXPu6F3W213jIt547Sxfe592GS/4bqzt4+xmzCd3gwr7KrrfWUqDiQ== X-Received: by 10.80.137.123 with SMTP id f56mr2556888edf.206.1524496353962; Mon, 23 Apr 2018 08:12:33 -0700 (PDT) Received: from mail-wr0-f169.google.com (mail-wr0-f169.google.com. [209.85.128.169]) by smtp.gmail.com with ESMTPSA id v17sm7128578edl.47.2018.04.23.08.12.33 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 23 Apr 2018 08:12:33 -0700 (PDT) Received: by mail-wr0-f169.google.com with SMTP id c14-v6so2199059wrd.4; Mon, 23 Apr 2018 08:12:33 -0700 (PDT) X-Received: by 2002:adf:884c:: with SMTP id e12-v6mr18367098wre.30.1524496352914; Mon, 23 Apr 2018 08:12:32 -0700 (PDT) MIME-Version: 1.0 Received: by 10.223.199.203 with HTTP; Mon, 23 Apr 2018 08:12:32 -0700 (PDT) In-Reply-To: <20180422171106.GB84833@raichu> References: <201804211705.w3LH50Dk056339@repo.freebsd.org> <20180422171106.GB84833@raichu> From: "Jonathan T. Looney" Date: Mon, 23 Apr 2018 11:12:32 -0400 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: svn commit: r332860 - head/sys/kern To: Mark Johnston Cc: cem@freebsd.org, src-committers , svn-src-all@freebsd.org, svn-src-head@freebsd.org Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.25 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Apr 2018 15:18:27 -0000 Hi Mark, Let me start by saying that I appreciate your well-reasoned response. (I think) I understand your reasoning, I appreciate your well-explained argument, and I respect your opinion. I just wanted to make that clear up front. On Sun, Apr 22, 2018 at 1:11 PM, Mark Johnston wrote: > > > All too often, my ability to debug assertion violations is hindered because > > the system trips over yet another assertion while dumping the core. If we > > skip the assertion, nothing bad happens. (The post-panic debugging code > > already needs to deal with systems that are inconsistent, and it does a > > pretty good job at it.) > > I think we make a decent effort to fix such problems as they arise, but > you are declaring defeat on behalf of everyone. Did you make some effort > to fix or report these issues before resorting to the more drastic > measure taken here? We try to report or fix them as they arise. However, you don't know there is a problem until you actually run into it. And, you don't run into the problem until you can't get a core dump due to the assertion. (And, with elusive problems, it isn't always easy to duplicate them. So, fixing the assertion is sometimes "too late".) > > On the other hand, I really am not sure what you are worried might happen > > if we skip checking assertions after we've already panic'd. As far as I can > > tell, the likely worst case is that we hit a true panic of some kind. In > > that case, we're no worse off than before. > > > > I think the one obvious exception is when we're purposely trying to > > validate the post-panic debugging code. In that case, you can change the > > sysctl/tunable to enable troubleshooting. > > What about a user whose test system panics and fails to dump? With > assertions enabled, a developer has a better chance of spotting the > problem. Now we need at least one extra round trip to the user to > diagnose the problem, which may not be readily reproducible in the first > place. That's true. However, this is equally true in the other direction: Prior to this change, when a user tripped over an assertion and was unable to get a coredump due to a second post-panic assertion, it took (at least) another round-trip to get a coredump. First, without the capability to ignore assertions after a panic (introduced by this commit), you would need to fix the actual assertion to enable the user to get a coredump. At minimum, I think this change has value in that case. This change gives you a mechanism to get a coredump without requiring that you fix the assertion and get the user to recompile with the patch. But, moreover, if we change the default back to panic'ing on a second assertion, we will hamper our ability to get usable reports about elusive bugs. If we leave the default "as is", it won't take an extra round-trip to tell the user how to get a coredump. If we change the default (or, perhaps more correctly, "restore the prior default"), we will still need a second round-trip to get coredumps. That makes it tough to chase elusive bugs. > > I would honestly appreciate someone explaining the dangers in disabling a > > response to assertion violations after we've already panic'd and are simply > > trying to troubleshoot, because they are not obvious to me. But, I could > > simply be missing them. > > The assertions help identify code that is being executed during a dump > when it shouldn't be. In general we rely on users to opt in to running > INVARIANTS kernels because developers don't catch every single bug. With > this change it's harder to be confident in the kernel dump code. (Or in > any post-panic debugging code for that matter.) I can appreciate that. I am generally skeptical of the value of assertions in general-use code after a panic, since we already know the system is in an inconsistent/unexpected state. And, it is hard to predict all the various ways it could possibly be broken. However, I do recognize that there is code which specifically is written to run post-panic, and which has assertions which SHOULD be true, even after a panic. > I dislike the change and would prefer the default to be inverted. At the > very least I think we should print the assertion message rather than > returning silently from kassert_panic(). I still think this change has value (as described above). I can understand the argument for changing the default. In fact, after thinking about your email, I'm leaning towards doing that. But, I want to ponder it more today. If we leave the default alone, I agree we should print the assertion message (albeit with some rate limit). Jonathan