From: "Jonathan T. Looney"
Date: Tue, 24 Apr 2018 13:24:30 -0400
Subject: Re: svn commit: r332860 - head/sys/kern
To: John Baldwin
Cc: Mark Johnston, cem@freebsd.org, src-committers, svn-src-all@freebsd.org, svn-src-head@freebsd.org

On Mon, Apr 23, 2018 at 6:04 PM, John Baldwin wrote:
>
> I think this is actually a key question.  In my experience to date I have not
> encountered a large number of post-panic assertion failures.  Given that
> we already break all locks and disable assertions for locks I'd be curious
> which assertions are actually failing.  My inclination given my experiences
> to date would be to explicitly ignore those as we do for locking if it is
> constrained set rather than blacklisting all of them.  However, I would be
> most interested in seeing some examples of assertions that are failing.

The latest example (the one that prompted me to finally commit this) is in
lockmgr_sunlock_try():

'panic: Assertion (*xp & ~LK_EXCLUSIVE_SPINNERS) == LK_SHARERS_LOCK(1) failed at /usr/src/sys/kern/kern_lock.c:541'

I don't see any obvious recent changes that would have caused this, so this
is probably a case where a change to another file suddenly made us trip
over this assert.

And, that really illustrates my overall point: most assertions in
general-use code have limited value after a panic.

We expect developers to write high-quality assertions so we can catch bugs.
This requires that they understand how their code will be used. However,
once we've panic'd, many assumptions behind code change and the assertions
are no longer valid. (And, sometimes, it is difficult for a developer to
predict how these things will change in a panic situation.) We can either
play whack-a-mole to modify assertions as we trip over them in our
post-panic work, or we can switch to an opt-in model where we only check
assertions which the developer actually intends to run post-panic.

Playing whack-a-mole seems like a Sisyphean task which will burn out
developers and/or frustrate people who run INVARIANTS kernels. Switching to
an opt-in model seems like the better long-term strategy.

Having said all of that, I am cognizant of at least two things:
1) Mark Johnston has done a lot of work in coredumps and thinks there are
post-panic assertions that have value.
2) Until we have both agreement to switch our post-panic assertion paradigm
and also infrastructure to allow developers to opt in, it probably is not
wise to disable all assertions by default.

So, I will follow Mark's suggestions: I will change the default. I will
also change the code so we print a limited number of failed assertions.

However, I think that changing the post-panic assertion paradigm is an
important conversation to have. We want people to run our INVARIANTS
kernels. And, we want to get high-quality reports from those. I think we
could better serve those goals by changing the post-panic assertion
paradigm.

Jonathan
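
For readers following the thread, the opt-in model and the "print a limited number of failed assertions" idea discussed above can be sketched roughly as follows. This is a user-space illustration only, not the committed kernel change: the names panicked, ASSERT_NORMAL, ASSERT_POSTPANIC, check_failed and MAX_POSTPANIC_REPORTS are all hypothetical, and a real pre-panic failure would panic rather than just bump a counter.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's "we have already panicked" state. */
static bool panicked = false;

/* Hypothetical cap on how many post-panic failures are printed. */
#define MAX_POSTPANIC_REPORTS 3

static int normal_failures;    /* failures seen before the panic */
static int postpanic_failures; /* opted-in failures seen after the panic */
static int postpanic_reported; /* how many of those were actually printed */

/* Record a failed check; after a panic, print only the first few. */
static void
check_failed(const char *expr, const char *file, int line)
{
	if (!panicked) {
		/* A real kernel would panic here; the sketch only counts. */
		normal_failures++;
		fprintf(stderr, "assertion %s failed at %s:%d\n",
		    expr, file, line);
		return;
	}
	postpanic_failures++;
	if (postpanic_reported < MAX_POSTPANIC_REPORTS) {
		postpanic_reported++;
		fprintf(stderr, "post-panic assertion %s failed at %s:%d\n",
		    expr, file, line);
	}
}

/* Ordinary assertion: not evaluated at all once panicked is set. */
#define ASSERT_NORMAL(e) do {					\
	if (!panicked && !(e))					\
		check_failed(#e, __FILE__, __LINE__);		\
} while (0)

/* Opt-in assertion: the developer intends it to run post-panic too. */
#define ASSERT_POSTPANIC(e) do {				\
	if (!(e))						\
		check_failed(#e, __FILE__, __LINE__);		\
} while (0)
```

Under this sketch, existing assertions would keep their current pre-panic behavior, and only the checks a developer has deliberately converted to the opt-in form would still be evaluated (and rate-limited in their output) after a panic.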