From owner-svn-src-head@freebsd.org Tue Apr 24 17:40:07 2018
Date: Tue, 24 Apr 2018 13:40:02 -0400
From: Mark Johnston
To: "Jonathan T. Looney"
Cc: John Baldwin, cem@freebsd.org, src-committers, svn-src-all@freebsd.org,
 svn-src-head@freebsd.org
Subject: Re: svn commit: r332860 - head/sys/kern
Message-ID: <20180424174002.GB27358@raichu>
References: <201804211705.w3LH50Dk056339@repo.freebsd.org>
 <20180423180024.GC84833@raichu> <1739228.8pyHcvzasL@ralph.baldwin.cx>

On Tue, Apr 24, 2018 at 01:24:30PM -0400, Jonathan T. Looney wrote:
> On Mon, Apr 23, 2018 at 6:04 PM, John Baldwin wrote:
> >
> > I think this is actually a key question.  In my experience to date I
> > have not encountered a large number of post-panic assertion failures.
> > Given that we already break all locks and disable assertions for locks
> > I'd be curious which assertions are actually failing.
> > My inclination given my experiences to date would be to explicitly
> > ignore those as we do for locking if it is constrained set rather than
> > blacklisting all of them.  However, I would be most interested in
> > seeing some examples of assertions that are failing.
>
> The latest example (the one that prompted me to finally commit this) is in
> lockmgr_sunlock_try(): 'panic: Assertion (*xp & ~LK_EXCLUSIVE_SPINNERS) ==
> LK_SHARERS_LOCK(1) failed at /usr/src/sys/kern/kern_lock.c:541'
>
> I don't see any obvious recent changes that would have caused this, so this
> is probably a case where a change to another file suddenly made us trip
> over this assert.
>
> And, that really illustrates my overall point:

Mine too. :)

Why is anything trying to acquire a lockmgr lock after a panic?  What is
the stack?  I suspect that CAM is completing non-dump CCBs after a panic,
which can cause deadlocks if the completion handler needs to perform a
TLB shootdown after destroying a mapping, for example.  In fact, I had
forgotten that Isilon has some CAM patches which attempt to address this
because of the problems that such deadlocks had caused.  I will work on
getting these reviewed and upstreamed.

> most assertions in
> general-use code have limited value after a panic.
>
> We expect developers to write high-quality assertions so we can catch bugs.
> This requires that they understand how their code will be used. However,
> once we've panic'd, many assumptions behind code change and the assertions
> are no longer valid. (And, sometimes, it is difficult for a developer to
> predict how these things will change in a panic situation.) We can either
> play whack-a-mole to modify assertions as we trip over them in our
> post-panic work, or we can switch to an opt-in model where we only check
> assertions which the developer actually intends to run post-panic.
>
> Playing whack-a-mole seems like a Sisyphean task which will burn out
> developers and/or frustrate people who run INVARIANTS kernels. Switching to
> an opt-in model seems like the better long-term strategy.
>
> Having said all of that, I am cognizant of at least two things:
> 1) Mark Johnston has done a lot of work in coredumps and thinks there are
> post-panic assertions that have value.
> 2) Until we have both agreement to switch our post-panic assertion paradigm
> and also infrastructure to allow developers to opt in, it probably is not
> wise to disable all assertions by default.
>
> So, I will follow Mark's suggestions: I will change the default. I will
> also change the code so we print a limited number of failed assertions.

Thanks.

> However, I think that changing the post-panic assertion paradigm is an
> important conversation to have. We want people to run our INVARIANTS
> kernels. And, we want to get high-quality reports from those. I think we
> could better serve those goals by changing the post-panic assertion
> paradigm.
>
> Jonathan
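
For readers following the thread, the two ideas discussed above (an opt-in
model for post-panic assertions, and printing only a limited number of
failed assertions once a panic is in progress) can be sketched in
user-space C roughly as follows.  This is only an illustration under
stated assumptions: every name in it (sketch_assert_failed, SKETCH_ASSERT,
SKETCH_ASSERT_POSTPANIC, SKETCH_REPORT_MAX) is hypothetical, the limit of
2 is arbitrary, and it is not the KASSERT implementation from r332860 or
any other FreeBSD code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch; none of these names exist in FreeBSD. */
enum sketch_action {
	SKETCH_OK,		/* assertion held */
	SKETCH_PANIC,		/* failure treated as a panic */
	SKETCH_REPORT,		/* post-panic failure, printed only */
	SKETCH_SUPPRESS		/* post-panic failure, silently ignored */
};

#define	SKETCH_REPORT_MAX	2	/* arbitrary limit for the sketch */

static bool sketch_panicked;	/* set once the first panic begins */
static int sketch_reported;	/* post-panic failures printed so far */

static enum sketch_action
sketch_assert_failed(const char *expr, const char *file, int line,
    bool postpanic_ok)
{
	/*
	 * Before any panic, or for checks the developer opted in to
	 * running post-panic, a failure is treated as a panic.
	 */
	if (!sketch_panicked || postpanic_ok) {
		sketch_panicked = true;
		fprintf(stderr, "panic: Assertion %s failed at %s:%d\n",
		    expr, file, line);
		return (SKETCH_PANIC);
	}
	/* Otherwise report a bounded number of failures, then go quiet. */
	if (sketch_reported < SKETCH_REPORT_MAX) {
		sketch_reported++;
		fprintf(stderr, "post-panic assertion %s failed at %s:%d "
		    "(ignored)\n", expr, file, line);
		return (SKETCH_REPORT);
	}
	return (SKETCH_SUPPRESS);
}

/* Default form: not checked as fatal once a panic is in progress. */
#define	SKETCH_ASSERT(e)						\
	((e) ? SKETCH_OK :						\
	    sketch_assert_failed(#e, __FILE__, __LINE__, false))

/* Opt-in form: the developer intends this check to run post-panic. */
#define	SKETCH_ASSERT_POSTPANIC(e)					\
	((e) ? SKETCH_OK :						\
	    sketch_assert_failed(#e, __FILE__, __LINE__, true))
```

In this sketch the first failing SKETCH_ASSERT starts the "panic"; later
failures of the default form are printed up to the limit and then
dropped, while SKETCH_ASSERT_POSTPANIC failures are always treated as
fatal.  A real kernel would stop rather than return an action code; the
return value is there only so the behavior can be observed.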