From owner-svn-src-head@FreeBSD.ORG Thu Dec 13 10:59:52 2012 Return-Path: Delivered-To: svn-src-head@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 89FABAF9; Thu, 13 Dec 2012 10:59:52 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 3D68D8FC0A; Thu, 13 Dec 2012 10:59:50 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA14153; Thu, 13 Dec 2012 12:59:49 +0200 (EET) (envelope-from avg@FreeBSD.org) Message-ID: <50C9B525.2060503@FreeBSD.org> Date: Thu, 13 Dec 2012 12:59:49 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: attilio@FreeBSD.org Subject: Re: svn commit: r243515 - head/sys/kern References: <201211251422.qAPEM8BV074656@svn.freebsd.org> In-Reply-To: X-Enigmail-Version: 1.4.6 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Dec 2012 10:59:52 -0000 on 09/12/2012 19:27 Attilio Rao said the following: > On Sun, Nov 25, 2012 at 2:22 PM, Andriy Gapon wrote: >> Author: avg >> Date: Sun Nov 25 14:22:08 2012 >> New Revision: 243515 >> URL: http://svnweb.freebsd.org/changeset/base/243515 >> >> Log: >> remove stop_scheduler_on_panic knob >> >> There has not been any complaints about the default behavior, so there >> is no need to keep a knob that enables the worse alternative. >> >> Now that the hard-stopping of other CPUs is the only behavior, the panic_cpu >> spinlock-like logic can be dropped, because only a single CPU is >> supposed to win stop_cpus_hard(other_cpus) race and proceed past that >> call. > > While this is true for the sane case, for the case report by Ryan this > still breaks. Yes. I haven't got around to start fixing the Ryan's problem yet. But this commit should reduce number of places where changes have to be made. In fact, I think that only stop_cpus_X would have to be fixed now. > Infact, immagine CPU0 (winner) and CPU1 (looser) both panic'ing. CPU0 > wins and then sets stopping_cpu. When the deadlock happens in the > spinning loop, because of generic_stop_cpus() logic CPU0 won't > deadlock and will correctly continue, but the problem is that it sets > back stopping_cpu to NOCPU, letting CPU1 continuing too and then > deadlocking. > > At the minimum, what I think that should happen is to have the check > in panic() as prior this change but with the add I outlined (thus we > need to generalize cpustop_handler()). However, it seems to me that > generic_stop_cpus() may still be broken by this and we eventually need > to fix it. > > I would then revert this part of the patch and fix it appropriately. > Later we can better discuss the generic_stop_cpus() similar race. I actually see this change and the Ryan's problem as orthogonal issues. My opinion is let's just fix generic_stop_cpus(). -- Andriy Gapon