From owner-svn-src-all@FreeBSD.ORG Thu Dec 13 20:17:34 2012 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DC081EC9; Thu, 13 Dec 2012 20:17:34 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-la0-f54.google.com (mail-la0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id BAA258FC12; Thu, 13 Dec 2012 20:17:33 +0000 (UTC) Received: by mail-la0-f54.google.com with SMTP id j13so2146630lah.13 for ; Thu, 13 Dec 2012 12:17:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=iJwjKdiqmyEuRW1ZnoLgBapsq1AFCQV1RIlxsHUzrV0=; b=sok/XZC9xSoqlm7ETsxNBs6t/EBFubp1/ucFAMqrUPB3LJgU+TiolcoAECNgoRihgR S4LZ00NR3mJmp+9l0TPzGYo9u63KxMXi0CQTA+mhAyfACZvSfozTW1OcGcOa0TCb1AhA SA9UShX1UggwrN3TUE0rpktW6IzliZU4MAvrzl2Qj8CaA0XGVcTqBe9a5NabkvP3z2A6 /F0gAgOmoutRKuuZKaOom1K7xDuZjiNksf1rwu+b266F2JDbghXmmU5YdhvJMUH3IeU5 5ZnKPZTvoS4bXU0skjFUJaD30qAtWEA2YJorq+9S29WWI8Aj8UDt58wsMUg3gUasZN/k CMwQ== MIME-Version: 1.0 Received: by 10.112.46.66 with SMTP id t2mr1373631lbm.115.1355429851465; Thu, 13 Dec 2012 12:17:31 -0800 (PST) Sender: asmrookie@gmail.com Received: by 10.112.84.193 with HTTP; Thu, 13 Dec 2012 12:17:31 -0800 (PST) In-Reply-To: <50C9B525.2060503@FreeBSD.org> References: <201211251422.qAPEM8BV074656@svn.freebsd.org> <50C9B525.2060503@FreeBSD.org> Date: Thu, 13 Dec 2012 20:17:31 +0000 X-Google-Sender-Auth: L3QhmG8CWU7eVOLN0L-HJxIGJnI Message-ID: Subject: Re: svn commit: r243515 - head/sys/kern From: Attilio Rao To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: attilio@FreeBSD.org List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Dec 2012 20:17:35 -0000 On Thu, Dec 13, 2012 at 10:59 AM, Andriy Gapon wrote: > on 09/12/2012 19:27 Attilio Rao said the following: >> On Sun, Nov 25, 2012 at 2:22 PM, Andriy Gapon wrote: >>> Author: avg >>> Date: Sun Nov 25 14:22:08 2012 >>> New Revision: 243515 >>> URL: http://svnweb.freebsd.org/changeset/base/243515 >>> >>> Log: >>> remove stop_scheduler_on_panic knob >>> >>> There has not been any complaints about the default behavior, so there >>> is no need to keep a knob that enables the worse alternative. >>> >>> Now that the hard-stopping of other CPUs is the only behavior, the panic_cpu >>> spinlock-like logic can be dropped, because only a single CPU is >>> supposed to win stop_cpus_hard(other_cpus) race and proceed past that >>> call. >> >> While this is true for the sane case, for the case report by Ryan this >> still breaks. > > Yes. I haven't got around to start fixing the Ryan's problem yet. > But this commit should reduce number of places where changes have to be made. > In fact, I think that only stop_cpus_X would have to be fixed now. > >> Infact, immagine CPU0 (winner) and CPU1 (looser) both panic'ing. CPU0 >> wins and then sets stopping_cpu. When the deadlock happens in the >> spinning loop, because of generic_stop_cpus() logic CPU0 won't >> deadlock and will correctly continue, but the problem is that it sets >> back stopping_cpu to NOCPU, letting CPU1 continuing too and then >> deadlocking. >> >> At the minimum, what I think that should happen is to have the check >> in panic() as prior this change but with the add I outlined (thus we >> need to generalize cpustop_handler()). However, it seems to me that >> generic_stop_cpus() may still be broken by this and we eventually need >> to fix it. >> >> I would then revert this part of the patch and fix it appropriately. >> Later we can better discuss the generic_stop_cpus() similar race. > > I actually see this change and the Ryan's problem as orthogonal issues. > My opinion is let's just fix generic_stop_cpus(). Right, but as I said, for the time being we can at least have a correct panic() semantic and take the right time to fix the generic_stop_cpus() and then absorb also the panic() fix into it. Right now the mechanism is still broken in panic and it can be fixed with a very easy fix, so we should just do it. This will also help vendors like Sandvine which may have hit just this bug too. Attilio -- Peace can only be achieved by understanding - A. Einstein