Date: Thu, 13 Dec 2012 20:17:31 +0000
From: Attilio Rao <attilio@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r243515 - head/sys/kern
Message-ID: <CAJ-FndAsjZSK9XGFhHvdcD5135Xo4acybPDtZ0J9jXeAMpH5%2BQ@mail.gmail.com>
In-Reply-To: <50C9B525.2060503@FreeBSD.org>
References: <201211251422.qAPEM8BV074656@svn.freebsd.org> <CAJ-FndCGe=DtqKxRe0YXV0GJrf4CV6MX9B1MR-Uyy6A3hpongA@mail.gmail.com> <50C9B525.2060503@FreeBSD.org>

On Thu, Dec 13, 2012 at 10:59 AM, Andriy Gapon <avg@freebsd.org> wrote:
> on 09/12/2012 19:27 Attilio Rao said the following:
>> On Sun, Nov 25, 2012 at 2:22 PM, Andriy Gapon <avg@freebsd.org> wrote:
>>> Author: avg
>>> Date: Sun Nov 25 14:22:08 2012
>>> New Revision: 243515
>>> URL: http://svnweb.freebsd.org/changeset/base/243515
>>>
>>> Log:
>>>   remove stop_scheduler_on_panic knob
>>>
>>>   There has not been any complaints about the default behavior, so there
>>>   is no need to keep a knob that enables the worse alternative.
>>>
>>>   Now that the hard-stopping of other CPUs is the only behavior, the panic_cpu
>>>   spinlock-like logic can be dropped, because only a single CPU is
>>>   supposed to win stop_cpus_hard(other_cpus) race and proceed past that
>>>   call.
>>
>> While this is true for the sane case, for the case reported by Ryan this
>> still breaks.
>
> Yes. I haven't got around to start fixing the Ryan's problem yet.
> But this commit should reduce number of places where changes have to be made.
> In fact, I think that only stop_cpus_X would have to be fixed now.
>
>> In fact, imagine CPU0 (winner) and CPU1 (loser) both panic'ing. CPU0
>> wins and then sets stopping_cpu. When the deadlock happens in the
>> spinning loop, because of generic_stop_cpus() logic CPU0 won't
>> deadlock and will correctly continue, but the problem is that it sets
>> back stopping_cpu to NOCPU, letting CPU1 continuing too and then
>> deadlocking.
>>
>> At the minimum, what I think that should happen is to have the check
>> in panic() as prior this change but with the add I outlined (thus we
>> need to generalize cpustop_handler()). However, it seems to me that
>> generic_stop_cpus() may still be broken by this and we eventually need
>> to fix it.
>>
>> I would then revert this part of the patch and fix it appropriately.
>> Later we can better discuss the generic_stop_cpus() similar race.
>
> I actually see this change and the Ryan's problem as orthogonal issues.
> My opinion is let's just fix generic_stop_cpus().

Right, but as I said, for the time being we can at least have correct
panic() semantics, take the time needed to fix generic_stop_cpus()
properly, and then absorb the panic() fix into it as well.
Right now the mechanism is still broken in panic() and it can be fixed
very easily, so we should just do it. This will also help vendors like
Sandvine, which may have hit this very bug too.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
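
For readers following the thread, the race described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: `claim()` and `release()` are hypothetical helpers, the two variables merely borrow the names `panic_cpu` and `stopping_cpu`, and the loser's spin loop is reduced to a single failed attempt so that the sketch stays deterministic.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NOCPU (-1)	/* sentinel: no CPU currently owns the variable */

/* Hypothetical stand-ins for the kernel's panic_cpu and stopping_cpu. */
atomic_int panic_cpu = NOCPU;
atomic_int stopping_cpu = NOCPU;

/*
 * Make one attempt to claim *owner for cpuid.
 * Returns true if cpuid owns the variable afterwards, either because it
 * won the compare-and-swap or because it already owned it (re-entry).
 * In the kernel the loser would spin here; this sketch just reports failure.
 */
bool
claim(atomic_int *owner, int cpuid)
{
	int expected = NOCPU;

	if (atomic_load(owner) == cpuid)
		return (true);
	return (atomic_compare_exchange_strong(owner, &expected, cpuid));
}

/*
 * Give ownership back. This models the step the thread is worried about:
 * once the winner stores NOCPU again, a spinning loser can claim the
 * variable and proceed.
 */
void
release(atomic_int *owner)
{
	atomic_store(owner, NOCPU);
}
```

With these helpers, if CPU0 claims `stopping_cpu`, CPU1's attempt fails. After CPU0 calls `release(&stopping_cpu)`, CPU1's next attempt succeeds, which is the point where the loser would carry on and deadlock. A `panic_cpu` that is claimed in panic() and never released keeps CPU1 out for good, which is the behaviour the check in panic() used to provide.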