Date:      Thu, 13 Dec 2012 12:59:49 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        attilio@FreeBSD.org
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r243515 - head/sys/kern
Message-ID:  <50C9B525.2060503@FreeBSD.org>
In-Reply-To: <CAJ-FndCGe=DtqKxRe0YXV0GJrf4CV6MX9B1MR-Uyy6A3hpongA@mail.gmail.com>
References:  <201211251422.qAPEM8BV074656@svn.freebsd.org> <CAJ-FndCGe=DtqKxRe0YXV0GJrf4CV6MX9B1MR-Uyy6A3hpongA@mail.gmail.com>

on 09/12/2012 19:27 Attilio Rao said the following:
> On Sun, Nov 25, 2012 at 2:22 PM, Andriy Gapon <avg@freebsd.org> wrote:
>> Author: avg
>> Date: Sun Nov 25 14:22:08 2012
>> New Revision: 243515
>> URL: http://svnweb.freebsd.org/changeset/base/243515
>>
>> Log:
>>   remove stop_scheduler_on_panic knob
>>
>>   There have not been any complaints about the default behavior, so there
>>   is no need to keep a knob that enables the worse alternative.
>>
>>   Now that the hard-stopping of other CPUs is the only behavior, the panic_cpu
>>   spinlock-like logic can be dropped, because only a single CPU is
>>   supposed to win stop_cpus_hard(other_cpus) race and proceed past that
>>   call.
> 
> While this is true for the sane case, it still breaks for the case
> reported by Ryan.

Yes.  I haven't gotten around to fixing Ryan's problem yet.
But this commit should reduce the number of places where changes have to be made.
In fact, I think that only stop_cpus_X would have to be fixed now.

> In fact, imagine CPU0 (winner) and CPU1 (loser) both panicking. CPU0
> wins and then sets stopping_cpu. When the deadlock happens in the
> spinning loop, because of the generic_stop_cpus() logic CPU0 won't
> deadlock and will correctly continue, but the problem is that it sets
> stopping_cpu back to NOCPU, letting CPU1 continue too and then
> deadlock.
> 
> At a minimum, what I think should happen is to have the check
> in panic() as it was prior to this change, but with the addition I
> outlined (thus we need to generalize cpustop_handler()). However, it
> seems to me that generic_stop_cpus() may still be broken by this and
> we eventually need to fix it.
> 
> I would then revert this part of the patch and fix it appropriately.
> Later we can better discuss the similar race in generic_stop_cpus().

I actually see this change and Ryan's problem as orthogonal issues.
My opinion is that we should just fix generic_stop_cpus().
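
To make the race quoted above concrete, here is a minimal user-space
sketch of the arbitration on stopping_cpu. It is only a model, not the
kernel code: the stop_arbiter type, the arbiter_* helpers and the
"sticky" flag are hypothetical names invented for this illustration,
and NOCPU is redefined locally as a stand-in for the kernel constant.
It assumes the winner is chosen with a compare-and-swap from NOCPU to
the caller's CPU id, as the description above suggests.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NOCPU (-1)	/* local stand-in for the kernel's NOCPU sentinel */

/* Hypothetical model of the arbitration word; not the kernel's variable. */
typedef struct {
	atomic_int stopping_cpu;
} stop_arbiter;

static void
arbiter_init(stop_arbiter *a)
{
	atomic_init(&a->stopping_cpu, NOCPU);
}

/*
 * One attempt to claim the stopper role.  Returns true if cpuid already
 * owns the role or wins the compare-and-swap from NOCPU; false otherwise.
 * A real loser would spin here until the word becomes NOCPU again.
 */
static bool
arbiter_try_claim(stop_arbiter *a, int cpuid)
{
	int expected = NOCPU;

	if (atomic_load(&a->stopping_cpu) == cpuid)
		return (true);
	return (atomic_compare_exchange_strong(&a->stopping_cpu,
	    &expected, cpuid));
}

/*
 * Release the stopper role.  With sticky == false the word is reset to
 * NOCPU, which is the step that lets a spinning loser proceed.  With
 * sticky == true (an assumption made for illustration only) the winner
 * keeps ownership, so a later claim by another CPU still fails.
 */
static void
arbiter_release(stop_arbiter *a, bool sticky)
{
	if (!sticky)
		atomic_store(&a->stopping_cpu, NOCPU);
}
```

In this model, if CPU0 claims the role and then releases it with
sticky == false, a subsequent claim by CPU1 succeeds, which corresponds
to the loser continuing after the winner gives up on the spinning loop.
With sticky == true the claim by CPU1 keeps failing.  Whether a real fix
in generic_stop_cpus() should look anything like this is an open
question; the sketch only shows where the window is.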

-- 
Andriy Gapon


