Date:      Sun, 25 Nov 2012 14:01:16 +0000
From:      Attilio Rao <attilio@freebsd.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, Ryan Stone <rysto32@gmail.com>
Subject:   Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
Message-ID:  <CAJ-FndDVt8VRA4kQipT5Lm%2Bo2KRRum9NKWorfeAucwR=hJ0uDw@mail.gmail.com>
In-Reply-To: <50B21545.5060807@FreeBSD.org>
References:  <CAFMmRNwb_rxYXHGtXgtcyVUJnFDx5PSeMmA_crBbeV_rtzL9Cg@mail.gmail.com> <50A5F12C.1050902@FreeBSD.org> <CAJ-FndAB%2B7KRAE91L9634eXgzqgrizwtwCBC7AAg%2B0EX89TEBQ@mail.gmail.com> <50A63D1D.9090500@FreeBSD.org> <CAJ-FndDC1QCytXDJqVkism_5VoLNo_OzZxNEQ9NHx63HC=GTNg@mail.gmail.com> <50A65208.4050804@FreeBSD.org> <CAJ-FndADxJtYPX2-cQnqJoLhzYtJMidG1DPPY%2B6Dtf4rVw_zrw@mail.gmail.com> <50B21545.5060807@FreeBSD.org>

On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon <avg@freebsd.org> wrote:
> on 25/11/2012 14:29 Attilio Rao said the following:
>> I think the patch you propose makes such effects even worse, because
>> it disables interrupts in generic_stop_cpus().
>> What I suggest to do, is the following:
>> - The CPU which wins the race for generic_stop_cpus also signals the
>> CPUs it is willing to stop on a global mask
>> - Another CPU entering generic_stop_cpus() and losing the race
>> checks the mask of cpus which might be stopped and stops itself if
>> necessary (ie. not yet done). We must be careful with races here, but
>> I'm confident this can be done easily enough.
>
> I think that you either misunderstood my patch or I misunderstand your
> suggestion, because my patch does exactly what you wrote above.

The patch is somewhat incomplete:
- I don't think that we need specific checks in cpustop_handler() (and
if you have added them to prevent races, I don't think they are
enough, see below)
- the "stopping_cpus" map must be set atomically with, or before, the
stopper_cpu cpuid; otherwise some CPUs may end up using a NULL
mask in the check
- Did you consider the race where a stop and a restart request
happen just after the CPU_ISSET() check? I think CPUs can deadlock
there.
- I'm very dubious about the spinlock_enter() stuff; I think it can
just make the problem worse atm.
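
To illustrate the ordering point about the mask, here is a minimal
user-space sketch using C11 atomics. It is not the kernel code and not
the patch under discussion. The names stopping_cpus and stopper_cpu
come from this thread; stop_claimed, stop_claim(),
stop_loser_must_park(), the NOCPU value and the 64-bit mask are
assumptions made up for the example. The idea is that the race is
decided on a separate flag, the mask is written next, and the stopper
id is published last with release semantics, so a CPU that sees the id
cannot see a stale mask. The restart race is deliberately not covered.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NOCPU (-1)      /* hypothetical marker: no CPU is acting as stopper */

static atomic_bool stop_claimed;        /* hypothetical: decides the race */
static _Atomic uint64_t stopping_cpus;  /* bit n set: CPU n is to be stopped */
static atomic_int stopper_cpu = NOCPU;  /* always published after the mask */

/*
 * Try to become the stopper.  Returns true if `self` won the race and
 * published its mask, false if another CPU had already claimed it.
 */
static bool
stop_claim(int self, uint64_t mask)
{
	bool expected = false;

	assert(self >= 0 && self < 64);
	if (!atomic_compare_exchange_strong(&stop_claimed, &expected, true))
		return (false);

	/* Write the mask first ... */
	atomic_store_explicit(&stopping_cpus, mask, memory_order_relaxed);
	/* ... then publish the id; release makes the mask visible with it. */
	atomic_store_explicit(&stopper_cpu, self, memory_order_release);
	return (true);
}

/*
 * Called by a CPU that lost the race.  Waits until the winner has
 * published its id, then reports whether `self` is in the mask and
 * should therefore stop itself.
 */
static bool
stop_loser_must_park(int self)
{
	uint64_t mask;

	assert(self >= 0 && self < 64);
	while (atomic_load_explicit(&stopper_cpu, memory_order_acquire) == NOCPU)
		;	/* the winner is between the claim and the publish */

	mask = atomic_load_explicit(&stopping_cpus, memory_order_relaxed);
	return (((mask >> self) & 1) != 0);
}
```

With this ordering a loser never tests an unset mask: it either spins
briefly on stopper_cpu or sees both the id and the mask. Packing the id
and the mask into one atomically updated word would be another way to
get the same guarantee.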

However, you are right: the concept of your patch is the same as what I
wanted to get; we maybe just need to polish it a bit.

In the meantime I also double-checked suspended_cpus and I don't think
there are real showstoppers to having it in the stopped_cpus map.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


