Date: Sun, 25 Nov 2012 14:01:16 +0000
From: Attilio Rao <attilio@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Cc: freebsd-hackers@freebsd.org, Ryan Stone <rysto32@gmail.com>
Subject: Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
Message-ID: <CAJ-FndDVt8VRA4kQipT5Lm%2Bo2KRRum9NKWorfeAucwR=hJ0uDw@mail.gmail.com>
In-Reply-To: <50B21545.5060807@FreeBSD.org>
References: <CAFMmRNwb_rxYXHGtXgtcyVUJnFDx5PSeMmA_crBbeV_rtzL9Cg@mail.gmail.com>
 <50A5F12C.1050902@FreeBSD.org>
 <CAJ-FndAB%2B7KRAE91L9634eXgzqgrizwtwCBC7AAg%2B0EX89TEBQ@mail.gmail.com>
 <50A63D1D.9090500@FreeBSD.org>
 <CAJ-FndDC1QCytXDJqVkism_5VoLNo_OzZxNEQ9NHx63HC=GTNg@mail.gmail.com>
 <50A65208.4050804@FreeBSD.org>
 <CAJ-FndADxJtYPX2-cQnqJoLhzYtJMidG1DPPY%2B6Dtf4rVw_zrw@mail.gmail.com>
 <50B21545.5060807@FreeBSD.org>
On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon <avg@freebsd.org> wrote:
> on 25/11/2012 14:29 Attilio Rao said the following:
>> I think the patch you propose makes such effects even worse, because
>> it disables interrupts in generic_stop_cpus().
>> What I suggest doing is the following:
>> - The CPU which wins the race for generic_stop_cpus() also publishes
>> the CPUs it intends to stop in a global mask.
>> - Another CPU entering generic_stop_cpus() and losing the race
>> checks the mask of CPUs which might be stopped and stops itself if
>> necessary (i.e. not yet done). We must be careful with races here, but
>> I'm confident this can be done easily enough.
>
> I think that you either misunderstood my patch or I misunderstand your
> suggestion, because my patch does exactly what you wrote above.

The patch is somewhat incomplete:
- I don't think that we need specific checks in cpustop_handler() (and
  if you added them to prevent races, I don't think they are enough, see
  below).
- The "stopping_cpus" map must be set atomically with, or before, the
  stopper_cpu cpuid; otherwise some CPUs may end up checking against a
  NULL mask.
- Did you consider the races that occur when a stop and a restart
  request happen just after the CPU_ISSET() check? I think CPUs can
  deadlock there.
- I'm very dubious about the spinlock_enter() stuff; I think it can just
  make the problem worse at the moment.

However, you are right: the concept of your patch is the same one I
really wanted to get; we may just need to refine it a bit.

In the meantime I also double-checked suspended_cpus, and I don't think
there are real showstoppers to having it in the stopped_cpus map.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
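The stop protocol discussed in this message (one winner publishes the set of CPUs it intends to stop; a CPU that loses the race checks that set and parks itself if it is a target) can be sketched in userland C11 as below. This is a hypothetical model, not FreeBSD code: stop_state, stopped_cpus, park_self(), stop_cpus_sketch() and restart_cpus_sketch() are invented names, the machine is limited to 32 CPUs, and "parking" only records a bit instead of spinning in cpustop_handler(). To satisfy the ordering point above, the sketch packs the stopper's id and its mask into a single 64-bit word and publishes both with one compare-and-swap, so a loser can never see a stopper without its mask. It does not attempt to handle the stop/restart race after the CPU_ISSET() check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model (not FreeBSD code): one 64-bit word holds both the
 * stopper's identity (high 32 bits, stored as cpuid + 1 so 0 means "free")
 * and the mask of CPUs it intends to stop (low 32 bits, so at most 32 CPUs).
 */
static _Atomic uint64_t stop_state = 0;
static _Atomic uint32_t stopped_cpus = 0; /* CPUs that have parked themselves */

static uint64_t pack(int cpu, uint32_t mask)
{
    assert(cpu >= 0 && cpu < 32);
    return ((uint64_t)(cpu + 1) << 32) | mask;
}

/* Stand-in for a CPU stopping itself: a real kernel would spin until
 * restarted; this model only records that the CPU is parked. */
static void park_self(int cpu)
{
    atomic_fetch_or(&stopped_cpus, UINT32_C(1) << cpu);
}

/*
 * Returns true if `self` won the race and now owns the stop request.
 * A loser inspects the winner's mask and parks itself if it is a target
 * that has not stopped yet.
 */
static bool stop_cpus_sketch(int self, uint32_t map)
{
    uint64_t expected = 0;

    /* Id and mask become visible together, in one atomic step. */
    if (atomic_compare_exchange_strong(&stop_state, &expected,
                                       pack(self, map)))
        return true; /* won: would now signal the CPUs in `map` */

    /* Lost: on failure `expected` holds the winner's id and mask. */
    uint32_t winner_mask = (uint32_t)expected;
    uint32_t bit = UINT32_C(1) << self;
    if ((winner_mask & bit) && !(atomic_load(&stopped_cpus) & bit))
        park_self(self);
    return false;
}

/* Winner-side restart: release the parked targets, then free the slot. */
static void restart_cpus_sketch(int self)
{
    uint64_t cur = atomic_load(&stop_state);

    assert((int)(cur >> 32) - 1 == self);
    atomic_fetch_and(&stopped_cpus, ~(uint32_t)cur);
    atomic_store(&stop_state, 0);
}
```

For example, if CPU 0 wins with map 0xE (CPUs 1 to 3) and CPU 2 then calls stop_cpus_sketch(2, 0xB), CPU 2 loses and marks itself stopped; CPU 0's restart_cpus_sketch(0) clears that bit and frees the slot for the next stopper. Packing both fields into one word is just one way to get the "atomically" variant; publishing the mask with release ordering before the id would be another.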