Date: Sun, 25 Nov 2012 12:31:42 +0000 From: Attilio Rao <attilio@freebsd.org> To: Andriy Gapon <avg@freebsd.org> Cc: freebsd-hackers@freebsd.org, Ryan Stone <rysto32@gmail.com> Subject: Re: stop_cpus_hard when multiple CPUs are panicking from an NMI Message-ID: <CAJ-FndBCAsuseg3iqejm5ck12hnffQbhgRxQ0FmCqiOamPETLA@mail.gmail.com> In-Reply-To: <CAJ-FndADxJtYPX2-cQnqJoLhzYtJMidG1DPPY%2B6Dtf4rVw_zrw@mail.gmail.com> References: <CAFMmRNwb_rxYXHGtXgtcyVUJnFDx5PSeMmA_crBbeV_rtzL9Cg@mail.gmail.com> <50A5F12C.1050902@FreeBSD.org> <CAJ-FndAB%2B7KRAE91L9634eXgzqgrizwtwCBC7AAg%2B0EX89TEBQ@mail.gmail.com> <50A63D1D.9090500@FreeBSD.org> <CAJ-FndDC1QCytXDJqVkism_5VoLNo_OzZxNEQ9NHx63HC=GTNg@mail.gmail.com> <50A65208.4050804@FreeBSD.org> <CAJ-FndADxJtYPX2-cQnqJoLhzYtJMidG1DPPY%2B6Dtf4rVw_zrw@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, Nov 25, 2012 at 12:29 PM, Attilio Rao <attilio@freebsd.org> wrote: > On Fri, Nov 16, 2012 at 2:47 PM, Andriy Gapon <avg@freebsd.org> wrote: >> on 16/11/2012 16:41 Attilio Rao said the following: >>> On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon <avg@freebsd.org> wrote: >>>> on 16/11/2012 14:30 Attilio Rao said the following: >>>>> On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon <avg@freebsd.org> wrote: >>>>>> on 16/11/2012 00:58 Ryan Stone said the following: >>>>>>> At work we have some custom watchdog hardware that sends an NMI upon >>>>>>> expiry. We've modified the kernel to panic when it receives the watchdog >>>>>>> NMI. I've been trying the "stop_scheduler_on_panic" mode, and I've >>>>>>> discovered that when my watchdog expires, the system gets completely >>>>>>> wedged. After some digging, I've discovered is that I have multiple CPUs >>>>>>> getting the watchdog NMI and trying to panic concurrently. One of the CPUs >>>>>>> wins, and the rest spin forever in this code: >>>>>>> >>>>>>> /* >>>>>>> * We don't want multiple CPU's to panic at the same time, so we >>>>>>> * use panic_cpu as a simple spinlock. We have to keep checking >>>>>>> * panic_cpu if we are spinning in case the panic on the first >>>>>>> * CPU is canceled. >>>>>>> */ >>>>>>> if (panic_cpu != PCPU_GET(cpuid)) >>>>>>> while (atomic_cmpset_int(&panic_cpu, NOCPU, >>>>>>> PCPU_GET(cpuid)) == 0) >>>>>>> while (panic_cpu != NOCPU) >>>>>>> ; /* nothing */ >>>>>>> >>>>>>> The system wedges when stop_cpus_hard() is called, which sends NMIs to all >>>>>>> of the other CPUs and waits for them to acknowledge that they are stopped >>>>>>> before returning. However the CPU will not deliver an NMI to a CPU that is >>>>>>> already handling an NMI, so the other CPUs that got a watchdog NMI and are >>>>>>> spinning will never go into the NMI handler and acknowledge that they are >>>>>>> stopped. >>>>>> >>>>>> I thought about this issue and fixed (in my tree) in a different way: >>>>>> http://people.freebsd.org/~avg/cpu_stop-race.diff >>>>>> >>>>>> The need for spinlock_enter in the patch in not entirely clear. >>>>>> The main idea is that a CPU which calls cpu_stop and loses a race should >>>>>> voluntary enter cpustop_handler. >>>>>> I am also not sure about MI-cleanness of this patch. >>>>> >>>>> It is similar to what I propose but with some differences: >>>>> - It is not clean from MI perspective >>>> >>>> OK. >>>> >>>>> - I don't think we need to treact it specially, I would just >>>>> unconditionally stop all the CPUs entering in the "spinlock zone", >>>>> making the patch simpler. >>>> >>>> I couldn't understand this part. >>> >>> I'm sorry, I think I misread your patch. >>> >>> I was basically suggesting to fix Ryan case by calling >>> cpustop_handler() (or the new MI interface) into the panic() function, >>> in the case the CPU don't win the race for panic_cpu. >>> Basically doing: >>> Index: sys/kern/kern_shutdown.c >>> =================================================================== >>> --- sys/kern/kern_shutdown.c (revision 243154) >>> +++ sys/kern/kern_shutdown.c (working copy) >>> @@ -568,15 +568,11 @@ panic(const char *fmt, ...) >>> #ifdef SMP >>> /* >>> * We don't want multiple CPU's to panic at the same time, so we >>> - * use panic_cpu as a simple spinlock. We have to keep checking >>> - * panic_cpu if we are spinning in case the panic on the first >>> - * CPU is canceled. >>> + * use panic_cpu as a simple lock. >>> */ >>> if (panic_cpu != PCPU_GET(cpuid)) >>> - while (atomic_cmpset_int(&panic_cpu, NOCPU, >>> - PCPU_GET(cpuid)) == 0) >>> - while (panic_cpu != NOCPU) >>> - ; /* nothing */ >>> + if (atomic_cmpset_int(&panic_cpu, NOCPU, PCPU_GET(cpuid)) == 0) >>> + cpustop_handler(); >>> >>> if (stop_scheduler_on_panic) { >>> if (panicstr == NULL && !kdb_active) { >>> >>> >>> Infact it seems to me that the comment is outdated and no longer >>> represent truth. >> >> Ah, I see. Thank you. >> >> My older plan was to get rid of stop_scheduler_on_panic, that is to make the >> behavior unconditional. And then to use stop_cpus_hard instead of the hand-rolled >> 'panic_cpu' spinlock. This way the whichever CPU wins stop_cpus_hard would be the >> only CPU to enter panic. > > So, assuming we are not yet defaulting to stop cpus behaviour for > panic, I think we have 2 things to take care: > 1) Handling the simultaneous panics > 2) Handling deadlocks/starvation among simultaneous cpu_stops > > In particular, case 2 is more tricky than it seems, it is not only > related to NMI but also to the case where you send an IPI_STOP via a > normal IPI and interrupts are disabled on one of the channels. Infact, > I think the patch you propose makes such effects even worse, because > it disables interrupts in generic_stop_cpus(). > What I suggest to do, is the following: > - The CPU which wins the race for generic_stop_cpus also signals the > CPUs it is willing to stop on a global mask > - Another CPU entering generic_stop_cpus() and loosing the race, > checks the mask of cpus which might be stopped and stops itself if > necessary (ie. not yet done). We must be careful with races here, but > I'm confindent this can be done easily enough. BTW; I'm aware such case is not fatal because of the safety belt, but I think it would be too easy to get wrong code if we get simultaneous generic_stop_cpus() to race in the "wrong way" (race loser with interrupts/NMI disabled). Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndBCAsuseg3iqejm5ck12hnffQbhgRxQ0FmCqiOamPETLA>