From owner-freebsd-hackers@FreeBSD.ORG Sun Nov 25 12:31:44 2012
Date: Sun, 25 Nov 2012 12:31:42 +0000
From: Attilio Rao
Reply-To: attilio@FreeBSD.org
To: Andriy Gapon
Cc: freebsd-hackers@freebsd.org, Ryan Stone
Subject: Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

On Sun, Nov 25, 2012 at 12:29 PM, Attilio Rao wrote:
> On Fri, Nov 16, 2012 at 2:47 PM, Andriy Gapon wrote:
>> on 16/11/2012 16:41 Attilio Rao said the following:
>>> On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon wrote:
>>>> on 16/11/2012 14:30 Attilio Rao said the following:
>>>>> On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon wrote:
>>>>>> on 16/11/2012 00:58 Ryan Stone said the following:
>>>>>>> At work we have some custom watchdog hardware that sends an NMI upon
>>>>>>> expiry.  We've modified the kernel to panic when it receives the
>>>>>>> watchdog NMI.  I've been trying the "stop_scheduler_on_panic" mode,
>>>>>>> and I've discovered that when my watchdog expires, the system gets
>>>>>>> completely wedged.  After some digging, I've discovered that I have
>>>>>>> multiple CPUs getting the watchdog NMI and trying to panic
>>>>>>> concurrently.  One of the CPUs wins, and the rest spin forever in
>>>>>>> this code:
>>>>>>>
>>>>>>>     /*
>>>>>>>      * We don't want multiple CPU's to panic at the same time, so we
>>>>>>>      * use panic_cpu as a simple spinlock.  We have to keep checking
>>>>>>>      * panic_cpu if we are spinning in case the panic on the first
>>>>>>>      * CPU is canceled.
>>>>>>>      */
>>>>>>>     if (panic_cpu != PCPU_GET(cpuid))
>>>>>>>         while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>>>>>>             PCPU_GET(cpuid)) == 0)
>>>>>>>             while (panic_cpu != NOCPU)
>>>>>>>                 ; /* nothing */
>>>>>>>
>>>>>>> The system wedges when stop_cpus_hard() is called, which sends NMIs
>>>>>>> to all of the other CPUs and waits for them to acknowledge that they
>>>>>>> are stopped before returning.  However, the CPU will not deliver an
>>>>>>> NMI to a CPU that is already handling an NMI, so the other CPUs that
>>>>>>> got a watchdog NMI and are spinning will never go into the NMI
>>>>>>> handler and acknowledge that they are stopped.
>>>>>>
>>>>>> I thought about this issue and fixed it (in my tree) in a different way:
>>>>>> http://people.freebsd.org/~avg/cpu_stop-race.diff
>>>>>>
>>>>>> The need for spinlock_enter in the patch is not entirely clear.
>>>>>> The main idea is that a CPU which calls cpu_stop and loses a race
>>>>>> should voluntarily enter cpustop_handler.
>>>>>> I am also not sure about the MI-cleanness of this patch.
>>>>>
>>>>> It is similar to what I propose but with some differences:
>>>>> - It is not clean from an MI perspective
>>>>
>>>> OK.
>>>>
>>>>> - I don't think we need to treat it specially; I would just
>>>>>   unconditionally stop all the CPUs entering the "spinlock zone",
>>>>>   making the patch simpler.
>>>>
>>>> I couldn't understand this part.
>>>
>>> I'm sorry, I think I misread your patch.
>>>
>>> I was basically suggesting to fix Ryan's case by calling
>>> cpustop_handler() (or the new MI interface) in the panic() function,
>>> in the case where the CPU doesn't win the race for panic_cpu.
>>> Basically doing:
>>> Index: sys/kern/kern_shutdown.c
>>> ===================================================================
>>> --- sys/kern/kern_shutdown.c   (revision 243154)
>>> +++ sys/kern/kern_shutdown.c   (working copy)
>>> @@ -568,15 +568,11 @@ panic(const char *fmt, ...)
>>>  #ifdef SMP
>>>         /*
>>>          * We don't want multiple CPU's to panic at the same time, so we
>>> -        * use panic_cpu as a simple spinlock.  We have to keep checking
>>> -        * panic_cpu if we are spinning in case the panic on the first
>>> -        * CPU is canceled.
>>> +        * use panic_cpu as a simple lock.
>>>          */
>>>         if (panic_cpu != PCPU_GET(cpuid))
>>> -               while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>> -                   PCPU_GET(cpuid)) == 0)
>>> -                       while (panic_cpu != NOCPU)
>>> -                               ; /* nothing */
>>> +               if (atomic_cmpset_int(&panic_cpu, NOCPU, PCPU_GET(cpuid)) == 0)
>>> +                       cpustop_handler();
>>>
>>>         if (stop_scheduler_on_panic) {
>>>                 if (panicstr == NULL && !kdb_active) {
>>>
>>>
>>> In fact, it seems to me that the comment is outdated and no longer
>>> represents the truth.
>>
>> Ah, I see.  Thank you.
>>
>> My older plan was to get rid of stop_scheduler_on_panic, that is, to make
>> the behavior unconditional.  And then to use stop_cpus_hard instead of the
>> hand-rolled 'panic_cpu' spinlock.  This way, whichever CPU wins
>> stop_cpus_hard would be the only CPU to enter panic.
>
> So, assuming we are not yet defaulting to the stop-CPUs behaviour for
> panic, I think we have two things to take care of:
> 1) Handling the simultaneous panics
> 2) Handling deadlocks/starvation among simultaneous cpu_stops
>
> In particular, case 2 is trickier than it seems: it is not only
> related to NMIs but also to the case where you send an IPI_STOP via a
> normal IPI and interrupts are disabled on one of the target CPUs.  In
> fact, I think the patch you propose makes such effects even worse,
> because it disables interrupts in generic_stop_cpus().
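To make the starvation concrete, a rough sketch of the interaction
(simplified pseudo-kernel code, not the actual subr_smp.c logic; cpu_b
stands for the ID of the losing CPU):

/* CPU B: lost the panic_cpu race and spins with interrupts disabled
 * (for instance because spinlock_enter() was called on the way in). */
while (panic_cpu != NOCPU)
        ;       /* a maskable IPI_STOP is never delivered here */

/* CPU A: won the race and later tries to stop the other CPUs. */
ipi_all_but_self(IPI_STOP);     /* normal IPI, unlike IPI_STOP_HARD (NMI) */
while (!CPU_ISSET(cpu_b, &stopped_cpus))
        ;       /* never satisfied: CPU B cannot take the IPI and
                 * acknowledge from cpustop_handler() */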
> What I suggest doing is the following:
> - The CPU which wins the race for generic_stop_cpus() also signals the
>   CPUs it is willing to stop in a global mask.
> - Another CPU entering generic_stop_cpus() and losing the race checks
>   the mask of CPUs which might be stopped and stops itself if necessary
>   (i.e., if it has not already done so).  We must be careful with races
>   here, but I'm confident this can be done easily enough.

BTW, I'm aware that such a case is not fatal because of the safety belt,
but I think it would be too easy to end up with wrong code if we let
simultaneous generic_stop_cpus() calls race in the "wrong way" (a race
loser with interrupts/NMIs disabled).

Thanks,
Attilio


--
Peace can only be achieved by understanding - A. Einstein
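A rough sketch of the mask-based scheme described above (illustrative
only: stopping_cpus, stopper_cpu and the simplified stop_cpus_sketch()
are invented names, and publishing the mask still needs the careful
race handling mentioned in the proposal):

/*
 * Sketch only -- not the actual generic_stop_cpus() from
 * sys/kern/subr_smp.c.  The winner publishes the set of CPUs it wants
 * stopped; a loser that finds itself in that set parks in
 * cpustop_handler() instead of spinning with interrupts/NMIs blocked.
 */
static cpuset_t stopping_cpus;          /* published by the winner */
static u_int    stopper_cpu = NOCPU;    /* winner of the race */

static int
stop_cpus_sketch(cpuset_t map, u_int type)
{
        u_int cpu = PCPU_GET(cpuid);

        if (atomic_cmpset_int(&stopper_cpu, NOCPU, cpu) == 0) {
                /*
                 * We lost the race.  If the winner wants to stop us as
                 * well, enter the stop handler voluntarily instead of
                 * returning and possibly spinning with interrupts off.
                 * (A real version must handle the window in which the
                 * winner has not yet published stopping_cpus.)
                 */
                if (CPU_ISSET(cpu, &stopping_cpus))
                        cpustop_handler();
                return (0);
        }

        /* We won: publish the target set, then send the IPIs and wait. */
        stopping_cpus = map;
        /* ... ipi_selected(map, type), then wait for acknowledgements ... */
        return (1);
}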