From owner-freebsd-hackers@FreeBSD.ORG Sun Nov 25 12:31:44 2012
Date: Sun, 25 Nov 2012 12:31:42 +0000
From: Attilio Rao
Reply-To: attilio@FreeBSD.org
To: Andriy Gapon
Cc: freebsd-hackers@freebsd.org, Ryan Stone
Subject: Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

On Sun, Nov 25, 2012 at 12:29 PM, Attilio Rao wrote:
> On Fri, Nov 16, 2012 at 2:47 PM, Andriy Gapon wrote:
>> on 16/11/2012 16:41 Attilio Rao said the following:
>>> On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon wrote:
>>>> on 16/11/2012 14:30 Attilio Rao said the following:
>>>>> On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon wrote:
>>>>>> on 16/11/2012 00:58 Ryan Stone said the following:
>>>>>>> At work we have some custom watchdog hardware that sends an NMI upon
>>>>>>> expiry.  We've modified the kernel to panic when it receives the
>>>>>>> watchdog NMI.  I've been trying the "stop_scheduler_on_panic" mode,
>>>>>>> and I've discovered that when my watchdog expires, the system gets
>>>>>>> completely wedged.  After some digging, I've discovered that I have
>>>>>>> multiple CPUs getting the watchdog NMI and trying to panic
>>>>>>> concurrently.  One of the CPUs wins, and the rest spin forever in
>>>>>>> this code:
>>>>>>>
>>>>>>>     /*
>>>>>>>      * We don't want multiple CPU's to panic at the same time, so we
>>>>>>>      * use panic_cpu as a simple spinlock.  We have to keep checking
>>>>>>>      * panic_cpu if we are spinning in case the panic on the first
>>>>>>>      * CPU is canceled.
>>>>>>>      */
>>>>>>>     if (panic_cpu != PCPU_GET(cpuid))
>>>>>>>         while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>>>>>>             PCPU_GET(cpuid)) == 0)
>>>>>>>             while (panic_cpu != NOCPU)
>>>>>>>                 ; /* nothing */
>>>>>>>
>>>>>>> The system wedges when stop_cpus_hard() is called, which sends NMIs
>>>>>>> to all of the other CPUs and waits for them to acknowledge that they
>>>>>>> are stopped before returning.  However, the CPU will not deliver an
>>>>>>> NMI to a CPU that is already handling an NMI, so the other CPUs that
>>>>>>> got a watchdog NMI and are spinning will never go into the NMI
>>>>>>> handler and acknowledge that they are stopped.
>>>>>>
>>>>>> I thought about this issue and fixed it (in my tree) in a different way:
>>>>>> http://people.freebsd.org/~avg/cpu_stop-race.diff
>>>>>>
>>>>>> The need for spinlock_enter in the patch is not entirely clear.
>>>>>> The main idea is that a CPU which calls cpu_stop and loses a race
>>>>>> should voluntarily enter cpustop_handler.
>>>>>> I am also not sure about the MI-cleanness of this patch.
>>>>>
>>>>> It is similar to what I propose but with some differences:
>>>>> - It is not clean from an MI perspective
>>>>
>>>> OK.
>>>>
>>>>> - I don't think we need to treat it specially; I would just
>>>>>   unconditionally stop all the CPUs entering the "spinlock zone",
>>>>>   making the patch simpler.
>>>>
>>>> I couldn't understand this part.
>>>
>>> I'm sorry, I think I misread your patch.
>>>
>>> I was basically suggesting to fix Ryan's case by calling
>>> cpustop_handler() (or the new MI interface) in the panic() function,
>>> in the case where the CPU doesn't win the race for panic_cpu.
>>> Basically doing:
>>> Index: sys/kern/kern_shutdown.c
>>> ===================================================================
>>> --- sys/kern/kern_shutdown.c   (revision 243154)
>>> +++ sys/kern/kern_shutdown.c   (working copy)
>>> @@ -568,15 +568,11 @@ panic(const char *fmt, ...)
>>>  #ifdef SMP
>>>         /*
>>>          * We don't want multiple CPU's to panic at the same time, so we
>>> -        * use panic_cpu as a simple spinlock.  We have to keep checking
>>> -        * panic_cpu if we are spinning in case the panic on the first
>>> -        * CPU is canceled.
>>> +        * use panic_cpu as a simple lock.
>>>          */
>>>         if (panic_cpu != PCPU_GET(cpuid))
>>> -               while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>> -                   PCPU_GET(cpuid)) == 0)
>>> -                       while (panic_cpu != NOCPU)
>>> -                               ; /* nothing */
>>> +               if (atomic_cmpset_int(&panic_cpu, NOCPU, PCPU_GET(cpuid)) == 0)
>>> +                       cpustop_handler();
>>>
>>>         if (stop_scheduler_on_panic) {
>>>                 if (panicstr == NULL && !kdb_active) {
>>>
>>>
>>> In fact, it seems to me that the comment is outdated and no longer
>>> represents the truth.
>>
>> Ah, I see.  Thank you.
>>
>> My older plan was to get rid of stop_scheduler_on_panic, that is, to make
>> the behavior unconditional.  And then to use stop_cpus_hard instead of the
>> hand-rolled 'panic_cpu' spinlock.  This way, whichever CPU wins
>> stop_cpus_hard would be the only CPU to enter panic.
>
> So, assuming we are not yet defaulting to the stop-CPUs behaviour for
> panic, I think we have two things to take care of:
> 1) Handling the simultaneous panics
> 2) Handling deadlocks/starvation among simultaneous cpu_stops
>
> In particular, case 2 is trickier than it seems: it is not only
> related to NMIs but also to the case where you send an IPI_STOP via a
> normal IPI and interrupts are disabled on one of the target CPUs.  In
> fact, I think the patch you propose makes such effects even worse,
> because it disables interrupts in generic_stop_cpus().
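To make the starvation concrete, a rough sketch of the interaction
(simplified pseudo-kernel code, not the actual subr_smp.c logic; cpu_b
stands for the ID of the losing CPU):

/* CPU B: lost the panic_cpu race and spins with interrupts disabled
 * (for instance because spinlock_enter() was called on the way in). */
while (panic_cpu != NOCPU)
        ;       /* a maskable IPI_STOP is never delivered here */

/* CPU A: won the race and later tries to stop the other CPUs. */
ipi_all_but_self(IPI_STOP);     /* normal IPI, unlike IPI_STOP_HARD (NMI) */
while (!CPU_ISSET(cpu_b, &stopped_cpus))
        ;       /* never satisfied: CPU B cannot take the IPI and
                 * acknowledge from cpustop_handler() */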
> What I suggest doing is the following:
> - The CPU which wins the race for generic_stop_cpus() also signals the
>   CPUs it is willing to stop in a global mask.
> - Another CPU entering generic_stop_cpus() and losing the race checks
>   the mask of CPUs which might be stopped and stops itself if necessary
>   (i.e., if it has not already done so).  We must be careful with races
>   here, but I'm confident this can be done easily enough.

BTW, I'm aware that such a case is not fatal because of the safety belt,
but I think it would be too easy to end up with wrong code if we let
simultaneous generic_stop_cpus() calls race in the "wrong way" (a race
loser with interrupts/NMIs disabled).

Thanks,
Attilio


--
Peace can only be achieved by understanding - A. Einstein
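A rough sketch of the mask-based scheme described above (illustrative
only: stopping_cpus, stopper_cpu and the simplified stop_cpus_sketch()
are invented names, and publishing the mask still needs the careful
race handling mentioned in the proposal):

/*
 * Sketch only -- not the actual generic_stop_cpus() from
 * sys/kern/subr_smp.c.  The winner publishes the set of CPUs it wants
 * stopped; a loser that finds itself in that set parks in
 * cpustop_handler() instead of spinning with interrupts/NMIs blocked.
 */
static cpuset_t stopping_cpus;          /* published by the winner */
static u_int    stopper_cpu = NOCPU;    /* winner of the race */

static int
stop_cpus_sketch(cpuset_t map, u_int type)
{
        u_int cpu = PCPU_GET(cpuid);

        if (atomic_cmpset_int(&stopper_cpu, NOCPU, cpu) == 0) {
                /*
                 * We lost the race.  If the winner wants to stop us as
                 * well, enter the stop handler voluntarily instead of
                 * returning and possibly spinning with interrupts off.
                 * (A real version must handle the window in which the
                 * winner has not yet published stopping_cpus.)
                 */
                if (CPU_ISSET(cpu, &stopping_cpus))
                        cpustop_handler();
                return (0);
        }

        /* We won: publish the target set, then send the IPIs and wait. */
        stopping_cpus = map;
        /* ... ipi_selected(map, type), then wait for acknowledgements ... */
        return (1);
}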