From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 16 07:54:24 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id DCA7A971
 for <freebsd-hackers@FreeBSD.org>; Fri, 16 Nov 2012 07:54:24 +0000 (UTC)
 (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 02F468FC12
 for <freebsd-hackers@FreeBSD.org>; Fri, 16 Nov 2012 07:54:23 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id JAA12848;
 Fri, 16 Nov 2012 09:54:21 +0200 (EET) (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1TZGkn-00014Z-1O; Fri, 16 Nov 2012 09:54:21 +0200
Message-ID: <50A5F12C.1050902@FreeBSD.org>
Date: Fri, 16 Nov 2012 09:54:20 +0200
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:16.0) Gecko/20121030 Thunderbird/16.0.2
MIME-Version: 1.0
To: Ryan Stone <rysto32@gmail.com>
Subject: Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
References: <CAFMmRNwb_rxYXHGtXgtcyVUJnFDx5PSeMmA_crBbeV_rtzL9Cg@mail.gmail.com>
In-Reply-To: <CAFMmRNwb_rxYXHGtXgtcyVUJnFDx5PSeMmA_crBbeV_rtzL9Cg@mail.gmail.com>
X-Enigmail-Version: 1.4.5
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@FreeBSD.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 16 Nov 2012 07:54:25 -0000

on 16/11/2012 00:58 Ryan Stone said the following:
> At work we have some custom watchdog hardware that sends an NMI upon
> expiry.  We've modified the kernel to panic when it receives the watchdog
> NMI.  I've been trying the "stop_scheduler_on_panic" mode, and I've
> discovered that when my watchdog expires, the system gets completely
> wedged.  After some digging, I've discovered is that I have multiple CPUs
> getting the watchdog NMI and trying to panic concurrently.  One of the CPUs
> wins, and the rest spin forever in this code:
> 
> /*
>      * We don't want multiple CPU's to panic at the same time, so we
>      * use panic_cpu as a simple spinlock.  We have to keep checking
>      * panic_cpu if we are spinning in case the panic on the first
>      * CPU is canceled.
>      */
>     if (panic_cpu != PCPU_GET(cpuid))
>         while (atomic_cmpset_int(&panic_cpu, NOCPU,
>             PCPU_GET(cpuid)) == 0)
>             while (panic_cpu != NOCPU)
>                 ; /* nothing */
> 
> The system wedges when stop_cpus_hard() is called, which sends NMIs to all
> of the other CPUs and waits for them to acknowledge that they are stopped
> before returning.  However the CPU will not deliver an NMI to a CPU that is
> already handling an NMI, so the other CPUs that got a watchdog NMI and are
> spinning will never go into the NMI handler and acknowledge that they are
> stopped.

I thought about this issue and fixed (in my tree) in a different way:
http://people.freebsd.org/~avg/cpu_stop-race.diff

The need for spinlock_enter in the patch in not entirely clear.
The main idea is that a CPU which calls cpu_stop and loses a race should
voluntary enter cpustop_handler.
I am also not sure about MI-cleanness of this patch.

P.S.  I also have the hard stop and the "soft" stop separate in my tree.  Just
in case there is a "simultaneous" occurrence of the soft stop happening for
whatever reason and the hard stop for panic or kdb entry, I always want the hard
stop to override the soft stop.

> I've been able to work around this with the following hideous hack:
> 
> --- kern_shutdown.c     2012-08-17 10:25:02.000000000 -0400
> +++ kern_shutdown.c     2012-11-15 17:04:10.000000000 -0500
> @@ -658,11 +658,15 @@
>          * panic_cpu if we are spinning in case the panic on the first
>          * CPU is canceled.
>          */
> -       if (panic_cpu != PCPU_GET(cpuid))
> +       if (panic_cpu != PCPU_GET(cpuid)) {
>                 while (atomic_cmpset_int(&panic_cpu, NOCPU,
> -                   PCPU_GET(cpuid)) == 0)
> +                   PCPU_GET(cpuid)) == 0) {
> +                       atomic_set_int(&stopped_cpus, PCPU_GET(cpumask));
>                         while (panic_cpu != NOCPU)
>                                 ; /* nothing */
> +               }
> +               atomic_clear_int(&stopped_cpus, PCPU_GET(cpumask));
> +       }
> 
>         if (stop_scheduler_on_panic) {
>                 if (panicstr == NULL && !kdb_active)
> 
> 
> But I'm hoping that somebody has some ideas on a better way to fix this
> kind of problem.


-- 
Andriy Gapon