From owner-freebsd-current@freebsd.org  Fri Aug 21 15:25:28 2015
Return-Path: <owner-freebsd-current@freebsd.org>
Delivered-To: freebsd-current@mailman.ysv.freebsd.org
MIME-Version: 1.0
X-Received: by 10.50.128.169 with SMTP id np9mr3223564igb.37.1440170727275;
 Fri, 21 Aug 2015 08:25:27 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.36.38.133 with HTTP; Fri, 21 Aug 2015 08:25:27 -0700 (PDT)
In-Reply-To: <55D74193.4020008@FreeBSD.org>
References: <CAJ-VmomvqULP--v47qKJisQkf8VQNvxEhXK=HXEtv9MuLz4D1g@mail.gmail.com>
 <CAFMmRNw6tWMQ-pfXzSpEM7kRgKafB9KnK-oUhWw2_E-P91drLw@mail.gmail.com>
 <55D74193.4020008@FreeBSD.org>
Date: Fri, 21 Aug 2015 08:25:27 -0700
X-Google-Sender-Auth: NlJDAIsUGthreT7rUOt8WiaZWQU
Message-ID: <CAJ-Vmon6xXBSMPWgNhg-RZKLuuMDP1hvXG+DdZ3fZdvFnan06g@mail.gmail.com>
Subject: Re: freebsd-head: suddenly NMI panics lead to ddb being unable to
 stop CPUs?
From: Adrian Chadd <adrian@freebsd.org>
To: Eric van Gyzen <vangyzen@freebsd.org>
Cc: Ryan Stone <rysto32@gmail.com>,
 freebsd-current <freebsd-current@freebsd.org>, 
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>,
 Scott Long <scottl@freebsd.org>, Konstantin Belousov <kib@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
X-List-Received-Date: Fri, 21 Aug 2015 15:25:28 -0000

Ah, cool. I'll give it a whirl.

I'm a little worried about having all of the other cores spinning in
this case (mostly for thermal reasons; the machines get VERY LOUD when
the CPUs are spinning).


-a


On 21 August 2015 at 08:19, Eric van Gyzen <vangyzen@freebsd.org> wrote:
> I mentioned this to Adrian, but I'll mention it here for everyone else's benefit.
>
> Ryan is exactly right.  There was a thread a while ago, with a proposed patch from Kostik:
>
> https://lists.freebsd.org/pipermail/freebsd-arch/2014-July/015584.html
>
> As I recall, Scott Long also ran into this a few months ago.
>
> It happens for any NMI:  entering the debugger, a PCI Parity or System Error, a hardware watchdog timeout, and probably other sources I'm not remembering.
>
> Eric
>
> On 08/21/2015 09:23, Ryan Stone wrote:
>> I have seen similar behaviour before.  The problem is that every CPU
>> receives an NMI concurrently.  As I recall, one of them gets some kind of
>> pseudo-spinlock and tries to stop the other CPUs with an NMI.  However,
>> because they are already in an NMI handler, they don't get the second NMI
>> and don't stop properly.
>>
>> The case that I saw actually had to do with a panic triggered by an NMI,
>> not entering the debugger, but I believe that both cases use
>> stop_cpus_hard() under the hood and have a similar issue.
>>
>> (I also recall seeing the exact situation that you describe while
>> originally developing SR-IOV on an alpha version of the Fortville hardware
>> and firmware with a very buggy SR-IOV implementation.  I've never seen it
>> on ixgbe before, although I haven't used SR-IOV there very much at all.)
>>
>>
>> On Thu, Aug 20, 2015 at 6:15 PM, Adrian Chadd <adrian@freebsd.org> wrote:
>>
>>> Hi!
>>>
>>> This has started happening on -HEAD recently. No, I don't have any
>>> more details yet beyond "recently."
>>>
>>> Whenever I get an NMI panic (and getting an NMI is a separate issue,
>>> sigh) I get a slew of "failed to stop cpu" messages, and all CPUs
>>> enter ddb. This is .. sub-optimal. Has anyone seen this? Does anyone
>>> have any ideas?
>>>
>>>
>>> -adrian
>
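
For anyone following along, here is a rough toy model of the failure
Ryan describes above. It is only a sketch, not FreeBSD code: the Cpu
class and the helper names (broadcast_nmi, deliver_stop_nmi,
stop_others) are made up for illustration, and the only thing it
assumes about the hardware is that a CPU already inside an NMI handler
will not take a second NMI until that handler returns. The real code
path mentioned in the thread is stop_cpus_hard(); this does not try to
reproduce it.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Hypothetical stand-in for one CPU's state (not a kernel structure)."""
    cpu_id: int
    in_nmi: bool = False   # True while the CPU is inside an NMI handler
    stopped: bool = False  # True once the CPU has parked itself


def broadcast_nmi(cpus):
    """Model an NMI that reaches every CPU at the same time."""
    for cpu in cpus:
        cpu.in_nmi = True


def deliver_stop_nmi(target):
    """Model an NMI-based stop request; return True if the target stops."""
    if target.in_nmi:
        # Assumption: further NMIs are held off until the current
        # handler returns, so the stop request is never acted on.
        return False
    target.in_nmi = True
    target.stopped = True
    return True


def stop_others(cpus, initiator_id):
    """Try to stop every CPU except the initiator; return the IDs that failed."""
    failed = []
    for cpu in cpus:
        if cpu.cpu_id == initiator_id:
            continue
        if not deliver_stop_nmi(cpu):
            failed.append(cpu.cpu_id)
    return failed


if __name__ == "__main__":
    # Broadcast case: every CPU is already in its NMI handler.
    cpus = [Cpu(i) for i in range(4)]
    broadcast_nmi(cpus)
    print("failed to stop:", stop_others(cpus, initiator_id=0))
```

In the broadcast case every other CPU is reported as failing to stop,
which lines up with the slew of "failed to stop cpu" messages. If only
the initiating CPU is in an NMI handler, the same stop_others() call
stops everyone.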