Subject: Re: freebsd-head: suddenly NMI panics lead to ddb being unable to stop CPUs?
To: Adrian Chadd
References: <55D74193.4020008@FreeBSD.org>
Cc: freebsd-current
From: Julian Elischer
Message-ID: <55D74402.70104@freebsd.org>
Date: Fri, 21 Aug 2015 23:30:10 +0800

On 8/21/15 11:25 PM, Adrian Chadd wrote:
> Ah, cool. I'll give it a whirl.
>
> I'm a little worried about having all of the other cores spinning in
> this case (mostly thermal; the machines get VERY LOUD when the CPUs
> are spinning..)
>
make each spin with the pause instruction.. and for N seconds (N being
the CPU ID) or something

> -a
>
>
> On 21 August 2015 at 08:19, Eric van Gyzen wrote:
>> I mentioned this to Adrian, but I'll mention here for everyone else's benefit.
>>
>> Ryan is exactly right. There was a thread a while ago, with a proposed patch from Kostik:
>>
>> https://lists.freebsd.org/pipermail/freebsd-arch/2014-July/015584.html
>>
>> As I recall, Scott Long also ran into this a few months ago.
>>
>> It happens for any NMI: entering the debugger, a PCI Parity or System Error, a hardware watchdog timeout, and probably other sources I'm not remembering.
>>
>> Eric
>>
>> On 08/21/2015 09:23, Ryan Stone wrote:
>>> I have seen similar behaviour before. The problem is that every CPU
>>> receives an NMI concurrently. As I recall, one of them gets some kind of
>>> pseudo-spinlock and tries to stop the other CPUs with an NMI. However,
>>> because they are already in an NMI handler, they don't get the second NMI
>>> and don't stop properly.
>>>
>>> The case that I saw actually had to do with a panic triggered by an NMI,
>>> not entering the debugger, but I believe that both cases use
>>> stop_cpus_hard() under the hood and have a similar issue.
>>>
>>> (I also recall seeing the exact situation that you describe while
>>> originally developing SR-IOV on an alpha version of the Fortville hardware
>>> and firmware with a very buggy SR-IOV implementation. I've never seen it
>>> on ixgbe before, although I haven't used SR-IOV there very much at all)
>>>
>>>
>>> On Thu, Aug 20, 2015 at 6:15 PM, Adrian Chadd wrote:
>>>
>>>> Hi!
>>>>
>>>> This has started happening on -HEAD recently. No, I don't have any
>>>> more details yet than "recently."
>>>>
>>>> Whenever I get an NMI panic (and getting an NMI is a separate issue,
>>>> sigh) I get a slew of "failed to stop cpu" messages, and all CPUs
>>>> enter ddb. This is .. sub-optimal. Has anyone seen this? Does anyone
>>>> have any ideas?
>>>>
>>>>
>>>> -adrian
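
For illustration only, here is one possible reading of the "spin with the pause instruction, for N seconds (N being the CPU ID)" idea, written as a user-space C sketch and not as kernel code. The thread does not say what should happen when the window expires, so this version simply reports a timeout. The names `cpu_relax`, `now_ns`, `spin_wait_staggered`, the `release` flag and the `unit_ns` parameter are all hypothetical and do not come from the FreeBSD source.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* On x86, emit PAUSE in the spin loop; elsewhere fall back to a no-op. */
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)
#endif

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
}

/*
 * Hypothetical sketch: spin until *release becomes nonzero, or until a
 * window of cpu_id * unit_ns nanoseconds has elapsed. Scaling the window
 * by the CPU ID means the spinners give up one after another instead of
 * all at once. Returns 1 if released, 0 if the window expired.
 */
static int
spin_wait_staggered(atomic_int *release, unsigned cpu_id, uint64_t unit_ns)
{
	uint64_t deadline = now_ns() + (uint64_t)cpu_id * unit_ns;

	while (atomic_load_explicit(release, memory_order_acquire) == 0) {
		if (now_ns() >= deadline)
			return (0);
		cpu_relax();
	}
	return (1);
}
```

With `unit_ns` set to one second this matches the "N seconds" wording: CPU 0 gives up immediately, CPU 1 after one second, and so on. Kernel code on FreeBSD would normally use `cpu_spinwait()` in place of the intrinsic, and whether pausing actually helps with the thermal concern raised above is not something this sketch demonstrates.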