Date: Fri, 21 Aug 2015 10:23:36 -0400
Subject: Re: freebsd-head: suddenly NMI panics lead to ddb being unable to stop CPUs?
From: Ryan Stone
To: Adrian Chadd
Cc: "freebsd-arch@freebsd.org", freebsd-current

I have seen similar behaviour before. The problem is that every CPU
receives an NMI concurrently. As I recall, one of them acquires some kind
of pseudo-spinlock and tries to stop the other CPUs with an NMI. However,
because those CPUs are already in an NMI handler, they don't receive the
second NMI and so don't stop properly.

The case that I saw actually involved a panic triggered by an NMI rather
than entry into the debugger, but I believe that both cases use
stop_cpus_hard() under the hood and have a similar issue.

(I also recall seeing the exact situation that you describe while
originally developing SR-IOV on an alpha version of the Fortville hardware
and firmware with a very buggy SR-IOV implementation. I've never seen it
on ixgbe before, although I haven't used SR-IOV there very much at all.)

On Thu, Aug 20, 2015 at 6:15 PM, Adrian Chadd wrote:

> Hi!
>
> This has started happening on -HEAD recently. No, I don't have any
> more details yet than "recently."
>
> Whenever I get an NMI panic (and getting an NMI is a separate issue,
> sigh) I get a slew of "failed to stop cpu" messages, and all CPUs
> enter ddb. This is .. sub-optimal. Has anyone seen this? Does anyone
> have any ideas?
>
>
> -adrian