From: Attilio Rao
Reply-To: attilio@FreeBSD.org
To: Ryan Stone
Cc: "freebsd-hackers@freebsd.org"
Date: Thu, 15 Nov 2012 23:41:31 +0000
Subject: Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

On Thu, Nov 15, 2012 at 10:58 PM, Ryan Stone wrote:
> At work we have some custom watchdog hardware that sends an NMI upon
> expiry. We've modified the kernel to panic when it receives the watchdog
> NMI. I've been trying the "stop_scheduler_on_panic" mode, and I've
> discovered that when my watchdog expires, the system gets completely
> wedged. After some digging, I've discovered that I have multiple CPUs
> getting the watchdog NMI and trying to panic concurrently. One of the CPUs
> wins, and the rest spin forever in this code:

Quick question: can you control how your watchdog delivers the NMI?
For example, can it be sent only to the BSP rather than broadcast to
every CPU?

This problem is tied to the unusual situation in which the (second)
NMI cannot actually be delivered.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein