From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Sushanth Rai, mjacob@freebsd.org, Ian Lepore
Subject: Re: NMI watchdog functionality on Freebsd
Date: Thu, 24 Jan 2013 11:11:01 -0500

On Wednesday, January 23, 2013 11:57:33 am Ian Lepore wrote:
> On Wed, 2013-01-23 at 08:47 -0800, Matthew Jacob wrote:
> > On 1/23/2013 7:25 AM, John Baldwin wrote:
> > > On Tuesday, January 22, 2013 5:40:55 pm Sushanth Rai wrote:
> > >> Hi,
> > >>
> > >> Does FreeBSD have some functionality similar to Linux's NMI watchdog? I'm
> > >> aware of the ichwd driver, but that depends on a WDT being available in
> > >> the hardware. Even when it is available, the BIOS needs to support a
> > >> mechanism to trigger an OS-level recovery to get any useful information
> > >> when the system is really wedged (with interrupts disabled).
> >
> > The principal purpose of a watchdog is to keep the system from hanging.
> > Information is secondary. The ichwd driver can use the LPC part of the ICH
> > hardware that's been there since ICH version 4. I implemented this more
> > fully at Panasas. The first priority is to keep the system from being
> > hung. The next is to detect, on reboot, that a watchdog event occurred.
> > Finally, trying to isolate why is good.
> >
> > This is equivalent to the tco_WDT stuff on Linux. It's not interrupt
> > driven (it drives the reset line on the processor).
>
> I think there's value in the NMI watchdog idea, but unless you back it
> up with a real hardware watchdog you don't really have full watchdog
> functionality. If the NMI can get the OS to produce some extra info,
> that's great, and using an NMI gives you a good chance of doing that
> even if it is normal interrupt processing that has wedged the machine.
> But calling panic() invokes plenty of processing that can get wedged in
> other ways, so even an NMI-based watchdog isn't guaranteed to get the
> machine running again.
>
> But adding a real hardware watchdog that fires on a slightly longer
> timeout than the NMI watchdog gives you the best of everything: you get
> information if it's possible to produce it, and you get a real hardware
> reset shortly thereafter if producing the info fails.

The IPMI watchdog facility has support for a pre-interrupt that fires
before the real watchdog. I have coded up support for it in a branch but
haven't found any hardware supporting it that I could use to test it.
However, you could use an NMI pre-timer via the local APIC timer as a
generic pre-timer for other hardware watchdogs.

-- 
John Baldwin
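The two-stage arrangement discussed in this thread (an NMI-style pre-timeout that tries to capture state, backed by a hardware reset on a slightly longer timeout) can be modeled roughly as below. This is a minimal userspace sketch under stated assumptions: `struct two_stage_wd`, `wd_init()`, `wd_pat()`, `wd_tick()` and `wd_simulate_hang()` are hypothetical names, not FreeBSD's watchdog(9) interface, the ichwd driver, or the IPMI pre-interrupt support mentioned above, and time is counted in abstract ticks rather than real hardware units.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a two-stage watchdog: an NMI-style pre-timeout
 * followed by a hardware reset on a longer timeout. Not a FreeBSD API.
 */
enum wd_action { WD_NONE, WD_PRETIMEOUT, WD_RESET };

struct two_stage_wd {
	unsigned pretimeout;	/* ticks without a pat before the pre-timeout */
	unsigned timeout;	/* ticks without a pat before the reset */
	unsigned elapsed;	/* ticks since the last pat */
	bool pre_fired;		/* pre-timeout already delivered this period */
};

static void
wd_init(struct two_stage_wd *wd, unsigned pretimeout, unsigned timeout)
{
	/*
	 * A healthy system pats once per tick, so the pre-timeout must
	 * exceed one tick, and the reset must fire strictly after it.
	 */
	assert(pretimeout > 1 && pretimeout < timeout);
	wd->pretimeout = pretimeout;
	wd->timeout = timeout;
	wd->elapsed = 0;
	wd->pre_fired = false;
}

/* Called by a healthy system; restarts both stages. */
static void
wd_pat(struct two_stage_wd *wd)
{
	wd->elapsed = 0;
	wd->pre_fired = false;
}

/* Advance one tick and report which stage, if any, fires on it. */
static enum wd_action
wd_tick(struct two_stage_wd *wd)
{
	wd->elapsed++;
	if (wd->elapsed >= wd->timeout)
		return (WD_RESET);
	if (!wd->pre_fired && wd->elapsed >= wd->pretimeout) {
		wd->pre_fired = true;
		return (WD_PRETIMEOUT);
	}
	return (WD_NONE);
}

/*
 * Simulate a machine that stops patting after tick `hang_at`. The
 * pre-timeout handler is modeled as wedging too (it never pats), so the
 * reset stage still fires. Returns the reset tick and stores the
 * pre-timeout tick in *pre_tick (0 if it never fired).
 */
unsigned
wd_simulate_hang(unsigned pretimeout, unsigned timeout, unsigned hang_at,
    unsigned *pre_tick)
{
	struct two_stage_wd wd;
	unsigned tick;

	wd_init(&wd, pretimeout, timeout);
	*pre_tick = 0;
	for (tick = 1; ; tick++) {
		if (tick <= hang_at)
			wd_pat(&wd);	/* healthy: pat every tick */
		switch (wd_tick(&wd)) {
		case WD_PRETIMEOUT:
			*pre_tick = tick; /* real handler would try to dump state */
			break;
		case WD_RESET:
			return (tick);	/* hardware reset restarts the machine */
		case WD_NONE:
			break;
		}
	}
}
```

For example, with a pre-timeout of 5 ticks, a timeout of 8 ticks, and patting stopping after tick 10, the pre-timeout fires at tick 14 and the reset at tick 17. Because the reset stage does not depend on the pre-timeout handler making progress, a handler that itself gets wedged (as can happen inside panic()) still ends in a reset.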