Date: Wed, 23 Jan 2013 09:57:33 -0700
From: Ian Lepore <ian@FreeBSD.org>
To: mjacob@FreeBSD.org
Cc: freebsd-hackers@FreeBSD.org, Sushanth Rai <sushanth_rai@yahoo.com>
Subject: Re: NMI watchdog functionality on Freebsd
Message-ID: <1358960253.32417.467.camel@revolution.hippie.lan>
In-Reply-To: <5100142D.7040904@freebsd.org>
References: <1358894455.17521.YahooMailClassic@web181706.mail.ne1.yahoo.com> <201301231025.41118.jhb@freebsd.org> <5100142D.7040904@freebsd.org>

On Wed, 2013-01-23 at 08:47 -0800, Matthew Jacob wrote:
> On 1/23/2013 7:25 AM, John Baldwin wrote:
> > On Tuesday, January 22, 2013 5:40:55 pm Sushanth Rai wrote:
> >> Hi,
> >>
> >> Does freebsd have some functionality similar to Linux's NMI watchdog ? I'm
> > aware of ichwd driver, but that depends to WDT to be available in the
> > hardware. Even when it is available, BIOS needs to support a mechanism to
> > trigger a OS level recovery to get any useful information when system is
> > really wedged (with interrupt disabled)
> The principle purpose of a watchdog is to keep the system from hanging.
> Information is secondary. The ichwd driver can use the LPC part of ICH
> hardware that's been there since ICH version 4. I implemented this more
> fully at Panasas. The first importance is to keep the system from being
> hung. The next piece of information is to detect, on reboot, that a
> watchdog event occurred. Finally, trying to isolate why is good.
>
> This is equivalent to the tco_WDT stuff on Linux. It's not interrupt
> driven (it drives the reset line on the processor).
>

I think there's value in the NMI watchdog idea, but unless you back it
up with a real hardware watchdog you don't really have full watchdog
functionality.

If the NMI can get the OS to produce some extra info, that's great, and
using an NMI gives you a good chance of doing that even if it is normal
interrupt processing that has wedged the machine. But calling panic()
invokes plenty of processing that can get wedged in other ways, so even
an NMI-based watchdog isn't guaranteed to get the machine running again.

But adding a real hardware watchdog that fires on a slightly longer
timeout than the NMI watchdog gives you the best of everything: you get
information if it's possible to produce it, and you get a real hardware
reset shortly thereafter if producing the info fails.

-- Ian
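
The two-stage arrangement described above (an NMI watchdog on a shorter timeout, backed by a hardware watchdog on a slightly longer one) can be modeled as a small timing simulation. This is only an illustrative sketch, not FreeBSD code and not part of the original message: the class name, the state names, and the 10 s / 15 s timeouts are all assumptions chosen for the example, and no real driver or kernel interface is involved.

```python
from dataclasses import dataclass


@dataclass
class TwoStageWatchdog:
    """Toy model of an NMI watchdog backed by a hardware watchdog.

    All names and values here are hypothetical; this models timing only.
    """
    nmi_timeout: float    # stage 1: NMI fires, OS tries to produce diagnostics
    reset_timeout: float  # stage 2: hardware watchdog drives the reset line
    last_pat: float = 0.0

    def __post_init__(self):
        # The hardware reset must come later than the NMI, otherwise the
        # machine is reset before it has a chance to report anything.
        if not self.nmi_timeout < self.reset_timeout:
            raise ValueError("reset_timeout must be longer than nmi_timeout")

    def pat(self, now: float) -> None:
        """Record that the system proved it is still alive at time `now`."""
        self.last_pat = now

    def state(self, now: float) -> str:
        """Return what would have happened by time `now` with no further pats."""
        idle = now - self.last_pat
        if idle >= self.reset_timeout:
            return "hardware-reset"
        if idle >= self.nmi_timeout:
            return "nmi-diagnostics"
        return "ok"


if __name__ == "__main__":
    wd = TwoStageWatchdog(nmi_timeout=10.0, reset_timeout=15.0)
    wd.pat(100.0)
    for t in (105.0, 110.0, 115.0):
        print(t, wd.state(t))
```

The gap between the two timeouts is the window in which a panic triggered from the NMI can try to emit information. If that path wedges as well, the model still ends in the reset state, which is the point of keeping a hardware watchdog behind the NMI.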