Date: Thu, 2 Apr 2009 16:16:34 -0700 (PDT)
From: Doug Ambrisko <ambrisko@ambrisko.com>
To: Andriy Gapon <avg@icyb.net.ua>
Cc: freebsd-hackers@freebsd.org
Subject: Re: watchdog: hw+sw?
Message-ID: <200904022316.n32NGYWK015340@ambrisko.com>
In-Reply-To: <49D4A16F.6020906@icyb.net.ua>

Andriy Gapon writes:
| I have some vague thoughts on using SW_WATCHDOG and a hardware watchdog
| together.
| I think this could be useful but I am not sure how to implement this.
| The idea is this: timeout for SW_WATCHDOG is smaller than timeout for hw
| wd; when some freeze happens sw wd logic kicks in first, stops hw wd and
| produces either panic or ddb prompt; if the freeze is so severe that sw
| wd can't run (e.g. hardware is messed up badly) then hw wd performs its
| duty. I am mostly interested in having this in unattended mode where kernel
| dump could be useful for later analysis but the system should recover in
| reasonable time.
|
| Suggestions, opinions?

At a prior company I implemented a watchdog, before watchdog(4) existed,
that did this. I had the HW watchdog register with the SW watchdog. Our
SW watchdog was then ticked via a sysctl countdown. This way we could
implement a fairly arbitrary range of time-outs, since most HW is very
limited in the time duration it supports, and then we didn't really have
to worry about it. If the SW watchdog didn't tick in 10 seconds or so
then the machine is probably dead. So we used the HW watchdog to enforce
the SW watchdog. It's really nice getting the panic and dump.

This worked well for us so I think it is a good idea. Also some HW
watchdogs can be told to generate an NMI, which can also produce a
kernel dump/ddb prompt. I've also implemented some rough code to put a
simplified back-trace into the IPMI event log in case a disk or disk
I/O sub-system died.

Doug A.
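
For illustration, here is a minimal sketch of the layering described
above, written as a plain C model rather than kernel code. Everything in
it is an assumption made for the example: the names (struct layered_wd,
wd_init, wd_tick, wd_second) are hypothetical and are not part of
watchdog(4) or any driver, and the sysctl-driven tick is stood in for by
an ordinary function call. It only shows the ordering: a long software
countdown fires first and yields a panic/dump, while a short hardware
timeout covers the case where the kernel itself stops running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a SW watchdog enforced by a HW watchdog. */
struct layered_wd {
    int  sw_remaining; /* seconds left on the software countdown */
    int  hw_max;       /* longest timeout the hardware supports */
    int  hw_remaining; /* seconds left before the hardware fires */
    bool panicked;     /* software watchdog expired: panic and dump */
    bool hw_reset;     /* hardware watchdog expired: hard reset */
};

void wd_init(struct layered_wd *wd, int sw_timeout, int hw_max)
{
    assert(sw_timeout > 0 && hw_max > 0);
    wd->sw_remaining = sw_timeout;
    wd->hw_max = hw_max;
    wd->hw_remaining = hw_max;
    wd->panicked = false;
    wd->hw_reset = false;
}

/* Userland tick (a sysctl write in the original): reload the countdown. */
void wd_tick(struct layered_wd *wd, int sw_timeout)
{
    if (!wd->panicked && !wd->hw_reset)
        wd->sw_remaining = sw_timeout;
}

/*
 * Advance the model by one second. kernel_alive says whether the
 * kernel's periodic watchdog code still gets to run.
 */
void wd_second(struct layered_wd *wd, bool kernel_alive)
{
    if (wd->panicked || wd->hw_reset)
        return;

    if (kernel_alive) {
        if (--wd->sw_remaining <= 0) {
            /* Userland stopped ticking: take the panic and dump. */
            wd->panicked = true;
            return;
        }
        /* Kernel is still running, so pat the hardware watchdog. */
        wd->hw_remaining = wd->hw_max;
        return;
    }

    /* Kernel is frozen: nobody pats the hardware, so it counts down. */
    if (--wd->hw_remaining <= 0)
        wd->hw_reset = true;
}
```

With a 60 second software timeout over hardware limited to 10 seconds,
a hung userland ends in a panic after 60 seconds, and a frozen kernel
ends in a hardware reset after 10.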