Date: Tue, 12 Feb 2013 17:17:56 -0800
From: Alfred Perlstein <bright@mu.org>
To: "arch@freebsd.org" <arch@freebsd.org>, Poul-Henning Kamp <phk@phk.freebsd.dk>
Subject: request for preliminary review, enhanced watchdog.
Message-ID: <511AE9C4.4030301@mu.org>
At work we've had some issues with spurious watchdog timeouts firing. Since we use an IPMI/external watchdog, the system is completely reset and we are unable to gather metrics.

I investigated the issue, compared our watchdog support with what Linux offers, and decided to crib from their API so that we can benefit from an enhanced watchdog.

I have a WIP in a branch that I hope people can weigh in on, review, and make technical suggestions about.

The branch is located here:
  svn+ssh://svn.freebsd.org/base/user/alfred/ewatchdog

The easy way to get changes:
  svn log --stop-on-copy svn+ssh://svn.freebsd.org/base/user/alfred/ewatchdog

The changes are:

1) Support for pre-watchdog timeout. This means that so long as the kernel is somewhat functional (callouts are working) we can trigger a configurable action (panic, ddb, log) if the watchdog program is otherwise hung.

2) Support for a built-in software watchdog that has the same options (panic, ddb, log) if the watchdog times out. This is useful for prototyping and was done instead of using SW_WATCHDOG in kern_clock.c because it was easier to work the code into watchdog.c than to communicate via the EVENTHANDLER API.

3) Support for a Linux-like API (WDIOC_GETTIMELEFT, WDIOC_SETTIMEOUT, WDIOC_GETTIMEOUT, etc.).

4) Modifications to watchdogd(8):
   - Warn if the watchdog program takes too long.
   - Disable activation of the system watchdog so that one can test the watchdogd script without potentially rebooting the system.
   - Ability to log to syslog when scripts begin to time out.
   - When told to measure time, do not unconditionally nap for 'sleep' seconds; instead, reduce the naptime by the elapsed time so as not to trigger the watchdog.

I've not yet hooked the optional pre-timeout code into watchdogd(8), but plan on doing so later in the week.

It would be really helpful if we could decide on a way of selecting which watchdogs to arm/fire and how to query them. I may adopt the Linux API unless someone has alternative suggestions that make a strong enough case to forge our own API.

thank you,
-Alfred
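
For reviewers, the naptime adjustment mentioned under the watchdogd(8) changes can be sketched roughly as below. This is an illustration only, not the code in the branch: the helper names ts_sub() and adjusted_nap() are hypothetical, and the sketch assumes the daemon measures how long the check took and sleeps only for what is left of its interval.

```c
#include <time.h>

/*
 * Hypothetical helper (not from the branch): return end - start.
 * Assumes end is not earlier than start.
 */
struct timespec
ts_sub(struct timespec end, struct timespec start)
{
	struct timespec d;

	d.tv_sec = end.tv_sec - start.tv_sec;
	d.tv_nsec = end.tv_nsec - start.tv_nsec;
	if (d.tv_nsec < 0) {
		/* Borrow one second to keep tv_nsec in [0, 1e9). */
		d.tv_sec--;
		d.tv_nsec += 1000000000L;
	}
	return (d);
}

/*
 * Hypothetical helper (not from the branch): given the configured
 * sleep interval and the time the check took, return how long the
 * daemon should still nap. If the check used up the whole interval
 * (or more), return zero so the next pat happens immediately.
 */
struct timespec
adjusted_nap(struct timespec interval, struct timespec elapsed)
{
	struct timespec zero = { .tv_sec = 0, .tv_nsec = 0 };

	if (elapsed.tv_sec > interval.tv_sec ||
	    (elapsed.tv_sec == interval.tv_sec &&
	    elapsed.tv_nsec >= interval.tv_nsec))
		return (zero);
	return (ts_sub(interval, elapsed));
}
```

In a daemon loop, one would presumably take clock_gettime(CLOCK_MONOTONIC, ...) before and after running the check, pass the difference to adjusted_nap(), and hand the result to nanosleep(2). A monotonic clock is assumed here so that wall-clock adjustments do not distort the measurement.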