Date: Fri, 15 Nov 2002 19:21:08 -0500 (EST)
From: Jeff Roberson <jroberson@chesapeake.net>
To: arch@freebsd.org
Subject: Software Watchdog
Message-ID: <20021115191632.U22491-100000@mail.chesapeake.net>
Sean Kelly has implemented a software watchdog based on input from Peter and me. It works through a simple watchdog daemon that checks in with the kernel every so often. The kernel complains via hardclock() if the watchdog times out. This will be very useful for debugging hard lockups because hardclock() comes in through a fast interrupt, and there are few things that will stop hardclock() from firing.

Below I have included some snippets from an email Sean sent me.

Here's what I've got so far:

1. Kernel watchdog
   a. Three sysctls
      i.   debug.watchdog.timeout: Number of seconds allowed to go without a reset
      ii.  debug.watchdog.reset: Upon read or write, resets the watchdog timer
      iii. debug.watchdog.enabled: When >0, perform watchdog checks.
   b. 'options WATCHDOG' or 'options INVARIANTS' to compile with watchdog code
   c. watchdog(4) manpage

2. Userland support
   a. /usr/sbin/watchdogd
      i.   Performs stat("/etc") test
      ii.  Awakens periodically and resets watchdog via d.w.reset sysctl
      iii. Sets d.w.enabled=1 on start and d.w.enabled=0 on exit.
      iv.  Proper signal handling.
      v.   Writes pidfile in /var/run/watchdogd.pid
   b. watchdogd(8) manpage
   c. /etc/rc check for watchdogd_enabled="YES"
   d. /etc/rc.d/watchdogd rcNG script
   e. Addition of 'watchdogd_enabled="NO"' to /etc/defaults/rc.conf

I have a short TODO list as well:

* Deal with ticks overflowing (this will be pretty easy)
* Do multiple instances of interrupt and backtrace outputs a few seconds apart. (This will be pretty easy)
* Flesh out the watchdogd daemon to do more checks once I figure out what checks people advise it do. And by checks, I mean "test a, b, and c must not fail or I won't reset the watchdog."

What I have so far is available for viewing at http://www.zombie.org/watchdog.diff

I believe this functionality will be invaluable for debugging 5.0. I'd like to have this included as soon as the TODO list is covered and it gets a proper review.

Comments?

Cheers,
Jeff

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
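
The timeout check described in the message, together with the "ticks overflows" item on the TODO list, might be sketched in C roughly as follows. This is an illustration only and is not taken from Sean's diff. The names struct watchdog, watchdog_reset(), watchdog_expired(), and watchdogd_checkin() are hypothetical, and the tick counter is assumed to be an unsigned 32-bit value. The daemon-side function also calls the reset directly to keep the example self-contained, where the real daemon would go through the debug.watchdog.reset sysctl.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical watchdog state; the actual patch may store this differently. */
struct watchdog {
	uint32_t last_reset;     /* tick count at the most recent check-in */
	uint32_t timeout_ticks;  /* allowed gap: timeout seconds * ticks per second */
	bool     enabled;        /* mirrors debug.watchdog.enabled > 0 */
};

/* Record a check-in, as a read or write of debug.watchdog.reset would. */
static void
watchdog_reset(struct watchdog *wd, uint32_t now)
{
	wd->last_reset = now;
}

/*
 * Check that the clock tick path could perform. Unsigned subtraction is
 * modulo 2^32, so `elapsed` stays correct even when `now` has wrapped
 * past zero since the last reset (assuming fewer than 2^32 ticks passed).
 */
static bool
watchdog_expired(const struct watchdog *wd, uint32_t now)
{
	uint32_t elapsed = now - wd->last_reset;

	return wd->enabled && elapsed > wd->timeout_ticks;
}

/*
 * Daemon-side check-in: run the stat("/etc") test and reset the watchdog
 * only if it passes. A failed test withholds the reset, so the kernel
 * side eventually reports a timeout.
 */
static bool
watchdogd_checkin(struct watchdog *wd, uint32_t now)
{
	struct stat sb;

	if (stat("/etc", &sb) != 0)
		return false;
	watchdog_reset(wd, now);
	return true;
}
```

Further daemon checks ("test a, b, and c") would slot into watchdogd_checkin() ahead of the reset in the same way, each returning early on failure.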