Date: Thu, 20 Oct 2005 21:27:55 +0000 (GMT)
From: wpaul@FreeBSD.ORG (Bill Paul)
To: Emanuel.strobl@gmx.net (Emanuel Strobl)
Cc: freebsd-current@freebsd.org
Subject: Re: PANIC (watchdog)
Message-ID: <20051020212755.5E19816A420@hub.freebsd.org>
In-Reply-To: <200510202241.44590@harrymail> from Emanuel Strobl at "Oct 20, 2005 10:41:24 pm"

> > I'm waiting for the part where you explain why you have software
> > watchdog support in kern_clock.c turned on. As far as I know, it's
> > not enabled by default in the GENERIC kernel config, and even if
> > it is, you have to twiddle debug.watchdog_enable in order to make
> > it trigger.
>
> All I twiddle is watchdogd_enable in /etc/rc.conf... ;)
>
> > I can only conclude that you turned on watchdog support for some
> > reason, set up a watchdog app to reset the watchdog timeout every
> > so often, and then forgot about it (or else it was enabled as part
> > of some other change you made and you weren't aware of it -- maybe one
> > of those unrelated things you foolishly chose not to tell us about).
> > Presumably the watchdog app crashed while you were doing your
> > installworld. If that's the case, you shouldn't be surprised when
> > the watchdog expiration occurs and dumps you into ddb.
>
> Ok, then this panic is intended if I understood you correctly. I thought it
> would trigger any kind of CPU reset, not a panic.
> Everything is fine then...
>
> Thanks,
>
> -Harry

Actually, it's not a panic (it's only a panic in a kernel without DDB
compiled in). It really just does kdb_enter(), which brings you to
the kernel debugger prompt. You should be able to resume the system
like this:

    db> w watchdog_enabled 0
    db> continue

The idea is that once the watchdog is enabled, hardclock() will dump you
into the kernel debugger _unless_ something resets the watchdog timer
periodically. That something is a user space app which pokes
debug.watchdog_reset. If the watchdog timeout is set for 20 seconds and
you reset the timer every 10 seconds, the system will keep running. If
the watchdog app dies, or if the kernel seizes up and stops scheduling
user processes, the timer will reach 0 and the kernel debugger will
come up.

The watchdog is supposed to give you a way to debug thread deadlocks or
stuck loops that occur in kernel mode.
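
The countdown described above can be illustrated with a small user space
simulation. This is only a sketch of the arithmetic, not the code in
kern_clock.c: the function name simulate_watchdog, its parameters, and
the one-tick-per-second granularity are all assumptions made up for the
example.

```python
def simulate_watchdog(timeout_s, reset_every_s, app_dies_at_s, run_for_s):
    """Return the second at which the watchdog fires, or None if it never does.

    Hypothetical model: one clock tick per second. On each tick the
    watchdog app, if still alive and due, reloads the counter; otherwise
    the counter is decremented and the watchdog fires when it reaches 0.
    """
    remaining = timeout_s
    for now in range(1, run_for_s + 1):
        app_alive = app_dies_at_s is None or now < app_dies_at_s
        if app_alive and now % reset_every_s == 0:
            # The user space app resets the timer (the role played by
            # poking debug.watchdog_reset in the message above).
            remaining = timeout_s
        else:
            # Stand-in for the per-tick countdown done from hardclock().
            remaining -= 1
            if remaining <= 0:
                return now
    return None


if __name__ == "__main__":
    # 20 second timeout, reset every 10 seconds, app never dies.
    print(simulate_watchdog(20, 10, None, 120))  # None: never fires
    # Same settings, but the app dies at t=35; its last reset was at t=30.
    print(simulate_watchdog(20, 10, 35, 120))    # 50
```

The second call models the watchdog app dying (or no longer being
scheduled): the last reset happens at t=30, so the timer reaches 0 one
full timeout later, at t=50.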
When such a condition arises, interrupts may still occur and be handled,
but user processes never get a chance to run. This would stop the user
space watchdog app, so eventually the watchdog timeout would expire. In
a system without DDB, you'd get a panic instead of dropping into the
kernel debugger. Obviously, this is useful if you have an unattended
machine to which you have no easy console access: if the machine wedges,
the watchdog will fire and reboot the system, which will hopefully bring
it back to a working state and let you analyze and fix the problem
remotely.

Unfortunately, there are many cases where the watchdog won't work.
Sometimes the system can wedge with interrupts disabled, or experience
some kind of hardware fault that prevents even hardclock() from running.
If that happens, you're screwed, unless you've rigged up some way to
deliver an NMI that can force the CPU to trap into the debugger.

-Bill

--
=============================================================================
-Bill Paul            (510) 749-2329 | Senior Engineer, Master of Unix-Fu
                 wpaul@windriver.com | Wind River Systems
=============================================================================
              <adamw> you're just BEGGING to face the moose
=============================================================================