Date: Sun, 15 Aug 2004 15:46:18 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.org>
To: current@FreeBSD.org
Subject: MP watchdog (or: I have a dual-xeon with processors to burn)
Message-ID: <Pine.NEB.3.96L.1040815153617.30898P-100000@fledge.watson.org>
I've just committed a hack I've been using over the last day or two to debug hangs. It's hardly perfect, but it is sort of neat. Basically, it allows you to allocate a CPU on an SMP system as a watchdog that kicks you into the debugger if there's a hang, even if the kernel is spinning in sched_lock or the like. It can either fire an NMI at the boot processor or invoke the debugger directly. I've included a sample "be nasty" sysctl that attempts to cause a nasty hang that the debugger can still break into. Note that the current SMP hang I'm experiencing resists this technique, but the technique is useful regardless, and it is a decent substitute for having an NMI button. And it's a good use for that fourth logical processor on a dual Xeon... :-)

You can add "options MP_WATCHDOG" to your i386 kernel config file, select SCHED_4BSD as the scheduler, and use the debug.watchdog sysctl to choose the watchdog CPU (I usually set it to 3 on my box). In ps(1) you'll see the idle thread on that CPU renamed to a watchdog thread. Due to interrupt round-robining and some IPIs, the watchdog CPU will sometimes do things other than watch, but this happens rarely enough that the watchdog is useful for a broad range of debugging. Obviously, you lose the use of that CPU for as long as the watchdog is enabled.

Note: This does not work with sched_ule, only sched_4bsd. I'll work on fixing that at some point, but I'm still chasing the current stability problems.
Robert N M Watson          FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org   Principal Research Scientist, McAfee Research

---------- Forwarded message ----------
Date: Sun, 15 Aug 2004 18:02:10 +0000 (UTC)
From: Robert Watson <rwatson@FreeBSD.org>
To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
Subject: cvs commit: src/sys/conf files.i386 options.i386 src/sys/i386/i386 mp_machdep.c mp_watchdog.c src/sys/i386/include mp_watchdog.h

rwatson     2004-08-15 18:02:10 UTC

  FreeBSD src repository

  Modified files:
    sys/conf             files.i386 options.i386
    sys/i386/i386        mp_machdep.c
  Added files:
    sys/i386/i386        mp_watchdog.c
    sys/i386/include     mp_watchdog.h
  Log:
  Add an "options MP_WATCHDOG" to i386.  This option allows one of the
  logical CPUs on a system to be used as a dedicated watchdog that drops
  to the debugger and/or generates an NMI to the boot processor if the
  kernel ceases to respond.  A sysctl enables the watchdog, which runs
  out of the processor's idle thread; a callout is launched to reset a
  timer in the watchdog.  If the callout fails to reset the timer for
  ten seconds, the watchdog will fire.  The sysctl allows you to select
  which CPU will run the watchdog.

  A sample "debug.leak_schedlock" sysctl is included, which spins while
  holding sched_lock in order to trigger the watchdog.  On my Xeons, the
  watchdog is able to detect this failure mode and break into the
  debugger, which cannot otherwise be done without an NMI button.

  This option does not currently work with sched_ule due to ULE's push
  notion of scheduling, much as machdep.hlt_logical_cpus does not work
  with that scheduler.

  At face value, this might seem somewhat inefficient, but there are a
  lot of dual-processor Xeons with HTT around, so using one logical CPU
  as a watchdog for testing is not as inefficient as one might fear.
  Revision  Changes    Path
  1.503     +1 -0      src/sys/conf/files.i386
  1.213     +1 -0      src/sys/conf/options.i386
  1.234     +9 -0      src/sys/i386/i386/mp_machdep.c
  1.1       +225 -0    src/sys/i386/i386/mp_watchdog.c (new)
  1.1       +34 -0     src/sys/i386/include/mp_watchdog.h (new)