From: Alfred Perlstein
Date: Wed, 20 Feb 2013 11:41:17 -0800
To: "Eugene M. Zheganin"
Cc: freebsd-stable@freebsd.org
Subject: Re: watchdogs
Message-ID: <512526DD.1080707@mu.org>
In-Reply-To: <512525C1.1070502@norma.perm.ru>

On 2/20/13 11:36 AM, Eugene M. Zheganin wrote:
> Hi.
>
> I have a bunch of FreeBSD machines that hang (and I really want to do
> something to fight this). Maybe it's ZFS or maybe it's pf (I also have
> a bunch of really stable ones, so it's hard to isolate and tell).
> Since the 9.x ones hang more often, I suppose it's pf. I use ichwd.ko
> and watchdogd to reboot a machine when it hangs. It works pretty well;
> I'm also working on various WITNESS/INVARIANTS stuff and I'm trying
> to report it to GNATS, but obviously it would be much nicer if the
> system would panic and leave some debuggable core after a hang (so far
> I don't have any, so I can only guess). I've read about the software
> watchdog in the kernel and I don't quite understand: it's said that
> the kernel software watchdog is able to panic when a deadlock occurs.
> Can this be achieved with ichwd? Another one: as far as I understand,
> ichwd reboots my machine on a hardware level, right? So am I right in
> saying that the software watchdog can, in theory, also be deadlocked,
> thus being a somewhat less reliable solution?
>

Yes, all your assumptions are correct.

There is an 'enhanced watchdog' branch that I am working on that offers
a "pre-watchdog timeout panic". However, since this is done in software,
you may not get your pre-timeout panic and may only get a reboot.

Later revisions may include facilities for generating an NMI to trigger
a panic/logs, followed by a hard reset by external hardware.

Perhaps ichwd offers the ability to send an NMI? Let me check the
sources.

-Alfred
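
The trade-off described above (a software pre-timeout panic that may itself be stuck, backed by a hardware reset) can be sketched as a toy model. This is only an illustration of the reasoning, not FreeBSD's watchdog API and not code from the 'enhanced watchdog' branch; the function name, parameters, and outcome strings are all hypothetical.

```python
def outcome_after_hang(pretimeout, timeout, software_stage_alive):
    """Toy model of a two-stage watchdog after the system stops patting it.

    pretimeout           -- seconds until the software stage tries to panic
                            (hypothetical parameter)
    timeout              -- seconds until the hardware stage resets the machine
                            (hypothetical parameter)
    software_stage_alive -- False if the hang also blocks the software stage
    """
    # The software stage only helps if it can still run and if it is
    # scheduled to fire strictly before the hardware stage.
    if software_stage_alive and pretimeout < timeout:
        return "panic"        # debuggable core is left behind
    # Otherwise the hardware stage is the only thing left to act.
    return "hard reset"       # machine comes back, but with no core


if __name__ == "__main__":
    # Hang that leaves the software stage runnable.
    print(outcome_after_hang(8, 16, software_stage_alive=True))
    # Deadlock that also blocks the software stage.
    print(outcome_after_hang(8, 16, software_stage_alive=False))
```

In this model the hardware stage is what guarantees recovery, while the software stage is a best-effort attempt to capture debugging state before the reset.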