From: "Eugene M. Zheganin"
Date: Thu, 21 Feb 2013 01:36:33 +0600
To: freebsd-stable@freebsd.org
Subject: watchdogs
Message-ID: <512525C1.1070502@norma.perm.ru>

Hi.

I have a bunch of FreeBSD machines that hang, and I really want to do something to fight this. Maybe it's ZFS, or maybe it's pf (I also have a bunch of really stable ones, so it's hard to isolate the cause).
Since the 9.x machines hang more often, I suppose it's pf. I use ichwd.ko and watchdogd to reboot a machine when it hangs. That works pretty well; I'm also experimenting with various WITNESS/INVARIANTS options and trying to report the results to GNATS, but obviously it would be much nicer if the system panicked and left a debuggable core after a hang (so far I don't have any, so I can only guess).

I've read about the software watchdog in the kernel and I don't quite understand it: it's said that the kernel software watchdog is able to panic when a deadlock occurs. Can this be achieved with ichwd?

Another question: as far as I understand, ichwd reboots my machine at the hardware level, right? So am I right in saying that the software watchdog can, in theory, also be deadlocked, and is therefore a somewhat less reliable solution?

Thanks.
Eugene.
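
For reference, the ichwd plus watchdogd setup mentioned above is usually configured along the following lines. This is only a generic sketch, not the exact configuration of the machines in question: the 30-second timeout is an assumed value, and the kernel option for the software watchdog is shown only as a comment for comparison.

```shell
# /boot/loader.conf: load the Intel ICH hardware watchdog driver at boot
ichwd_load="YES"

# /etc/rc.conf: start watchdogd(8), which periodically pats the watchdog
watchdogd_enable="YES"
# Assumed value for illustration: -t sets the watchdog timeout in seconds
watchdogd_flags="-t 30"

# Kernel config file (not shell syntax), for the software watchdog instead:
#   options SW_WATCHDOG
```

With the hardware driver loaded, a missed pat leads to a reset by the chipset, which is why no core is left behind in that case.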