From: Alexander Leidinger
To: Doug Ambrisko
Cc: freebsd-hackers@freebsd.org, Andriy Gapon
Date: Fri, 03 Apr 2009 08:46:01 +0200
Subject: Re: watchdog: hw+sw?

Quoting Doug Ambrisko (from Thu, 2 Apr 2009 16:16:34 -0700 (PDT)):

> This worked well for us so I think it is a good idea. Also some HW
> watchdogs can be told to generate an NMI which can also produce a kernel
> dump/ddb prompt. I've also implemented some rough code to put an
> simplified back-trace into the IPMI event log in-case a disk or disk
> I/O sub-system died.

Somewhat related... I have two 32-bit systems with ZFS which lock up
after a while. The lockup is strictly related to the disks. I can
still ping the system just fine, and the HW watchdog seems to still
work as intended (or it does not work at all anymore, as there's no
automatic reset), but as soon as I want to do something which involves
disks (access a webpage located on the ZFS disks), I'm lost.
The only way to get some useful work done again is to reset manually.
Your paragraph above implies that the WD notices that there's a problem
with disks.

While I know how to teach our watchdogd how to detect this (-e option),
we do not have support for this in the base system yet. Do you have a
patch for /etc/rc.d/watchdogd which allows specifying commands to run
via rc.conf, or some patch which tells watchdogd to check a file?

Bye,
Alexander.

--
Whatever you want to do, you have to do something else first.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
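
For illustration, here is a rough sketch of the kind of check command that could be handed to watchdogd via the -e option mentioned above. It is not taken from any existing patch. The script, the names probe and main, and the idea of passing the directory to test as the first argument are all assumptions made for this example. It writes a small file, forces it to disk, reads it back, and returns a non-zero status on any error. If the disk subsystem hangs, the script simply never returns, so the watchdog is not patted.

```python
#!/usr/bin/env python3
"""Hypothetical disk probe meant to be run as the command given to watchdogd -e."""
import os
import sys
import tempfile


def probe(directory):
    """Return True if a small file can be written, synced, read back and removed."""
    payload = b"watchdog-probe"
    try:
        fd, path = tempfile.mkstemp(prefix=".wdprobe-", dir=directory)
    except OSError:
        return False
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the disk
        with open(path, "rb") as f:
            # Note: the read may be served from cache; the fsync is the real test.
            return f.read() == payload
    except OSError:
        return False
    finally:
        try:
            os.unlink(path)
        except OSError:
            pass


def main(argv):
    """Exit status 0 means healthy, 1 means the probe failed."""
    directory = argv[1] if len(argv) > 1 else tempfile.gettempdir()
    return 0 if probe(directory) else 1


if __name__ == "__main__":
    status = main(sys.argv)
    if status != 0:
        sys.exit(status)
```

An invocation could look like watchdogd -e "/usr/local/sbin/wdprobe.py /tank", where both the script path and /tank are placeholders. Whether a hung fsync leaves the process killable depends on where the kernel blocks, so this only helps if the watchdog timeout itself is what triggers the reset.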