Date: Fri, 03 Apr 2009 08:46:01 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Doug Ambrisko <ambrisko@ambrisko.com>
Cc: freebsd-hackers@freebsd.org, Andriy Gapon <avg@icyb.net.ua>
Subject: Re: watchdog: hw+sw?
Message-ID: <20090403084601.108111xg6o3b49ms@webmail.leidinger.net>
In-Reply-To: <200904022316.n32NGYWK015340@ambrisko.com>
References: <200904022316.n32NGYWK015340@ambrisko.com>
Quoting Doug Ambrisko <ambrisko@ambrisko.com> (from Thu, 2 Apr 2009 16:16:34 -0700 (PDT)):

> This worked well for us so I think it is a good idea. Also some HW
> watchdogs can be told to generate an NMI which can also produce a kernel
> dump/ddb prompt. I've also implemented some rough code to put a
> simplified back-trace into the IPMI event log in case a disk or disk
> I/O sub-system died.

Somewhat related... I have two 32-bit systems with ZFS which lock up after a while. The lockup is strictly related to the disks. I can still ping the system just fine, and the HW watchdog seems to still work as intended (or it does not work at all anymore, as there's no automatic reset), but as soon as I want to do something which involves the disks (for example, access a web page located on the ZFS disks), I'm lost. The only way to get some useful work done again is to reset manually. Your paragraph above implies that the WD notices that there's a problem with the disks.

While I know how to teach our watchdogd to detect this (-e option), we do not have support for this in the base system yet. Do you have a patch for /etc/rc.d/watchdogd which allows specifying commands to run via rc.conf, or some patch which tells watchdogd to check a file?

Bye,
Alexander.

-- 
Whatever you want to do, you have to do something else first.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
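For illustration only, here is a rough sketch of the kind of check command that could be handed to watchdogd's -e option, which pats the watchdog only when the command exits with status 0. It is not a patch from this thread. The names disk_ok and main, the probe file name, the 10-second timeout and the default directory /tank are all assumptions made up for the example. The write is done in a child process so that a hung disk cannot block the check itself.

```python
import os
import subprocess
import sys

# Code run in the child: write a small file, force it to disk, remove it.
# Any failure makes the child exit non-zero.
PROBE = r"""
import os, sys
path = sys.argv[1]
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
try:
    os.write(fd, b"watchdog probe\n")
    os.fsync(fd)
finally:
    os.close(fd)
os.unlink(path)
"""


def disk_ok(directory, timeout=10.0):
    """Return True if a write+fsync in `directory` finishes within `timeout` seconds.

    `directory` and `timeout` are hypothetical parameters for this sketch.
    """
    probe_file = os.path.join(directory, ".watchdog_probe.%d" % os.getpid())
    child = subprocess.Popen(
        [sys.executable, "-c", PROBE, probe_file],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a process stuck in disk I/O may ignore the signal,
        # so we deliberately do not wait for it again.
        child.kill()
        return False


def main(argv):
    """Exit-status convention expected by a check command: 0 = healthy."""
    target = argv[1] if len(argv) > 1 else "/tank"  # assumed pool mount point
    return 0 if disk_ok(target) else 1


# When installed as a script, finish with:
#     sys.exit(main(sys.argv))
```

Saved under some path of your choosing, the script would be passed to watchdogd as the -e command with the directory to probe as its argument. If the probe hangs or fails, the command returns non-zero, the watchdog is not patted, and the hardware timer can eventually reset the machine.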