Date: Thu, 12 Sep 2002 09:17:42 -0700 (PDT)
From: Julian Elischer
To: Doug Ambrisko
Cc: Bruce M Simpson, hackers@FreeBSD.org
Subject: Re: Supporting HW_WDOG?

We added this option to allow a USERLAND hardware watchdog tickler to kick
the dog every so often, but discovered that we couldn't get core dumps.
The idea is that we wanted not only to see that the kernel was responding,
but also to have the machine reboot if userland was so screwed up that the
dog-kicker couldn't run. The kernel support was there to ensure that long
in-kernel operations, such as core dumps, could complete.

On Thu, 12 Sep 2002, Doug Ambrisko wrote:

> Bruce M Simpson writes:
> | Mainly interested in exploring this with a view to implementing capabilities
> | a bit like the LOM chip found in Sun Netras on an i386 box.
> |
> | What I'd like to do is modify a Soekris net4501 for this. The Sun LOM
> | chip handles things like serial console capabilities, environmentals,
> | and provides a means of executing memory tests, etc.; it's able to issue
> | notifications to the OS running on the machine, I believe using traps or
> | an NMI mechanism.
> |
> | Come to think of it, has anybody seen anything like this in the Intel
> | IPMI specification? Just a thought.
>
> We have implemented a SW & HW watchdog here. The HW support is based on
> the reboot timers in the Intel ICH and SiS 630 chips. Our scheme is to
> implement a SW watchdog in hardclock() that is controlled via a sysctl.
> Then we enforce the SW watchdog via the HW watchdog (i.e. if the SW
> watchdog doesn't reset the HW watchdog, the machine reboots). This gives
> us more flexibility than the HW watchdogs do, since they have a limited
> and non-standard amount of time they can wait for. This way only
> sysctls are used and no /dev entries are needed. I did add a kernel
> sysctl function so that I could call another sysctl easily. This let
> me "dynamically" link in the HW watchdog when a module implementing the
> HW watchdog sysctl was kldload'ed. A kldunload would disable the HW
> watchdog and unlink it from the SW one. We also added code so that if
> the machine panicked or dropped into the debugger with DDB not set as
> unattended, the watchdogs would get turned off (we also disable consmute
> at the same time). We have this working on generic PC motherboards with
> no custom hardware.
>
> The userland tickler just does a sysctl to set the number of ticks the
> SW watchdog should wait before it calls panic() and gives you a kernel
> core.
>
> Unfortunately the ICH reboot timer cannot generate an NMI. That would have
> been better, since then we could get a core.
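The layered SW/HW scheme Doug describes can be sketched as a small userland C simulation. This is a hypothetical model, not the code from the thread: the structs `sw_wdog` and `hw_wdog` and the functions `sw_set_ticks()`, `sw_tick()`, `hw_tick()` and `wd_disable_all()` are invented for illustration. `sw_set_ticks()` stands in for the sysctl write done by the userland tickler, and time is counted in abstract ticks rather than the chips' real timer units. What it tries to show is the division of labour: the SW countdown can be as long as userland likes, while the short, fixed HW timeout only has to outlast the gap between two runs of the tick handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a fixed-timeout hardware reboot timer. */
struct hw_wdog {
    bool enabled;
    int timeout;   /* short, fixed: ticks before the chip resets the board */
    int remaining; /* ticks left before reset */
};

/* Hypothetical model of the software watchdog run from the clock tick. */
struct sw_wdog {
    bool enabled;
    int remaining;      /* ticks left; set by the userland tickler */
    struct hw_wdog *hw; /* NULL when no HW watchdog module is loaded */
};

static void hw_pet(struct hw_wdog *hw)
{
    hw->remaining = hw->timeout;
}

/* Stands in for the sysctl write done by the userland tickler. */
static void sw_set_ticks(struct sw_wdog *sw, int ticks)
{
    assert(ticks > 0);
    sw->remaining = ticks;
    sw->enabled = true;
    if (sw->hw != NULL) {
        sw->hw->enabled = true;
        hw_pet(sw->hw);
    }
}

/* Models the panic/debugger path: both watchdogs are switched off. */
static void wd_disable_all(struct sw_wdog *sw)
{
    sw->enabled = false;
    if (sw->hw != NULL)
        sw->hw->enabled = false;
}

/*
 * Runs once per clock tick, like code hooked into hardclock().
 * Returns true when the SW watchdog expires and the kernel should panic.
 */
static bool sw_tick(struct sw_wdog *sw)
{
    if (!sw->enabled)
        return false;
    if (--sw->remaining <= 0) {
        wd_disable_all(sw); /* let the crash dump run to completion */
        return true;
    }
    if (sw->hw != NULL)
        hw_pet(sw->hw); /* SW watchdog still satisfied: keep HW quiet */
    return false;
}

/*
 * The chip's own countdown. It keeps running even if sw_tick() stops
 * being called because the kernel is wedged. Returns true on reset.
 */
static bool hw_tick(struct hw_wdog *hw)
{
    return hw->enabled && --hw->remaining <= 0;
}
```

If userland stops re-arming, `sw_tick()` eventually reports a panic (so a core can be taken) and switches both dogs off, modelling the panic path described above. If the kernel itself wedges and `sw_tick()` stops running, only `hw_tick()` keeps counting, and the board is reset.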
>
> Note that at a prior company we needed the tickler because the HW watchdog
> could not be deactivated until it went off :-( Smarter watchdogs can be
> turned off, and then ticklers are not needed.
>
> If anyone is interested in playing with the code and getting it into
> shape to put into -current, I can send it to you. It needs to be cleaned
> up; the kernel sysctl function needs to be added properly to the
> sysctl file, etc. I can help with testing, reviews and questions. I just
> don't have time for polishing right now.
>
> Doug A.
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
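Julian's point about core dumps and Doug's note about a watchdog that could not be deactivated both come down to what happens during a long in-kernel operation. Here is a hypothetical userland model of that. The `wdog` struct and the functions `wdog_pet()`, `wdog_tick()` and `dump_blocks()` are invented for illustration, and a dump is reduced to writing one block per tick. A watchdog that can be disabled is simply turned off before the dump. One that cannot be disabled must be petted by the dump code itself, or the board resets partway through the dump.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a hardware watchdog chip. */
struct wdog {
    bool can_disable; /* false for a chip that can't be turned off once armed */
    bool enabled;
    int timeout;      /* ticks from the last pet until the board resets */
    int remaining;
};

static void wdog_pet(struct wdog *wd)
{
    wd->remaining = wd->timeout;
}

/* One tick of the chip's own timer; returns true if it resets the board. */
static bool wdog_tick(struct wdog *wd)
{
    return wd->enabled && --wd->remaining <= 0;
}

/*
 * Hypothetical crash-dump loop: writes `blocks` blocks, one per tick.
 * Returns how many blocks were completed before the board was reset
 * (== blocks if the dump finished). `kick` says whether the dump code
 * pets the watchdog itself.
 */
static int dump_blocks(struct wdog *wd, int blocks, bool kick)
{
    assert(blocks >= 0);
    if (wd->can_disable)
        wd->enabled = false; /* smart chip: just turn it off for the dump */
    for (int i = 0; i < blocks; i++) {
        if (kick && wd->enabled)
            wdog_pet(wd);    /* dumb chip: keep the dog kicked while dumping */
        if (wdog_tick(wd))
            return i;        /* block i was cut short by the reset */
    }
    return blocks;
}
```

With a 5-tick timeout and no kicking, a watchdog that cannot be disabled cuts a 100-block dump off after 4 blocks. With kicking, or with a chip that can be switched off, the dump completes.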