From: Doug Ambrisko
To: Andriy Gapon
Cc: FreeBSD Stable Mailing List, John Baldwin, Andrew Boyer
Date: Tue, 31 Jul 2012 12:23:28 -0700
Subject: Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9

On Fri, Jul 27, 2012 at 10:51:43PM +0300, Andriy Gapon wrote:
| on 27/07/2012 17:33 Andrew Boyer said the following:
| >
| > On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:
| >
| >> For the time being I had to revert the following from my stable/9 tree.
| >> Otherwise I would get a kernel panic on shutdown from ipmi(4).
| >>
| >> http://svnweb.freebsd.org/base?view=revision&revision=237839
| >> http://svnweb.freebsd.org/base?view=revision&revision=221121
| >
| > On a somewhat related note: We noticed recently that you can't pet or disable
| > the IPMI hardware watchdog once SCHEDULER_STOPPED() is true. This means it
| > can fire unexpectedly while you're dumping core or rebooting, depending on
| > how long the timeout was on the pet before the panic. The ipmi driver will
| > need to process the command differently if the scheduler is stopped. I
| > haven't had time to look at a fix yet.
|
| Yeah, I noticed that unlike most (all?) other watchdog drivers where watchdog
| re-arming is a very basic operation like doing one I/O the IPMI watchdog does
| some more complex stuff which involves waiting on another thread. I think that
| this may be a little bit too much for a reliable watchdog driver. At least, as
| you note, this definitely won't work for the panic case where only one thread is
| left running. I guess that the driver should check for that case and do a
| direct operation instead of enqueueing a request and waiting for another thread
| to execute it.

I have some local hacks that allow KCS mode to run in a polled mode.
We do that so we can put kernel back traces into the system event log.

Julian had code in FreeBSD to "pat" a watchdog during a core dump.

We have local code here that disables console muting when dropping into
the kernel debugger and re-enables it on exit. It might be useful to tie
this into the watchdog: disable it when entering the kernel debugger and
resume it on exit.

With my polling hack, I don't think I dealt with the case where a
transaction was already in progress.

SMIC could be done like KCS.
SSIF could be harder since it uses the i2c interface to talk to the HW,
which is more complicated.

Thanks,

Doug A.