From: Andrew Boyer
Date: Fri, 27 Jul 2012 10:55:21 -0400
To: attilio@FreeBSD.org
Cc: FreeBSD Stable Mailing List, John Baldwin, Andriy Gapon
Subject: Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9
Message-Id: <4ECC422A-F7A8-4F6C-9E9D-01080927C36D@averesystems.com>
References: <1343350238.12294.10.camel@powernoodle.corp.yahoo.com> <23294764-F30B-4732-8C41-3F0ECA5F273C@averesystems.com>

On Jul 27, 2012, at 10:42 AM, Attilio Rao wrote:

> On Fri, Jul 27, 2012 at 3:33 PM,
> Andrew Boyer wrote:
>>
>> On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:
>>
>>> For the time being I had to revert the following from my stable/9 tree.
>>> Otherwise I would get a kernel panic on shutdown from ipmi(4).
>>>
>>> http://svnweb.freebsd.org/base?view=revision&revision=237839
>>> http://svnweb.freebsd.org/base?view=revision&revision=221121
>>>
>>
>> On a somewhat related note: We noticed recently that you can't pet or
>> disable the IPMI hardware watchdog once SCHEDULER_STOPPED() is true.
>> This means it can fire unexpectedly while you're dumping core or
>> rebooting, depending on how long the timeout was on the pet before the
>> panic. The ipmi driver will need to process the command differently if
>> the scheduler is stopped. I haven't had time to look at a fix yet.
>
> I recall I fixed that internally for SV, but the key here is that we
> need to find a unified (or a default) policy.
> More specifically, do we want the watchdog to also cover the kernel dump
> part (because of possible deadlocks when dumping)? If the answer is
> yes, we likely need to pat the watchdog from within the dumping cycle
> itself. If the answer is no, then we can just disable it when entering
> the panic path. But anyway, we need to identify a default policy that
> makes sense first.
>
> Attilio
>

For our use case, we need the system to reset if the dump hangs.

As the code stands now, you can't disable the HW watchdog from the panic
path. Prior to stopping the scheduler early in panic(), you don't know
the lock state, so you can't safely initiate the IPMI command. (It hung
the first time I tried it.) After stopping the scheduler, you can't pet
it to turn it off.

-Andrew

--------------------------------------------------
Andrew Boyer	aboyer@averesystems.com
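
For anyone weighing the two policies discussed above, the trade-off can be illustrated with a toy user-space simulation. This is only a sketch, not kernel or ipmi(4) code: the names ToyWatchdog and simulate_dump, the tick-based timing, and the three policy labels ("none" for the current behavior where the watchdog can be neither patted nor disabled after the scheduler stops, "disable", and "pat") are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyWatchdog:
    """Toy model of a hardware watchdog timer (hypothetical, not ipmi(4))."""
    timeout: int          # ticks granted by each pat
    remaining: int = 0
    armed: bool = False
    fired: bool = False

    def pat(self):
        # Re-arm the timer with a full timeout.
        self.armed = True
        self.remaining = self.timeout

    def disable(self):
        self.armed = False

    def tick(self):
        # One unit of time passes; fire if the armed timer runs out.
        if self.armed and not self.fired:
            self.remaining -= 1
            if self.remaining <= 0:
                self.fired = True


def simulate_dump(policy, dump_ticks, timeout, hang_at=None):
    """Simulate a kernel dump taking dump_ticks ticks.

    policy:  "none"    - watchdog left armed, never patted during the dump
             "disable" - watchdog disabled on entry to the panic path
             "pat"     - watchdog patted on every step of the dump loop
    hang_at: tick at which the dump deadlocks, or None for a clean dump.
    Returns (dump_completed, watchdog_fired).
    """
    wd = ToyWatchdog(timeout=timeout)
    wd.pat()                      # last pat before the panic
    if policy == "disable":
        wd.disable()
    for t in range(dump_ticks):
        if hang_at is not None and t >= hang_at:
            # The dump is stuck: time passes and nobody pats the watchdog.
            for _ in range(timeout + 1):
                wd.tick()
            return False, wd.fired
        if policy == "pat":
            wd.pat()
        wd.tick()
        if wd.fired:
            return False, True    # reset cut the dump short
    return True, wd.fired


if __name__ == "__main__":
    for policy in ("none", "disable", "pat"):
        print(policy,
              "clean:", simulate_dump(policy, 10, 5),
              "hung:", simulate_dump(policy, 10, 5, hang_at=3))
```

In this model, "none" resets the machine in the middle of a healthy dump, "disable" lets the dump finish but leaves a hung dump stuck forever, and "pat" both lets a healthy dump finish and resets the machine if the dump hangs, which is the behavior our use case needs.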