From: mike tancsa
To: Alexander Motin, FreeBSD-STABLE Mailing List
Subject: Re: ipmi0: Watchdog set returned 0xc0 (releng_13)
Date: Wed, 15 Sep 2021 11:23:03 -0400
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

On 9/14/2021 9:29 PM, Alexander Motin wrote:
> Hi Mike,
>
> Could you try my 6c2d4404161a commit? I don't know about your case, but
> it fixes 0xcc error I see on my systems for timeouts below 120 seconds.

Hi Alexander,

This is on the Supermicro X11SCH-F. The BMC firmware is version 1.73
(the latest version on the website).

ipmi0: port 0xca2,0xca3 on acpi0
ipmi0: KCS mode found at io 0xca2 on acpi
ipmi0: IPMI device rev. 1, firmware rev. 1.73, version 2.0, device support mask 0xbf
ipmi0: Number of channels 2
ipmi0: Attached watchdog
ipmi0: Establishing power cycle handler

It's no longer printing the error! If I start up watchdogd -t 30 and
then do a killall -9 watchdogd, it does a graceful shutdown of the box!?
That's very cool, even better than the hard reset it did before. But
will it still do a hard reset if the box is actually livelocked? I did
a quick test to confirm that it does indeed not wait around too long: I
added an infinite loop in /usr/local/etc/rc.d/stop-shutdown.sh, and it
only ran for 6 seconds before the box hard reset. It's logged in the
BMC log too.

# ipmitool sel list
   1 | 09/15/2021 | 14:42:04 | Watchdog2 #0xca | Timer interrupt () | Asserted
   2 | 09/15/2021 | 14:42:22 | Watchdog2 #0xca | Power cycle () | Asserted

I also tried on an X11SSL-F:

ipmi0: IPMI device rev. 1, firmware rev. 1.60, version 2.0, device support mask 0xbf
ipmi0: Number of channels 2
ipmi0: Attached watchdog
ipmi0: Establishing power cycle handler

# ipmitool sel list | tail -3
   6 | 08/20/2021 | 20:45:38 | Fan #0x45 | Lower Non-recoverable going low  | Asserted
   7 | 09/15/2021 | 11:15:28 | Watchdog2 #0xca | Timer interrupt () | Asserted
   8 | 09/15/2021 | 11:15:38 | Watchdog2 #0xca | Power cycle () | Asserted
#

I have a RELENG_12 box in production that I will try as well later, but
so far so good. Thanks for fixing!

    ---Mike
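
For anyone repeating this test, here is a rough sketch of how the gap between the two Watchdog2 entries could be pulled out of the `ipmitool sel list` output shown above. The function name `watchdog_delay` is made up for illustration, and the sketch assumes the pipe-separated layout seen in this message and that both events fall on the same day; other BMCs may format the SEL differently.

```shell
#!/bin/sh
# Hypothetical helper: reads `ipmitool sel list` text on stdin and prints the
# number of seconds between each Watchdog2 "Timer interrupt" entry and the
# Watchdog2 "Power cycle" entry that follows it.
# Assumption: fields are separated by "|" as in the output quoted in this
# message, and both timestamps are on the same calendar day.
watchdog_delay() {
    awk -F'|' '
        # Convert an HH:MM:SS field (possibly padded with spaces) to seconds.
        function secs(t,    p) {
            gsub(/ /, "", t)
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        # Remember when the watchdog timer interrupt was logged.
        $4 ~ /Watchdog2/ && $5 ~ /Timer interrupt/ {
            start = secs($3); armed = 1; next
        }
        # On the matching power cycle entry, print the elapsed seconds.
        $4 ~ /Watchdog2/ && $5 ~ /Power cycle/ && armed {
            print secs($3) - start; armed = 0
        }
    '
}
```

It would be used as `ipmitool sel list | watchdog_delay`. Fed the two X11SCH-F entries above it prints 18, and for the X11SSL-F entries it prints 10. Non-watchdog lines such as the Fan event are ignored.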