From owner-freebsd-stable@FreeBSD.ORG Fri Apr 6 17:12:27 2012
From: Doug Ambrisko
Message-Id: <201204061712.q36HCKJP033408@ambrisko.com>
In-Reply-To: <4F7EC2A8.3000001@FreeBSD.org>
To: Alexander Motin
Date: Fri, 6 Apr 2012 10:12:20 -0700 (PDT)
Cc: freebsd-stable@FreeBSD.org, sbruno@FreeBSD.org, scottl@FreeBSD.org, John Baldwin
Subject: Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

Alexander Motin writes:
| On 04/04/12 21:47, John Baldwin wrote:
| > On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
| >> John Baldwin writes:
| >> | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
| >> |> John Baldwin writes:
| >> |> | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| >> |> |> Doug Ambrisko writes:
| >> |> |> | John Baldwin writes:
| >> |> |> | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| >> |> |> | |> Sean Bruno writes:
| >> |> |> | |> | Noting a failure to attach to the onboard IPMI controller with this dell
| >> |> |> | |> | R815. Not sure what to start poking at and thought I'd though this over
| >> |> |> | |> | here for comment.
| >> |> |> | |> |
| >> |> |> | |> | -bash-4.2$ dmesg |grep ipmi
| >> |> |> | |> | ipmi0: KCS mode found at io 0xca8 on acpi
| >> |> |> | |> | ipmi1: on isa0
| >> |> |> | |> | device_attach: ipmi1 attach returned 16
| >> |> |> | |> | ipmi1: on isa0
| >> |> |> | |> | device_attach: ipmi1 attach returned 16
| >> |> |> | |> | ipmi0: Timed out waiting for GET_DEVICE_ID
| >> |> |> | |>
| >> |> |> | |> I've run into this recently. A quick hack to fix it is:
| >> |> |> | |>
| >> |> |> | |> Index: ipmi.c
| >> |> |> | |> [snip]
| >> | If you use "-ct" then you get a file you can feed into schedgraph.
| >> | However, just reading the log, it seems that IRQ 20 keeps preempting
| >> | the KCS worker thread preventing it from getting anything done. Also,
| >> | there seem to be a lot of threads on CPU 0's runqueue waiting for a
| >> | chance to run (load average of 12 or 13 the entire time). You can try
| >> | just bumping up the max timeout from 3 seconds to higher perhaps. Not
| >> | sure why IRQ 20 keeps firing though. It might be related to USB, so
| >> | you could try fiddling with USB options in the BIOS perhaps, or disabling
| >> | the USB drivers to see if that fixes IPMI.
| >>
| >> Tried without USB in kernel:
| >> http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
| >
| > Hmm, it's still just running constantly (note that the idle thread is
| > _never_ scheduled). The lion's share of the time seems to be spent in
| > "xpt_thrd". Note that there are several places where nothing happens except
| > that "xpt_thrd" runs constantly (spinning) during 10's of statclock ticks. I
| > would maybe start debugging that to see what in the world it is doing. Maybe
| > it is polling some hardware down in xpt_action() (i.e., xpt_action() for a
| > single bus called down into a driver and it is just spinning using polling
| > instead of sleeping and waiting for an interrupt).
|
| "xpt_thrd" is a bus scanner thread. It is scheduled by CAM for every bus
| on attach and by controller driver on hot-plug events. For some
| controllers it may be quite CPU-hungry. For example, for legacy ATA
| controllers, where bus reset may take many seconds of hardware polling,
| while devices just spinning up. For ahci(4) it was improved about year
| ago to not use polling when possible, but it still may loop for some
| time if controller is not responding on reset. What mfi(4), mentioned in
| log, does during scanning, I am not sure.

I thought that mfi(4) could be an issue. There are also some ATA
controllers with nothing attached. I built a GENERIC kernel with USB and
mfi commented out, and the timeout issue went away:

ipmi0: KCS mode found at io 0xca8 on acpi
ipmi1: on isa0
device_attach: ipmi1 attach returned 16
ipmi1: on isa0
device_attach: ipmi1 attach returned 16
ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
ipmi0: DEBUG ipmi_complete_request 529 after wakeup 2272
ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 2332
ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0

Without mfi but with USB, it had issues:

ipmi0: KCS mode found at io 0xca8 on acpi
ipmi1: on isa0
device_attach: ipmi1 attach returned 16
ipmi1: on isa0
device_attach: ipmi1 attach returned 16
ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
ipmi0: Timed out waiting for GET_DEVICE_ID
ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0

I can post more ktrdump traces if needed. A 1U Dell machine without mfi
also has this problem. As John mentioned, it might be good to bump the
timeout up from 3s to 6s. I did that with the USB, no-mfi kernel and it
passed:

% dmesg | grep ipmi
ipmi0: KCS mode found at io 0xca8 on acpi
ipmi1: on isa0
device_attach: ipmi1 attach returned 16
ipmi1: on isa0
device_attach: ipmi1 attach returned 16
ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0

So maybe we need to bump up the timeout aggressively. I added a timeout
in the first place because I didn't want the system to hang. Does anyone
have a good idea for a timeout value? I thought I tried 6s initially and
it still had issues, but the machine I was playing with then had 3 mfi
cards and various disks hanging off it.

Thanks,

Doug A.
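
One rough way to read the DEBUG numbers in the logs above is sketched below. It assumes, and the thread does not confirm this, that the trailing numbers are kernel tick counts and that hz is 1000. The names HZ, wait_ticks and times_out are made up for illustration and are not part of ipmi(4).

```python
# Sketch only: assumes the numbers printed by the DEBUG lines are kernel
# tick counts and that hz = 1000. Neither is stated in the thread.

HZ = 1000  # assumed ticks per second


def wait_ticks(before_msleep: int, before_wakeup: int) -> int:
    """Ticks between submitting the request and its completion."""
    return before_wakeup - before_msleep


def times_out(before_msleep: int, before_wakeup: int,
              timeout_s: int, hz: int = HZ) -> bool:
    """True if the completion arrives after the timeout has expired."""
    return wait_ticks(before_msleep, before_wakeup) > timeout_s * hz


# (label, tick before msleep, tick before wakeup), copied from the logs
runs = [
    ("no USB, no mfi", 1, 2211),
    ("USB, no mfi", 2, 3137),
]

for label, start, wake in runs:
    for timeout_s in (3, 6):
        verdict = "times out" if times_out(start, wake, timeout_s) else "ok"
        print(f"{label}: waited {wait_ticks(start, wake)} ticks, "
              f"{timeout_s}s timeout: {verdict}")
```

Under those assumptions the USB, no-mfi run completes 3135 ticks after submission, just past a 3s limit but well inside 6s, while the run with neither USB nor mfi completes after 2210 ticks and fits in 3s. That would match the observed behaviour, but it is only a reading of the logs, not a measurement.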