From owner-freebsd-stable@FreeBSD.ORG Fri Apr 6 17:27:15 2012
Message-ID: <4F7F276F.6080409@FreeBSD.org>
Date: Fri, 06 Apr 2012 20:27:11 +0300
From: Alexander Motin
To: Doug Ambrisko
Cc: scottl@FreeBSD.org, sbruno@FreeBSD.org, freebsd-stable@FreeBSD.org, John Baldwin
Subject: Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
In-Reply-To: <201204061712.q36HCKJP033408@ambrisko.com>
List-Id: Production branch of FreeBSD source code

On 04/06/12 20:12, Doug Ambrisko wrote:
> Alexander Motin writes:
> | On 04/04/12 21:47, John Baldwin wrote:
> |> On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
> |>> John Baldwin writes:
> |>> | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
> |>> |> John Baldwin writes:
> |>> |> | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
> |>> |> |> Doug Ambrisko writes:
> |>> |> |> | John Baldwin writes:
> |>> |> |> | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
> |>> |> |> | |> Sean Bruno writes:
> |>> |> |> | |> | Noting a failure to attach to the onboard IPMI controller with
> |>> |> |> | |> | this dell R815. Not sure what to start poking at and thought
> |>> |> |> | |> | I'd throw this over here for comment.
> |>> |> |> | |> |
> |>> |> |> | |> | -bash-4.2$ dmesg |grep ipmi
> |>> |> |> | |> | ipmi0: KCS mode found at io 0xca8 on acpi
> |>> |> |> | |> | ipmi1: on isa0
> |>> |> |> | |> | device_attach: ipmi1 attach returned 16
> |>> |> |> | |> | ipmi1: on isa0
> |>> |> |> | |> | device_attach: ipmi1 attach returned 16
> |>> |> |> | |> | ipmi0: Timed out waiting for GET_DEVICE_ID
> |>> |> |> | |>
> |>> |> |> | |> I've run into this recently. A quick hack to fix it is:
> |>> |> |> | |>
> |>> |> |> | |> Index: ipmi.c
> |>> |> |> | |>
> [snip]
> |>> | If you use "-ct" then you get a file you can feed into schedgraph.
> |>> | However, just reading the log, it seems that IRQ 20 keeps preempting
> |>> | the KCS worker thread preventing it from getting anything done. Also,
> |>> | there seem to be a lot of threads on CPU 0's runqueue waiting for a
> |>> | chance to run (load average of 12 or 13 the entire time). You can try
> |>> | just bumping up the max timeout from 3 seconds to higher perhaps. Not
> |>> | sure why IRQ 20 keeps firing though. It might be related to USB, so
> |>> | you could try fiddling with USB options in the BIOS perhaps, or disabling
> |>> | the USB drivers to see if that fixes IPMI.
> |>>
> |>> Tried without USB in kernel:
> |>> http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
> |>
> |> Hmm, it's still just running constantly (note that the idle thread is
> |> _never_ scheduled). The lion's share of the time seems to be spent in
> |> "xpt_thrd". Note that there are several places where nothing happens except
> |> that "xpt_thrd" runs constantly (spinning) during 10's of statclock ticks. I
> |> would maybe start debugging that to see what in the world it is doing. Maybe
> |> it is polling some hardware down in xpt_action() (i.e., xpt_action() for a
> |> single bus called down into a driver and it is just spinning using polling
> |> instead of sleeping and waiting for an interrupt).
> |
> | "xpt_thrd" is a bus scanner thread.
> | It is scheduled by CAM for every bus on attach and by the controller
> | driver on hot-plug events. For some controllers it may be quite
> | CPU-hungry. For example, for legacy ATA controllers, where a bus reset
> | may take many seconds of hardware polling while devices are just
> | spinning up. For ahci(4) it was improved about a year ago to not use
> | polling when possible, but it still may loop for some time if the
> | controller is not responding on reset. What mfi(4), mentioned in the
> | log, does during scanning, I am not sure.
>
> I thought that mfi(4) could be an issue. There are some ata controllers
> with nothing attached. I built a GENERIC with USB and mfi commented out
> and then the timeout issue went away:
> ipmi0: KCS mode found at io 0xca8 on acpi
> ipmi1: on isa0
> device_attach: ipmi1 attach returned 16
> ipmi1: on isa0
> device_attach: ipmi1 attach returned 16
> ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
> ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
> ipmi0: DEBUG ipmi_complete_request 529 after wakeup 2272
> ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 2332
> ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
>
> Without mfi and with USB it had issues:
> ipmi0: KCS mode found at io 0xca8 on acpi
> ipmi1: on isa0
> device_attach: ipmi1 attach returned 16
> ipmi1: on isa0
> device_attach: ipmi1 attach returned 16
> ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
> ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
> ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
> ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
> ipmi0: Timed out waiting for GET_DEVICE_ID
> ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
>
> I can post more ktrdump traces if needed. A 1U Dell machine without
> mfi also has this problem. As John mentioned it might be good to
> bump up the timeout from 3s to 6s.
> I did that with the USB no mfi
> kernel and that passed:
>
> % dmesg | grep ipmi
> ipmi0: KCS mode found at io 0xca8 on acpi
> ipmi1: on isa0
> device_attach: ipmi1 attach returned 16
> ipmi1: on isa0
> device_attach: ipmi1 attach returned 16
> ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
> ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
> ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
> ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
> ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
>
> So maybe we need to aggressively bump up the timeout. I put a
> timeout since I didn't want the system to hang. Anyone have a
> good idea of a timeout? I thought I tried 6s initially and it
> had issues but then the machine I was playing with had 3 mfi
> cards and various disks hanging off it.

I have no IPMI timeout value to propose, but can't that check be
reversed: if a response was received, use it; otherwise, check the
error value? Obviously it is not an IPMI problem that the CPU is busy,
but the ability to work under those conditions would be a bonus.

--
Alexander Motin
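
The reversed check suggested above (use the reply if one arrived, and look at the sleep's error value only when none did) could be sketched roughly as follows. This is a userspace illustration, not code from the ipmi(4) driver: struct sketch_request, enum sketch_outcome and sketch_check_wait() are hypothetical names made up for this example. The only detail taken from the real kernel is that msleep(9) returns EWOULDBLOCK when its timeout expires.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver request; NOT the real struct ipmi_request. */
struct sketch_request {
	bool completed;	/* set by the completion path once a reply has arrived */
	int compcode;	/* completion code carried by that reply */
};

/* Possible outcomes once the sleeping thread gets to run again. */
enum sketch_outcome {
	SKETCH_REPLY,	/* a reply is available: use it */
	SKETCH_TIMEOUT,	/* no reply and the sleep reported an error */
	SKETCH_RETRY	/* no reply and no error: woken early, sleep again */
};

/*
 * sleep_error is the value returned by the sleep primitive, for example
 * EWOULDBLOCK when msleep(9) times out.  The completion flag is tested
 * first, so a reply that arrived while the CPU was busy still counts
 * even if the sleep itself reported a timeout.
 */
enum sketch_outcome
sketch_check_wait(const struct sketch_request *req, int sleep_error)
{
	if (req->completed)
		return (SKETCH_REPLY);
	if (sleep_error != 0)
		return (SKETCH_TIMEOUT);
	return (SKETCH_RETRY);
}
```

With this ordering, a caller that sees EWOULDBLOCK from the sleep but finds the request already completed would carry on with the reply instead of printing a timeout message. In a real driver the flag would presumably be read under the same mutex that is handed to msleep(9); that locking is left out of the sketch.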