Date:      Fri, 6 Apr 2012 10:12:20 -0700 (PDT)
From:      Doug Ambrisko <ambrisko@ambrisko.com>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org, sbruno@FreeBSD.org, scottl@FreeBSD.org, John Baldwin <jhb@FreeBSD.org>
Subject:   Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
Message-ID:  <201204061712.q36HCKJP033408@ambrisko.com>
In-Reply-To: <4F7EC2A8.3000001@FreeBSD.org>

Alexander Motin writes:
| On 04/04/12 21:47, John Baldwin wrote:
| > On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
| >> John Baldwin writes:
| >> | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
| >> |>  John Baldwin writes:
| >> |>  | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| >> |>  |>  Doug Ambrisko writes:
| >> |>  |>  | John Baldwin writes:
| >> |>  |>  | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| >> |>  |>  | |>  Sean Bruno writes:
| >> |>  |>  | |>  | Noting a failure to attach to the onboard IPMI controller with
| >> |>  |>  | |>  | this Dell R815.  Not sure what to start poking at and thought I'd
| >> |>  |>  | |>  | throw this over here for comment.
| >> |>  |>  | |>  |
| >> |>  |>  | |>  | -bash-4.2$ dmesg |grep ipmi
| >> |>  |>  | |>  | ipmi0: KCS mode found at io 0xca8 on acpi
| >> |>  |>  | |>  | ipmi1:<IPMI System Interface>  on isa0
| >> |>  |>  | |>  | device_attach: ipmi1 attach returned 16
| >> |>  |>  | |>  | ipmi1:<IPMI System Interface>  on isa0
| >> |>  |>  | |>  | device_attach: ipmi1 attach returned 16
| >> |>  |>  | |>  | ipmi0: Timed out waiting for GET_DEVICE_ID
| >> |>  |>  | |>
| >> |>  |>  | |>  I've run into this recently.  A quick hack to fix it is:
| >> |>  |>  | |>
| >> |>  |>  | |>  Index: ipmi.c
| >> |>  |>  | |>
	[snip]
| >> | If you use "-ct" then you get a file you can feed into schedgraph.
| >> | However, just reading the log, it seems that IRQ 20 keeps preempting
| >> | the KCS worker thread preventing it from getting anything done.  Also,
| >> | there seem to be a lot of threads on CPU 0's runqueue waiting for a
| >> | chance to run (load average of 12 or 13 the entire time).  You could
| >> | try bumping the max timeout up from 3 seconds to something higher.  Not
| >> | sure why IRQ 20 keeps firing, though.  It might be related to USB, so
| >> | you could try fiddling with USB options in the BIOS perhaps, or disabling
| >> | the USB drivers to see if that fixes IPMI.
| >>
| >> Tried without USB in kernel:
| >> 	http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
| >
| > Hmm, it's still just running constantly (note that the idle thread is
| > _never_ scheduled).  The lion's share of the time seems to be spent in
| > "xpt_thrd".  Note that there are several places where nothing happens except
| > that "xpt_thrd" runs constantly (spinning) for tens of statclock ticks.  I
| > would maybe start debugging that to see what in the world it is doing.  Maybe
| > it is polling some hardware down in xpt_action() (i.e., xpt_action() for a
| > single bus called down into a driver and it is just spinning using polling
| > instead of sleeping and waiting for an interrupt).
| 
| "xpt_thrd" is a bus scanner thread. It is scheduled by CAM for every bus
| on attach, and by the controller driver on hot-plug events. For some
| controllers it may be quite CPU-hungry: for legacy ATA controllers, for
| example, a bus reset may take many seconds of hardware polling while the
| devices spin up. For ahci(4) this was improved about a year ago to avoid
| polling where possible, but it may still loop for some time if the
| controller does not respond to the reset. I am not sure what mfi(4),
| mentioned in the log, does during scanning.
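The difference John and Alexander describe, between spinning on hardware and sleeping between checks, can be illustrated in user space. The following is a minimal, hypothetical bounded polling loop, not code from CAM, ahci(4) or mfi(4): `poll_until_ready()` checks a caller-supplied status callback, sleeps for a fixed interval between checks instead of busy-waiting, and gives up at a deadline. `countdown_ready()` is a toy callback invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical status callback: returns true once the "device" is ready. */
typedef bool (*ready_fn)(void *arg);

/* Current monotonic time in milliseconds. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000);
}

/*
 * Poll ready() until it reports success or timeout_ms elapses.  Between
 * checks the thread sleeps for interval_ms rather than spinning, so other
 * threads on the same CPU get a chance to run.
 * Returns 0 on success, -1 on timeout.
 */
int poll_until_ready(ready_fn ready, void *arg, long timeout_ms,
    long interval_ms)
{
	long long deadline = now_ms() + timeout_ms;
	struct timespec nap = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};

	for (;;) {
		if (ready(arg))
			return (0);
		if (now_ms() >= deadline)
			return (-1);
		nanosleep(&nap, NULL);
	}
}

/* Toy callback for testing: becomes ready after *remaining checks. */
bool countdown_ready(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0)
		(*remaining)--;
	return (*remaining == 0);
}
```

With `int n = 3;`, `poll_until_ready(countdown_ready, &n, 1000, 1)` returns 0 after three checks, while a callback that never becomes ready makes it return -1 once the deadline passes. A driver that sleeps until an interrupt arrives avoids even these periodic wakeups.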

I thought that mfi(4) could be an issue.  There are some ATA controllers
with nothing attached.  I built a GENERIC kernel with USB and mfi(4)
commented out, and the timeout issue went away:
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1: <IPMI System Interface> on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1: <IPMI System Interface> on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 2272
  ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 2332
  ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0

Without mfi but with USB, it still had issues:
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1: <IPMI System Interface> on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1: <IPMI System Interface> on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
  ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
  ipmi0: Timed out waiting for GET_DEVICE_ID
  ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0

I can post more ktrdump traces if needed.  A 1U Dell machine without
mfi also has this problem.  As John suggested, it might be good to
bump the timeout from 3s to 6s.  I did that with the USB, no-mfi
kernel and it passed:

  % dmesg | grep ipmi
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1: <IPMI System Interface> on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1: <IPMI System Interface> on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
  ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
  ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
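Assuming the trailing numbers in the DEBUG lines are kernel `ticks` values and that `hz` is 1000 (both are assumptions; the debug patch itself was not posted), the logs line up with the timeout. In the no-USB, no-mfi run the reply arrived about 2210 ticks after the request, inside a 3 s budget; in the USB, no-mfi runs it arrived about 3135 ticks after, past 3 s but well within 6 s. A small hypothetical helper makes the comparison explicit:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: did a reply that arrived at done_tick, for a request
 * issued at start_tick, exceed a timeout of timeout_sec seconds?  Assumes a
 * tick counter running at hz ticks per second.
 */
bool reply_timed_out(int start_tick, int done_tick, int timeout_sec, int hz)
{
	int elapsed = done_tick - start_tick;	/* ticks spent waiting */
	int budget = timeout_sec * hz;		/* timeout converted to ticks */

	return (elapsed > budget);
}
```

Expressing the budget in seconds and converting with `hz` keeps the wall-clock timeout the same on kernels built with a different tick rate.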

So maybe we need to aggressively bump up the timeout.  I added a
timeout because I didn't want the system to hang.  Does anyone have a
good idea for a timeout value?  I thought I tried 6s initially and it
had issues, but the machine I was testing on had 3 mfi cards with
various disks hanging off them.
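Whatever value is chosen, the guard itself is a bounded sleep: block until the completion path wakes the requester, or give up at a deadline. The debug output shows the driver sleeping in msleep(9) and being woken from ipmi_complete_request; the sketch below is only a user-space analogue of that pattern using POSIX threads, not the ipmi(4) code. `struct request`, the simulated `worker()` standing in for the KCS thread, and `submit_and_wait()` are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical request object, loosely modelled on a driver request. */
struct request {
	pthread_mutex_t	lock;
	pthread_cond_t	done_cv;
	bool		done;
	long		delay_ms;	/* how long the simulated worker takes */
};

/* Simulated worker thread: "talks to the BMC" for delay_ms, then completes. */
static void *worker(void *arg)
{
	struct request *req = arg;
	struct timespec d = {
		.tv_sec = req->delay_ms / 1000,
		.tv_nsec = (req->delay_ms % 1000) * 1000000L,
	};

	nanosleep(&d, NULL);
	pthread_mutex_lock(&req->lock);
	req->done = true;
	pthread_cond_signal(&req->done_cv);	/* analogous to wakeup() */
	pthread_mutex_unlock(&req->lock);
	return (NULL);
}

/*
 * Start a request and wait at most timeout_ms for it to complete (analogous
 * to msleep() with a timeout).  Returns 0 if it completed in time, ETIMEDOUT
 * if not, EAGAIN if the worker could not be started.  The sketch still joins
 * the worker before returning so that req stays valid.
 */
int submit_and_wait(long worker_delay_ms, long timeout_ms)
{
	struct request req = { .done = false, .delay_ms = worker_delay_ms };
	struct timespec deadline;
	pthread_t tid;
	bool completed;
	int error = 0;

	pthread_mutex_init(&req.lock, NULL);
	pthread_cond_init(&req.done_cv, NULL);
	if (pthread_create(&tid, NULL, worker, &req) != 0) {
		pthread_cond_destroy(&req.done_cv);
		pthread_mutex_destroy(&req.lock);
		return (EAGAIN);
	}

	/* pthread_cond_timedwait() wants an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&req.lock);
	while (!req.done && error == 0)
		error = pthread_cond_timedwait(&req.done_cv, &req.lock,
		    &deadline);
	completed = req.done;	/* check the predicate itself, under the lock */
	pthread_mutex_unlock(&req.lock);

	pthread_join(tid, NULL);
	pthread_cond_destroy(&req.done_cv);
	pthread_mutex_destroy(&req.lock);
	return (completed ? 0 : ETIMEDOUT);
}
```

`submit_and_wait(10, 1000)` returns 0, while `submit_and_wait(300, 20)` returns `ETIMEDOUT`: in the second call the reply still arrives, only after the requester has given up, which is the situation a too-short timeout produces. Because the sketch joins the worker, that second call still takes about 300 ms in total; a real driver would have to cancel or orphan the outstanding request instead.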

Thanks,

Doug A.


