Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 4 Apr 2012 14:47:49 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Doug Ambrisko <ambrisko@ambrisko.com>
Cc:        scottl@freebsd.org, freebsd-stable@freebsd.org, Alexander Motin <mav@freebsd.org>
Subject:   Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
Message-ID:  <201204041447.49398.jhb@freebsd.org>
In-Reply-To: <201204041624.q34GOXvc063506@ambrisko.com>
References:  <201204041624.q34GOXvc063506@ambrisko.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
> John Baldwin writes:
> | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
> | > John Baldwin writes:
> | > | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
> | > | > Doug Ambrisko writes:
> | > | > | John Baldwin writes:
> | > | > | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
> | > | > | | > Sean Bruno writes:
> | > | > | | > | Noting a failure to attach to the onboard IPMI controller 
with 
> | this 
> | > | dell
> | > | > | | > | R815.  Not sure what to start poking at and thought I'd 
though 
> | this 
> | > | over
> | > | > | | > | here for comment.
> | > | > | | > | 
> | > | > | | > | -bash-4.2$ dmesg |grep ipmi
> | > | > | | > | ipmi0: KCS mode found at io 0xca8 on acpi
> | > | > | | > | ipmi1: <IPMI System Interface> on isa0
> | > | > | | > | device_attach: ipmi1 attach returned 16
> | > | > | | > | ipmi1: <IPMI System Interface> on isa0
> | > | > | | > | device_attach: ipmi1 attach returned 16
> | > | > | | > | ipmi0: Timed out waiting for GET_DEVICE_ID
> | > | > | | > 
> | > | > | | > I've run into this recently.  A quick hack to fix it is:
> | > | > | | > 
> | > | > | | > Index: ipmi.c
> | > | > | | > 
> | ===================================================================
> | > | > | | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
> | > | > | | > retrieving revision 1.14
> | > | > | | > diff -u -p -r1.14 ipmi.c
> | > | > | | > --- ipmi.c	14 Apr 2011 07:14:22 -0000	1.14
> | > | > | | > +++ ipmi.c	31 Mar 2012 19:18:35 -0000
> | > | > | | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
> | > | > | | >  	if (error == EWOULDBLOCK) {
> | > | > | | >  		device_printf(dev, "Timed out waiting for 
> | GET_DEVICE_ID\n");
> | > | > | | >  		ipmi_free_request(req);
> | > | > | | > -		return;
> | > | > | | >  	} else if (error) {
> | > | > | | >  		device_printf(dev, "Failed GET_DEVICE_ID: %d\n", 
error);
> | > | > | | >  		ipmi_free_request(req);
> | > | > | | > 
> | > | > | | > The issue is that the wakeup doesn't actually wake up the 
msleep
> | > | > | | > in ipmi_submit_driver_request.  The error being reported is 
that
> | > | > | | > the msleep timed out.  This doesn't seem to be critical 
problem
> | > | > | | > since after this things seemed to work work.  I saw this on 
9.X.
> | > | > | | > Haven't seen it on 8.2.  Not sure about -current.
> | > | > | | > 
> | > | > | | > It doesn't happen on all machines.
> | > | > | | 
> | > | > | | Hmm, are you seeing the KCS thread manage the request but the 
> | wakeup() 
> | > | is 
> | > | > | | lost?
> | > | > | 
> | > | > | It was a couple of weeks ago that I played with it.  I put 
printf's
> | > | > | around the msleep and wakeup.  I saw the wakeup called but the 
sleep
> | > | > | not get it.  I can try the test again later today.  Right now my 
main
> | > | > | work machine is recovering from a power outage.  This was with 9.0 
> | > | > | when I first saw it.  This issue seems to only happen at boot 
time.
> | > | > | If I kldload the module after the system is booted then it seems 
to 
> | work 
> | > | > | okay.  The KCS part was working fine and got the data okay from 
the
> | > | > | request.  I haven't seen or heard any issues with 8.2.
> | > | > 
> | > | > With -current I patched ipmi.c with:
> | > | > Index: ipmi.c
> | > | > ===================================================================
> | > | > --- ipmi.c      (revision 233806)
> | > | > +++ ipmi.c      (working copy)
> | > | > @@ -523,7 +523,11 @@
> | > | >          * waiter that we awaken.
> | > | >          */
> | > | >         if (req->ir_owner == NULL)
> | > | > +{
> | > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup 
> | > | %d\n",__FUNCTION__,__LINE__,ticks);
> | > | >                 wakeup(req);
> | > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup 
> | > | %d\n",__FUNCTION__,__LINE__,ticks);
> | > | > +}
> | > | >         else {
> | > | >                 dev = req->ir_owner;
> | > | >                 TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, 
req, 
> | > | ir_link);
> | > | > @@ -543,7 +547,11 @@
> | > | >         IPMI_LOCK(sc);
> | > | >         error = sc->ipmi_enqueue_request(sc, req);
> | > | >         if (error == 0)
> | > | > +{
> | > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep 
> | > | %d\n",__FUNCTION__,__LINE__,ticks);
> | > | >                 error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", 
timo);
> | > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep 
> | > | %d\n",__FUNCTION__,__LINE__,ticks);
> | > | > +}
> | > | >         if (error == 0)
> | > | >                 error = req->ir_error;
> | > | >         IPMI_UNLOCK(sc);
> | > | > @@ -695,8 +703,11 @@
> | > | >         error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
> | > | >         if (error == EWOULDBLOCK) {
> | > | >                 device_printf(dev, "Timed out waiting for 
> | GET_DEVICE_ID\n");
> | > | > +               printf("DJA\n");
> | > | > +/*
> | > | >                 ipmi_free_request(req);
> | > | >                 return;
> | > | > +*/
> | > | >         } else if (error) {
> | > | >                 device_printf(dev, "Failed GET_DEVICE_ID: %d\n", 
error);
> | > | >                 ipmi_free_request(req);
> | > | > 
> | > | > and get
> | > | >   # dmesg | grep ipmi
> | > | >   ipmi0: KCS mode found at io 0xca8 on acpi
> | > | >   ipmi1: <IPMI System Interface> on isa0
> | > | >   device_attach: ipmi1 attach returned 16
> | > | >   ipmi1: <IPMI System Interface> on isa0
> | > | >   device_attach: ipmi1 attach returned 16
> | > | >   ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
> | > | >   ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6201
> | > | >   ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6263
> | > | >   ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 6323
> | > | 
> | > | Actually, can you compile with:
> | > | 
> | > | options  	KTR
> | > | options  	KTR_COMPILE=KTR_SCHED
> | > | options 	KTR_MASK=KTR_SCHED
> | > | 
> | > | and then add a temporary hack to ipmi.c to set ktr_mask to 0 after
> | > | ipmi_submit_driver_request() returns in ipmi_startup()?  You can
> | > | then use 'ktrdump -ct' after boot to capture a log of what the 
scheduler
> | > | did including if it timed out the sleep, etc.  I think this would be
> | > | useful for figuring out what went wrong.  It does seem that it timed
> | > | out after 3 seconds.
> | > 
> | > Assuming I didn't mess up, the log should be at:
> | > 	http://people.freebsd.org/~ambrisko/ipmi_ktr_dump.txt
> | > again, I using ipmi(4) as module loaded via the loader.
> | 
> | If you use "-ct" then you get a file you can feed into schedgraph.  
> | However, just reading the log, it seems that IRQ 20 keeps preempting 
> | the KCS worker thread preventing it from getting anything done.  Also, 
> | there seem to be a lot of threads on CPU 0's runqueue waiting for a 
> | chance to run (load average of 12 or 13 the entire time).  You can try 
> | just bumping up the max timeout from 3 seconds to higher perhaps.  Not 
> | sure why IRQ 20 keeps firing though.  It might be related to USB, so 
> | you could try fiddling with USB options in the BIOS perhaps, or disabling
> | the USB drivers to see if that fixes IPMI.
> 
> Tried without USB in kernel:
> 	http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt

Hmm, it's still just running constantly (note that the idle thread is
_never_ scheduled).  The lion's share of the time seems to be spent in 
"xpt_thrd".  Note that there are several places where nothing happens except 
that "xpt_thrd" runs constantly (spinning) during 10's of statclock ticks.  I 
would maybe start debugging that to see what in the world it is doing.  Maybe 
it is polling some hardware down in xpt_action() (i.e., xpt_action() for a 
single bus called down into a driver and it is just spinning using polling 
instead of sleeping and waiting for an interrupt).

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201204041447.49398.jhb>