Date:      Tue, 3 Apr 2012 09:37:50 -0700 (PDT)
From:      Doug Ambrisko <ambrisko@ambrisko.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
Message-ID:  <201204031637.q33GboNt040791@ambrisko.com>
In-Reply-To: <201204030851.43785.jhb@freebsd.org>

John Baldwin writes:
| On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| > Doug Ambrisko writes:
| > | John Baldwin writes:
| > | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| > | | > Sean Bruno writes:
| > | | > | Noting a failure to attach to the onboard IPMI controller with this Dell
| > | | > | R815.  Not sure what to start poking at and thought I'd throw this over
| > | | > | here for comment.
| > | | > | 
| > | | > | -bash-4.2$ dmesg |grep ipmi
| > | | > | ipmi0: KCS mode found at io 0xca8 on acpi
| > | | > | ipmi1: <IPMI System Interface> on isa0
| > | | > | device_attach: ipmi1 attach returned 16
| > | | > | ipmi1: <IPMI System Interface> on isa0
| > | | > | device_attach: ipmi1 attach returned 16
| > | | > | ipmi0: Timed out waiting for GET_DEVICE_ID
| > | | > 
| > | | > I've run into this recently.  A quick hack to fix it is:
| > | | > 
| > | | > Index: ipmi.c
| > | | > ===================================================================
| > | | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
| > | | > retrieving revision 1.14
| > | | > diff -u -p -r1.14 ipmi.c
| > | | > --- ipmi.c	14 Apr 2011 07:14:22 -0000	1.14
| > | | > +++ ipmi.c	31 Mar 2012 19:18:35 -0000
| > | | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
| > | | >  	if (error == EWOULDBLOCK) {
| > | | >  		device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
| > | | >  		ipmi_free_request(req);
| > | | > -		return;
| > | | >  	} else if (error) {
| > | | >  		device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
| > | | >  		ipmi_free_request(req);
| > | | > 
| > | | > The issue is that the wakeup doesn't actually wake up the msleep
| > | | > in ipmi_submit_driver_request.  The error being reported is that
| > | | > the msleep timed out.  This doesn't seem to be a critical problem
| > | | > since after this things seemed to work.  I saw this on 9.X.
| > | | > Haven't seen it on 8.2.  Not sure about -current.
| > | | > 
| > | | > It doesn't happen on all machines.
| > | | 
| > | | Hmm, are you seeing the KCS thread manage the request but the wakeup() is
| > | | lost?
| > | 
| > | It was a couple of weeks ago that I played with it.  I put printf's
| > | around the msleep and wakeup.  I saw the wakeup called but the sleep
| > | not get it.  I can try the test again later today.  Right now my main
| > | work machine is recovering from a power outage.  This was with 9.0 
| > | when I first saw it.  This issue seems to only happen at boot time.
| > | If I kldload the module after the system is booted then it seems to work 
| > | okay.  The KCS part was working fine and got the data okay from the
| > | request.  I haven't seen or heard any issues with 8.2.
| > 
| > With -current I patched ipmi.c with:
| > Index: ipmi.c
| > ===================================================================
| > --- ipmi.c      (revision 233806)
| > +++ ipmi.c      (working copy)
| > @@ -523,7 +523,11 @@
| >          * waiter that we awaken.
| >          */
| >         if (req->ir_owner == NULL)
| > +{
| > +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup %d\n",__FUNCTION__,__LINE__,ticks);
| >                 wakeup(req);
| > +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup %d\n",__FUNCTION__,__LINE__,ticks);
| > +}
| >         else {
| >                 dev = req->ir_owner;
| >                 TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, ir_link);
| > @@ -543,7 +547,11 @@
| >         IPMI_LOCK(sc);
| >         error = sc->ipmi_enqueue_request(sc, req);
| >         if (error == 0)
| > +{
| > +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep %d\n",__FUNCTION__,__LINE__,ticks);
| >                 error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo);
| > +device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep %d\n",__FUNCTION__,__LINE__,ticks);
| > +}
| >         if (error == 0)
| >                 error = req->ir_error;
| >         IPMI_UNLOCK(sc);
| > @@ -695,8 +703,11 @@
| >         error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
| >         if (error == EWOULDBLOCK) {
| >                 device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
| > +               printf("DJA\n");
| > +/*
| >                 ipmi_free_request(req);
| >                 return;
| > +*/
| >         } else if (error) {
| >                 device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
| >                 ipmi_free_request(req);
| > 
| > and get
| >   # dmesg | grep ipmi
| >   ipmi0: KCS mode found at io 0xca8 on acpi
| >   ipmi1: <IPMI System Interface> on isa0
| >   device_attach: ipmi1 attach returned 16
| >   ipmi1: <IPMI System Interface> on isa0
| >   device_attach: ipmi1 attach returned 16
| >   ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
| >   ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6201
| >   ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6263
| >   ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 6323
| 
| Actually, can you compile with:
| 
| options  	KTR
| options  	KTR_COMPILE=KTR_SCHED
| options 	KTR_MASK=KTR_SCHED
| 
| and then add a temporary hack to ipmi.c to set ktr_mask to 0 after
| ipmi_submit_driver_request() returns in ipmi_startup()?  You can
| then use 'ktrdump -ct' after boot to capture a log of what the scheduler
| did including if it timed out the sleep, etc.  I think this would be
| useful for figuring out what went wrong.  It does seem that it timed
| out after 3 seconds.

Assuming I didn't mess up, the log should be at:
	http://people.freebsd.org/~ambrisko/ipmi_ktr_dump.txt
Again, I am using ipmi(4) as a module loaded via the loader.
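
For anyone following along, the temporary hack described above can be
modeled in userland roughly as below.  This is only a sketch under stated
assumptions, not the actual kernel change: trace_mask, trace_event(),
fake_submit_request() and fake_startup() are made-up stand-ins for ktr_mask,
the KTR tracing macros, ipmi_submit_driver_request() and ipmi_startup().
It only shows the idea that clearing the mask right after the request
returns stops further events from being recorded, so the entries around
the sleep are the ones left in the log.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; the real kernel uses ktr_mask and KTR macros. */
#define TRACE_SCHED	0x1u
#define TRACE_MAX	8

static unsigned int trace_mask = TRACE_SCHED;	/* plays the role of ktr_mask */
static char trace_log[TRACE_MAX][32];
static int trace_count;

/* Record an event only while its class is still enabled in the mask. */
static void
trace_event(unsigned int cls, const char *msg)
{
	if ((trace_mask & cls) == 0 || trace_count >= TRACE_MAX)
		return;
	snprintf(trace_log[trace_count], sizeof(trace_log[0]), "%s", msg);
	trace_count++;
}

/* Stand-in for ipmi_submit_driver_request(); always "times out" here. */
static int
fake_submit_request(void)
{
	trace_event(TRACE_SCHED, "sleep");
	trace_event(TRACE_SCHED, "timeout");
	return (EWOULDBLOCK);
}

/* Stand-in for ipmi_startup() with the temporary hack applied. */
static int
fake_startup(void)
{
	int error;

	error = fake_submit_request();
	trace_mask = 0;		/* freeze the log once the request returns */
	trace_event(TRACE_SCHED, "later boot noise");	/* not recorded */
	return (error);
}
```

Calling fake_startup() once leaves only the "sleep" and "timeout" entries
in trace_log; everything traced after the mask is cleared is dropped.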
 
| Also, it doesn't seem clear if perhaps the IPMI worker thread was
| stalled behind another thread during boot.  The KTR traces would show
| us that if so.
| 
| I don't think the ipmi1 probe can cause the problem (it bails out right
| away and shouldn't be touching any hardware state).

Agreed, but computers like to prove me wrong.

Thanks,

Doug A.


