Date: Tue, 3 Apr 2012 09:37:50 -0700 (PDT) From: Doug Ambrisko <ambrisko@ambrisko.com> To: John Baldwin <jhb@freebsd.org> Cc: freebsd-stable@freebsd.org Subject: Re: [stable-ish 9] Dell R815 ipmi(4) attach failure Message-ID: <201204031637.q33GboNt040791@ambrisko.com> In-Reply-To: <201204030851.43785.jhb@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
John Baldwin writes: | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote: | > Doug Ambrisko writes: | > | John Baldwin writes: | > | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote: | > | | > Sean Bruno writes: | > | | > | Noting a failure to attach to the onboard IPMI controller with this | dell | > | | > | R815. Not sure what to start poking at and thought I'd though this | over | > | | > | here for comment. | > | | > | | > | | > | -bash-4.2$ dmesg |grep ipmi | > | | > | ipmi0: KCS mode found at io 0xca8 on acpi | > | | > | ipmi1: <IPMI System Interface> on isa0 | > | | > | device_attach: ipmi1 attach returned 16 | > | | > | ipmi1: <IPMI System Interface> on isa0 | > | | > | device_attach: ipmi1 attach returned 16 | > | | > | ipmi0: Timed out waiting for GET_DEVICE_ID | > | | > | > | | > I've run into this recently. A quick hack to fix it is: | > | | > | > | | > Index: ipmi.c | > | | > =================================================================== | > | | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v | > | | > retrieving revision 1.14 | > | | > diff -u -p -r1.14 ipmi.c | > | | > --- ipmi.c 14 Apr 2011 07:14:22 -0000 1.14 | > | | > +++ ipmi.c 31 Mar 2012 19:18:35 -0000 | > | | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg) | > | | > if (error == EWOULDBLOCK) { | > | | > device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n"); | > | | > ipmi_free_request(req); | > | | > - return; | > | | > } else if (error) { | > | | > device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error); | > | | > ipmi_free_request(req); | > | | > | > | | > The issue is that the wakeup doesn't actually wake up the msleep | > | | > in ipmi_submit_driver_request. The error being reported is that | > | | > the msleep timed out. This doesn't seem to be critical problem | > | | > since after this things seemed to work work. I saw this on 9.X. | > | | > Haven't seen it on 8.2. Not sure about -current. | > | | > | > | | > It doesn't happen on all machines. | > | | | > | | Hmm, are you seeing the KCS thread manage the request but the wakeup() | is | > | | lost? | > | | > | It was a couple of weeks ago that I played with it. I put printf's | > | around the msleep and wakeup. I saw the wakeup called but the sleep | > | not get it. I can try the test again later today. Right now my main | > | work machine is recovering from a power outage. This was with 9.0 | > | when I first saw it. This issue seems to only happen at boot time. | > | If I kldload the module after the system is booted then it seems to work | > | okay. The KCS part was working fine and got the data okay from the | > | request. I haven't seen or heard any issues with 8.2. | > | > With -current I patched ipmi.c with: | > Index: ipmi.c | > =================================================================== | > --- ipmi.c (revision 233806) | > +++ ipmi.c (working copy) | > @@ -523,7 +523,11 @@ | > * waiter that we awaken. | > */ | > if (req->ir_owner == NULL) | > +{ | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup | %d\n",__FUNCTION__,__LINE__,ticks); | > wakeup(req); | > +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup | %d\n",__FUNCTION__,__LINE__,ticks); | > +} | > else { | > dev = req->ir_owner; | > TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, | ir_link); | > @@ -543,7 +547,11 @@ | > IPMI_LOCK(sc); | > error = sc->ipmi_enqueue_request(sc, req); | > if (error == 0) | > +{ | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep | %d\n",__FUNCTION__,__LINE__,ticks); | > error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo); | > +device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep | %d\n",__FUNCTION__,__LINE__,ticks); | > +} | > if (error == 0) | > error = req->ir_error; | > IPMI_UNLOCK(sc); | > @@ -695,8 +703,11 @@ | > error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT); | > if (error == EWOULDBLOCK) { | > device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n"); | > + printf("DJA\n"); | > +/* | > ipmi_free_request(req); | > return; | > +*/ | > } else if (error) { | > device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error); | > ipmi_free_request(req); | > | > and get | > # dmesg | grep ipmi | > ipmi0: KCS mode found at io 0xca8 on acpi | > ipmi1: <IPMI System Interface> on isa0 | > device_attach: ipmi1 attach returned 16 | > ipmi1: <IPMI System Interface> on isa0 | > device_attach: ipmi1 attach returned 16 | > ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2 | > ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6201 | > ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6263 | > ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 6323 | | Actually, can you compile with: | | options KTR | options KTR_COMPILE=KTR_SCHED | options KTR_MASK=KTR_SCHED | | and then add a temporary hack to ipmi.c to set ktr_mask to 0 after | ipmi_submit_driver_request() returns in ipmi_startup()? You can | then use 'ktrdump -ct' after boot to capture a log of what the scheduler | did including if it timed out the sleep, etc. I think this would be | useful for figuring out what went wrong. It does seem that it timed | out after 3 seconds. Assuming I didn't mess up, the log should be at: http://people.freebsd.org/~ambrisko/ipmi_ktr_dump.txt again, I using ipmi(4) as module loaded via the loader. | Also, it doesn't seem clear if pehaps the IPMI worker thread was | stalled behind another thread during boot. The KTR traces would show | us that if so. | | I don't think the ipmi1 probe can cause the problem (it bails out right | away and shouldn't be touching any hardware state). Agreed, but computers like to prove me wrong. Thanks, Doug A.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201204031637.q33GboNt040791>