Date:      Fri, 12 Aug 2016 17:29:58 -0700
From:      John Baldwin <jhb@freebsd.org>
To:        David Wolfskill <david@catwhisker.org>
Cc:        hackers@freebsd.org
Subject:   Re: "ipmi0: KCS..." whines
Message-ID:  <6661021.NZidrlQVOE@ralph.baldwin.cx>
In-Reply-To: <20160812214340.GZ1112@albert.catwhisker.org>
References:  <20160811175409.GW1112@albert.catwhisker.org> <2855524.PakqtZoDR6@ralph.baldwin.cx> <20160812214340.GZ1112@albert.catwhisker.org>

On Friday, August 12, 2016 02:43:40 PM David Wolfskill wrote:
> On Fri, Aug 12, 2016 at 11:54:38AM -0700, John Baldwin wrote:
> > ...
> > So the issue is probably that the BMC controller on your box is sometimes
> > slow in responding.  The completion code is the third byte of the reply
> > we wait to read after sending a request to the BMC via KCS.  However, the
> > first two bytes just echo back the request ID and command we asked for, so
> > it may be that the BMC echoes those back right away without waiting for
> > whatever work it needs to do to handle the request to complete, but doesn't
> > send the completion code (the status of the request) until the request is
> > fully processed.
> > 
> > The driver is complaining that the BMC didn't respond with the completion
> > code before its timeout expired.  The default timeout is MAX_TIMEOUT in
> > sys/dev/ipmi/ipmivars.h which corresponds to 6 seconds.  It may be that
> > occasionally some "background" task runs in the BMC OS that delays responses
> > to handling commands.  It could also be that whatever work the BMC has to do
> > to read this specific value is actually timing out or having issues in the
> > hardware, etc.
> 
> I could easily modify the stress-test loop to run "date" after each
> "ipmitool" invocation.  (Pity we don't seem to have a sub-second format
> in strftime().)
> 
> So... I tried the above (interspersing "date" commands while running
> "ipmitool dcmi power reading" in a loop within script(1)).  I did not
> get a whine at 32 repetitions; I got one at 64.
> 
> The total elapsed time was no more than 3 seconds (last timestamp -
> first timestamp difference was 2 seconds).

Hmm, you might see what 'MAX_TIMEOUT' is in sys/dev/ipmi/ipmivars.h in your
tree.  It might also be worthwhile wrapping it in ()'s as in HEAD it is just a
bare '6 * hz'.  The code to wait for IBF doesn't look like it would break 
without the ()'s though.

It was bumped from 3 seconds to 6 seconds back in 10-current in r253812, but
perhaps your box has 3 seconds instead of 6?

-- 
John Baldwin


