Date: Fri, 12 Aug 2016 17:29:58 -0700
From: John Baldwin <jhb@freebsd.org>
To: David Wolfskill <david@catwhisker.org>
Cc: hackers@freebsd.org
Subject: Re: "ipmi0: KCS..." whines
Message-ID: <6661021.NZidrlQVOE@ralph.baldwin.cx>
In-Reply-To: <20160812214340.GZ1112@albert.catwhisker.org>
References: <20160811175409.GW1112@albert.catwhisker.org> <2855524.PakqtZoDR6@ralph.baldwin.cx> <20160812214340.GZ1112@albert.catwhisker.org>
On Friday, August 12, 2016 02:43:40 PM David Wolfskill wrote:
> On Fri, Aug 12, 2016 at 11:54:38AM -0700, John Baldwin wrote:
> > ...
> > So the issue is probably that the BMC controller on your box is sometimes
> > slow in responding.  The completion code is the third byte of the reply
> > we wait to read after sending a request to the BMC via KCS.  However, the
> > first two bytes just echo back the request ID and command we asked for, so
> > it may be that the BMC echoes those back right away without waiting for
> > whatever work it needs to do to handle the request to complete, but doesn't
> > send the completion code (the status of the request) until the request is
> > fully processed.
> >
> > The driver is complaining that the BMC didn't respond with the completion
> > code before its timeout expired.  The default timeout is MAX_TIMEOUT in
> > sys/dev/ipmi/ipmivars.h which corresponds to 6 seconds.  It may be that
> > occasionally some "background" task runs in the BMC OS that delays responses
> > to handling commands.  It could also be that whatever work the BMC has to do
> > to read this specific value is actually timing out or having issues in the
> > hardware, etc.
>
> I could easily modify the stress-test loop to run "date" after each
> "ipmitool" invocation.  (Pity we don't seem to have a sub-second format
> in strftime().)
>
> So... I tried the above (interspersing "date" commands while running
> "ipmitool dcmi power reading" in a loop within script(1)).  I did not
> get a whine at 32 repetitions; I got one at 64.
>
> The total elapsed time was no more than 3 seconds (last timestamp -
> first timestamp difference was 2 seconds).

Hmm, you might see what 'MAX_TIMEOUT' is in sys/dev/ipmi/ipmivars.h in your
tree.  It might also be worthwhile wrapping it in ()'s, as in HEAD it is just
a bare '6 * hz'.  The code to wait for IBF doesn't look like it would break
without the ()'s though.
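
To illustrate why the ()'s are worth adding even when the current callers
are safe, here is a minimal userland sketch.  It is not the driver code:
the 'hz' variable, the TIMEOUT_BARE/TIMEOUT_PAREN names and the helper
functions are all made up for the example, and hz = 1000 is only an
assumed tick rate.  A plain comparison behaves the same with either
macro, but a division does not.

```c
#include <assert.h>

/* Assumed tick rate; stands in for the kernel's 'hz' for this sketch only. */
static int hz = 1000;

/* Hypothetical names mirroring the two spellings discussed above. */
#define TIMEOUT_BARE	6 * hz		/* bare form */
#define TIMEOUT_PAREN	(6 * hz)	/* parenthesized form */

/*
 * A comparison is unaffected: '*' binds tighter than '>=', so both
 * versions compare 'elapsed' against 6 * hz.
 */
static int
timed_out_bare(int elapsed)
{
	return (elapsed >= TIMEOUT_BARE);
}

static int
timed_out_paren(int elapsed)
{
	return (elapsed >= TIMEOUT_PAREN);
}

/*
 * A division is affected: the bare form expands to 'ticks / 6 * hz',
 * which groups as '(ticks / 6) * hz' instead of 'ticks / (6 * hz)'.
 */
static int
periods_bare(int ticks)
{
	return (ticks / TIMEOUT_BARE);
}

static int
periods_paren(int ticks)
{
	return (ticks / TIMEOUT_PAREN);
}
```

With hz = 1000, periods_paren(12000) is 2 (two full timeout periods),
while periods_bare(12000) is 2000000, so the bare form only stays
harmless as long as nobody uses it next to '/', '%' or similar.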

The timeout was bumped from 3 seconds to 6 seconds back in 10-current in
r253812, but perhaps your box has 3 seconds instead of 6?

-- 
John Baldwin