Date:      Thu, 18 Oct 2012 14:43:02 -0700
From:      Doug Ambrisko <ambrisko@ambrisko.com>
To:        Anton Yuzhaninov <citrin@citrin.ru>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Problem with IPMI KCS driver
Message-ID:  <20121018214302.GA8009@ambrisko.com>
In-Reply-To: <507FCF9B.9080104@citrin.ru>
References:  <503DE2AB.6030702@citrin.ru> <201208290825.44198.jhb@freebsd.org> <506573DD.2030808@citrin.ru> <201209280848.35380.jhb@freebsd.org> <507FCF9B.9080104@citrin.ru>

On Thu, Oct 18, 2012 at 01:44:59PM +0400, Anton Yuzhaninov wrote:
| On 28.09.2012 16:48, John Baldwin wrote:
| >>kcs_wait_for_obf() at kcs_wait_for_obf+0xb6 point to
| >>>  /usr/src/sys/dev/ipmi/ipmi_kcs.c:94
| >>>
| >>>     91                 while (ticks - start < MAX_TIMEOUT &&
| >>>     92                     !(status & KCS_STATUS_OBF)) {
| >>>     93                         DELAY(100);
| >>>     94                         status = INB(sc, KCS_CTL_STS);
| >>>     95                 }
| >Hummm.  I'm a bit out of ideas then.  Even the volatile change is a bug that
| >could have been confirmed (to see if volatile was preventing the compiler
| >from caching the value of 'ticks') by examining the assembly.
| >
| >Well, maybe this.  This just avoids using 'ticks' altogether and depends on
| >DELAY(100) doing what it says:
| 
| The new patch also doesn't solve my problem.
| 
| My guess was wrong. The loop in kcs_wait_for_obf() is not endless, at least
| with the last patch.
| The whole function is called in some loop, but because the loop in
| kcs_wait_for_obf() takes much CPU time, the backtrace always points to the
| loop in kcs_wait_for_obf().

Yep, the IPMI local interfaces are polled, so they use a lot of CPU:
once a command is submitted, the driver is pretty much always going to
be checking "are you done yet".  We have local patches here that change
the DELAY into a tsleep when the system is running.  That has the bad
feature of making it a lot slower, but it uses far less CPU, so for us
it is a good trade-off.  One reason to put it into a loop is
so things happen in order and are not interrupted.  I guess a different
approach might be to take a "big" lock around the entire submit and
get response code fragment.  Then the CPU time would be charged against
the application thread running in the kernel.
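
A rough userland sketch of the two ideas above (sleeping between polls,
and holding one lock across the whole submit/wait/fetch sequence).  This
is not the local patch and not the real driver: struct fake_bmc,
fake_read_status(), fake_wait_for_obf() and fake_transaction() are
made-up names, the 1 ms nap is arbitrary, and nanosleep() only stands in
for tsleep().

```c
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define FAKE_STATUS_OBF	0x01	/* hypothetical "output buffer full" bit */

/* Hypothetical stand-in for the BMC; not the real ipmi_kcs.c softc. */
struct fake_bmc {
	pthread_mutex_t	lock;		/* the "big" lock around one transaction */
	int		polls_until_ready; /* status reads that still report busy */
	uint8_t		response;	/* byte handed back once OBF is set */
	unsigned long	polls;		/* status reads performed so far */
};

/* Simulated status register read (stands in for INB(sc, KCS_CTL_STS)). */
static uint8_t
fake_read_status(struct fake_bmc *bmc)
{
	bmc->polls++;
	if (bmc->polls_until_ready > 0) {
		bmc->polls_until_ready--;
		return (0);
	}
	return (FAKE_STATUS_OBF);
}

/* Sleep between polls instead of spinning; 0 on OBF, -1 on timeout. */
static int
fake_wait_for_obf(struct fake_bmc *bmc, int max_polls)
{
	const struct timespec nap = { 0, 1000000 };	/* 1 ms, arbitrary */
	int i;

	for (i = 0; i < max_polls; i++) {
		if (fake_read_status(bmc) & FAKE_STATUS_OBF)
			return (0);
		nanosleep(&nap, NULL);	/* gives up the CPU, unlike a busy DELAY() */
	}
	return (-1);
}

/* One whole transaction under the lock: submit, wait, fetch the response. */
static int
fake_transaction(struct fake_bmc *bmc, int busy_polls, uint8_t *out)
{
	int error;

	assert(bmc != NULL && out != NULL);
	pthread_mutex_lock(&bmc->lock);
	bmc->polls_until_ready = busy_polls;	/* "submit": BMC goes busy */
	error = fake_wait_for_obf(bmc, 100);
	if (error == 0)
		*out = bmc->response;
	pthread_mutex_unlock(&bmc->lock);
	return (error);
}
```

Because the caller holds the lock for the whole sequence, the waiting
is done by whichever thread asked for the command, and a second caller
cannot interleave its own submit in the middle.  The cost is the one
mentioned above: every nap adds latency to each command.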

We also have local changes to allow it to run in polled mode without
the kernel thread when we are dumping a kernel backtrace into the
IPMI system event log (SEL).  That's nice when the kernel core dump
hasn't worked on a remote machine but we can still see the backtrace
in the SEL.
 
| This problem needs further investigation.

It might be good to instrument the code in ipmi.c where it is
sending a command and then getting the status.  If that actually
looks okay, then maybe some application is doing something bad.
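
One possible shape for that instrumentation, as a standalone sketch
only: keep a small table of per-command counters and dump it on demand.
Nothing here comes from ipmi.c; struct cmd_stats, stats_record() and
stats_dump() are hypothetical names, and keying on a single command
byte is a simplification for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_TRACKED_CMDS 256

/* Hypothetical per-command counters; none of these names are from ipmi.c. */
struct cmd_stats {
	unsigned long submitted;	/* times this command was sent */
	unsigned long status_polls;	/* status reads spent waiting on it */
	unsigned long failures;		/* completions with a nonzero error */
};

static struct cmd_stats stats[MAX_TRACKED_CMDS];

/* Call once per completed command, from wherever the command finishes. */
static void
stats_record(unsigned int cmd, unsigned long polls, int error)
{
	struct cmd_stats *st;

	assert(cmd < MAX_TRACKED_CMDS);
	st = &stats[cmd];
	st->submitted++;
	st->status_polls += polls;
	if (error != 0)
		st->failures++;
}

/* Print only the commands that were actually seen; returns how many. */
static int
stats_dump(FILE *fp)
{
	unsigned int cmd;
	int seen = 0;

	for (cmd = 0; cmd < MAX_TRACKED_CMDS; cmd++) {
		const struct cmd_stats *st = &stats[cmd];

		if (st->submitted == 0)
			continue;
		fprintf(fp, "cmd 0x%02x: sent %lu, polls %lu, failed %lu\n",
		    cmd, st->submitted, st->status_polls, st->failures);
		seen++;
	}
	return (seen);
}
```

A command that shows a very high "sent" count would point at an
application hammering the interface, while a high polls-per-send ratio
would point at the BMC being slow to answer.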

Doug A.


