From owner-freebsd-stable@FreeBSD.ORG  Fri Sep 28 09:54:45 2012
Message-ID: <506573DD.2030808@citrin.ru>
Date: Fri, 28 Sep 2012 13:54:37 +0400
From: Anton Yuzhaninov <citrin@citrin.ru>
User-Agent: Mozilla/5.0 (X11; FreeBSD i386;
	rv:6.0.2) Gecko/20110922 Thunderbird/6.0.2
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>
References: <503DE2AB.6030702@citrin.ru> <201208290825.44198.jhb@freebsd.org>
In-Reply-To: <201208290825.44198.jhb@freebsd.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@freebsd.org
Subject: Re: Problem with IPMI KCS driver

On 29.08.2012 16:25, John Baldwin wrote:
> On Wednesday, August 29, 2012 5:36:43 am Anton Yuzhaninov wrote:
>> We use servers with Supermicro X8DTT-H motherboards and have run into this problem:
>> when watchdogd is running, the server is rebooted by the IPMI watchdog several times per week.
>>
>> After some debugging I found that the IPMI driver sometimes enters an endless
>> loop, so watchdogd has no chance to reset the watchdog timer.
>> In this situation top shows:
>>
>> PID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>> ...
>> 113 root          -16    -     0K    16K CPU4    4  17:18 99.17% ipmi0: kcs
>>
>> The endless loop is in /sys/dev/ipmi/ipmi_kcs.c, in the function
>> kcs_wait_for_obf():
>>
>>           int status, start = ticks;
>>
>>           status = INB(sc, KCS_CTL_STS);
>>           if (state == 0) {
>>                   /* WAIT FOR OBF = 0 */
>>                   while (ticks - start < MAX_TIMEOUT && status & KCS_STATUS_OBF) {
>>                           DELAY(100);
>>                           status = INB(sc, KCS_CTL_STS);
>>                   }
>>           } else {
>>                   /* WAIT FOR OBF = 1 */
>>                   while (ticks - start < MAX_TIMEOUT &&
>>                       !(status & KCS_STATUS_OBF)) {
>>                           DELAY(100);
>>                           status = INB(sc, KCS_CTL_STS);
>>                   }
>>           }
>>
>> It seems that this loop is intended to run for no more than MAX_TIMEOUT ticks,
>> but for some reason the timeout does not work and the loop runs until reboot.
>>
>> Questions:
>> 1. Is it correct to check ticks to implement a timeout here?
>> 2. How can this timeout be fixed?
>
> Hmm.  Can you try this:
>
> Index: kern/kern_clock.c
> ===================================================================
> --- kern/kern_clock.c	(revision 239819)
> +++ kern/kern_clock.c	(working copy)
> @@ -382,7 +382,7 @@
>   int	stathz;
>   int	profhz;
>   int	profprocs;
> -int	ticks;
> +volatile int	ticks;
>   int	psratio;
>
>   static DPCPU_DEFINE(int, pcputicks);	/* Per-CPU version of ticks. */
> @@ -469,7 +469,7 @@
>   hardclock(int usermode, uintfptr_t pc)
>   {
>
> -	atomic_add_int((volatile int *)&ticks, 1);
> +	atomic_add_int(&ticks, 1);
>   	hardclock_cpu(usermode);
>   	tc_ticktock(1);
>   	cpu_tick_calibration();
> Index: sys/kernel.h
> ===================================================================
> --- sys/kernel.h	(revision 239819)
> +++ sys/kernel.h	(working copy)
> @@ -63,7 +63,7 @@
>   extern int stathz;			/* statistics clock's frequency */
>   extern int profhz;			/* profiling clock's frequency */
>   extern int profprocs;			/* number of process's profiling */
> -extern int ticks;
> +extern volatile int ticks;
>
>   #endif /* _KERNEL */
>
>

With
extern volatile int ticks

the infinite loop occurs less often than before, but it still occurs.

The symptoms are the same:

$ ps -ax -o pid,comm,wchan,state,\%cpu | grep ipmi
   113 ipmi0: kcs    -      RL   100.0
  1317 watchdogd     ipmire Ds    0.0

DDB trace for pid 113:
Tracing pid 113 tid 100359 td 0xffffff0007913470
cpustop_handler() at cpustop_handler+0x37
ipi_nmi_handler() at ipi_nmi_handler+0x30
trap() at trap+0x345
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0xffffffff809c6e64, rsp = 0xffffffff80fd1ec0, rbp = 0xffffff88425d4b30 ---
DELAY() at DELAY+0x64
kcs_wait_for_obf() at kcs_wait_for_obf+0xb6
kcs_read_byte() at kcs_read_byte+0x7d
kcs_loop() at kcs_loop+0x372
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe

I can type cont in ddb, wait some time, and enter ddb again: the trace for pid 113 
is still the same.

kcs_wait_for_obf+0xb6 points to 
/usr/src/sys/dev/ipmi/ipmi_kcs.c:94

  91                 while (ticks - start < MAX_TIMEOUT &&
  92                     !(status & KCS_STATUS_OBF)) {
  93                         DELAY(100);
  94                         status = INB(sc, KCS_CTL_STS);
  95                 }
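
For illustration only: a minimal userspace sketch of the same polling loop 
bounded by a count of status reads instead of by ticks, so that it terminates 
even if the tick comparison never becomes false. This is not a tested fix and 
not the code from ipmi_kcs.c. The names fake_kcs, fake_read_status, fake_delay 
and wait_for_obf_bounded are hypothetical stand-ins for the softc, INB() and 
DELAY(), and the OBF bit value is assumed to be bit 0 of the KCS status register.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed: OBF is bit 0 of the KCS status register. */
#define KCS_STATUS_OBF 0x01

/* Hypothetical stand-in for the driver softc, so this compiles in userspace. */
struct fake_kcs {
	int polls_until_obf;	/* reads that return OBF = 0 first; negative = never set */
	int reads;		/* status reads performed so far */
};

/* Hypothetical stand-in for INB(sc, KCS_CTL_STS). */
static int
fake_read_status(struct fake_kcs *sc)
{
	sc->reads++;
	if (sc->polls_until_obf >= 0 && sc->reads > sc->polls_until_obf)
		return (KCS_STATUS_OBF);
	return (0);
}

/* Hypothetical stand-in for DELAY(); the kernel version busy-waits. */
static void
fake_delay(int usec)
{
	(void)usec;
}

/*
 * Wait for OBF = 1. After the initial status read, retry at most
 * max_polls times. Returns true if OBF was seen, false on timeout.
 */
static bool
wait_for_obf_bounded(struct fake_kcs *sc, int max_polls)
{
	int i, status;

	assert(max_polls >= 0);
	status = fake_read_status(sc);
	for (i = 0; i < max_polls && !(status & KCS_STATUS_OBF); i++) {
		fake_delay(100);
		status = fake_read_status(sc);
	}
	return ((status & KCS_STATUS_OBF) != 0);
}
```

With a 100 us delay per iteration, a retry budget of N bounds the wait at 
roughly N * 100 us plus the cost of the register reads, independent of whether 
ticks advances. The caller still has to handle the timeout case.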

-- 
  Anton Yuzhaninov