From: Anton Yuzhaninov
Date: Fri, 28 Sep 2012 13:54:37 +0400
To: John Baldwin
Cc: freebsd-stable@freebsd.org
Subject: Re: Problem with IPMI KCS driver
Message-ID: <506573DD.2030808@citrin.ru>
In-Reply-To: <201208290825.44198.jhb@freebsd.org>
References: <503DE2AB.6030702@citrin.ru> <201208290825.44198.jhb@freebsd.org>
List-Id: Production branch of FreeBSD source code

On 29.08.2012 16:25, John Baldwin wrote:
> On Wednesday, August 29, 2012 5:36:43 am Anton Yuzhaninov wrote:
>> We use servers with Supermicro X8DTT-H motherboards and have run into the
>> following problem: when watchdogd is started, the server is rebooted by the
>> IPMI watchdog several times per week.
>>
>> After some debugging I found that the IPMI driver sometimes enters an endless
>> loop, and watchdogd has no chance to reset the watchdog timer.
>> In this situation top shows:
>>
>>   PID USERNAME PRI NICE SIZE RES STATE C TIME   WCPU   COMMAND
>> ...
>>   113 root     -16 -    0K   16K CPU4  4 17:18  99.17% ipmi0: kcs
>>
>> The endless loop is in /sys/dev/ipmi/ipmi_kcs.c, in the function
>> kcs_wait_for_obf():
>>
>> 	int status, start = ticks;
>>
>> 	status = INB(sc, KCS_CTL_STS);
>> 	if (state == 0) {
>> 		/* WAIT FOR OBF = 0 */
>> 		while (ticks - start < MAX_TIMEOUT && status & KCS_STATUS_OBF) {
>> 			DELAY(100);
>> 			status = INB(sc, KCS_CTL_STS);
>> 		}
>> 	} else {
>> 		/* WAIT FOR OBF = 1 */
>> 		while (ticks - start < MAX_TIMEOUT &&
>> 		    !(status & KCS_STATUS_OBF)) {
>> 			DELAY(100);
>> 			status = INB(sc, KCS_CTL_STS);
>> 		}
>> 	}
>>
>> It seems that this loop is intended to run for no more than MAX_TIMEOUT ticks,
>> but for some reason the timeout does not work and the loop runs until reboot.
>>
>> Questions:
>> 1. Is it correct to check ticks to implement a timeout here?
>> 2. How can this timeout be fixed?
>
> Hmm. Can you try this:
>
> Index: kern/kern_clock.c
> ===================================================================
> --- kern/kern_clock.c (revision 239819)
> +++ kern/kern_clock.c (working copy)
> @@ -382,7 +382,7 @@
>  int stathz;
>  int profhz;
>  int profprocs;
> -int ticks;
> +volatile int ticks;
>  int psratio;
>
>  static DPCPU_DEFINE(int, pcputicks); /* Per-CPU version of ticks.
*/
> @@ -469,7 +469,7 @@
>  hardclock(int usermode, uintfptr_t pc)
>  {
>
> -	atomic_add_int((volatile int *)&ticks, 1);
> +	atomic_add_int(&ticks, 1);
>  	hardclock_cpu(usermode);
>  	tc_ticktock(1);
>  	cpu_tick_calibration();
> Index: sys/kernel.h
> ===================================================================
> --- sys/kernel.h (revision 239819)
> +++ sys/kernel.h (working copy)
> @@ -63,7 +63,7 @@
>  extern int stathz; /* statistics clock's frequency */
>  extern int profhz; /* profiling clock's frequency */
>  extern int profprocs; /* number of process's profiling */
> -extern int ticks;
> +extern volatile int ticks;
>
>  #endif /* _KERNEL */
>

With "extern volatile int ticks" the infinite loop occurs less often than
before, but it still occurs. The symptoms are the same:

$ ps -ax -o pid,comm,wchan,state,\%cpu | grep ipmi
  113 ipmi0: kcs       -      RL   100.0
 1317 watchdogd        ipmire Ds     0.0

DDB trace for pid 113:

Tracing pid 113 tid 100359 td 0xffffff0007913470
cpustop_handler() at cpustop_handler+0x37
ipi_nmi_handler() at ipi_nmi_handler+0x30
trap() at trap+0x345
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0xffffffff809c6e64, rsp = 0xffffffff80fd1ec0, rbp = 0xffffff88425d4b30 ---
DELAY() at DELAY+0x64
kcs_wait_for_obf() at kcs_wait_for_obf+0xb6
kcs_read_byte() at kcs_read_byte+0x7d
kcs_loop() at kcs_loop+0x372
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe

I can type cont in ddb, wait some time, and enter ddb again; the trace for
pid 113 is still the same.

kcs_wait_for_obf() at kcs_wait_for_obf+0xb6 points to
/usr/src/sys/dev/ipmi/ipmi_kcs.c:94

91 		while (ticks - start < MAX_TIMEOUT &&
92 		    !(status & KCS_STATUS_OBF)) {
93 			DELAY(100);
94 			status = INB(sc, KCS_CTL_STS);
95 		}

-- 
Anton Yuzhaninov