Date: Fri, 28 Sep 2012 13:54:37 +0400
From: Anton Yuzhaninov <citrin@citrin.ru>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: Problem with IPMI KCS driver
Message-ID: <506573DD.2030808@citrin.ru>
In-Reply-To: <201208290825.44198.jhb@freebsd.org>
References: <503DE2AB.6030702@citrin.ru> <201208290825.44198.jhb@freebsd.org>

On 29.08.2012 16:25, John Baldwin wrote:
> On Wednesday, August 29, 2012 5:36:43 am Anton Yuzhaninov wrote:
>> We use servers with Supermicro X8DTT-H motherboards and have run into this
>> problem: when watchdogd is started, the server is rebooted by the IPMI
>> watchdog several times per week.
>>
>> After some debugging I found that the IPMI driver sometimes enters an endless
>> loop, so watchdogd has no chance to reset the watchdog timer.
>> In this situation top shows:
>>
>>   PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>> ...
>>   113 root      -16    -     0K    16K CPU4    4  17:18 99.17% ipmi0: kcs
>>
>> The endless loop is in /sys/dev/ipmi/ipmi_kcs.c, in the function
>> kcs_wait_for_obf():
>>
>>     int status, start = ticks;
>>
>>     status = INB(sc, KCS_CTL_STS);
>>     if (state == 0) {
>>         /* WAIT FOR OBF = 0 */
>>         while (ticks - start < MAX_TIMEOUT && status & KCS_STATUS_OBF) {
>>             DELAY(100);
>>             status = INB(sc, KCS_CTL_STS);
>>         }
>>     } else {
>>         /* WAIT FOR OBF = 1 */
>>         while (ticks - start < MAX_TIMEOUT &&
>>             !(status & KCS_STATUS_OBF)) {
>>             DELAY(100);
>>             status = INB(sc, KCS_CTL_STS);
>>         }
>>     }
>>
>> It seems that this loop is intended to run for no more than MAX_TIMEOUT ticks,
>> but for some reason the timeout does not work and the loop runs until reboot.
>>
>> Questions:
>> 1. Is it correct to check ticks to implement the timeout here?
>> 2. How can this timeout be fixed?
>
> Hmm. Can you try this:
>
> Index: kern/kern_clock.c
> ===================================================================
> --- kern/kern_clock.c (revision 239819)
> +++ kern/kern_clock.c (working copy)
> @@ -382,7 +382,7 @@
>  int stathz;
>  int profhz;
>  int profprocs;
> -int ticks;
> +volatile int ticks;
>  int psratio;
>
>  static DPCPU_DEFINE(int, pcputicks); /* Per-CPU version of ticks. */
> @@ -469,7 +469,7 @@
>  hardclock(int usermode, uintfptr_t pc)
>  {
>
> -    atomic_add_int((volatile int *)&ticks, 1);
> +    atomic_add_int(&ticks, 1);
>      hardclock_cpu(usermode);
>      tc_ticktock(1);
>      cpu_tick_calibration();
> Index: sys/kernel.h
> ===================================================================
> --- sys/kernel.h (revision 239819)
> +++ sys/kernel.h (working copy)
> @@ -63,7 +63,7 @@
>  extern int stathz; /* statistics clock's frequency */
>  extern int profhz; /* profiling clock's frequency */
>  extern int profprocs; /* number of process's profiling */
> -extern int ticks;
> +extern volatile int ticks;
>
>  #endif /* _KERNEL */
>

With extern volatile int ticks, the infinite loop occurs less often than
before, but it still occurs. The symptoms are the same:

$ ps -ax -o pid,comm,wchan,state,\%cpu | grep ipmi
  113 ipmi0: kcs  -       RL  100.0
 1317 watchdogd   ipmire  Ds    0.0

DDB trace for pid 113:

Tracing pid 113 tid 100359 td 0xffffff0007913470
cpustop_handler() at cpustop_handler+0x37
ipi_nmi_handler() at ipi_nmi_handler+0x30
trap() at trap+0x345
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0xffffffff809c6e64, rsp = 0xffffffff80fd1ec0, rbp = 0xffffff88425d4b30 ---
DELAY() at DELAY+0x64
kcs_wait_for_obf() at kcs_wait_for_obf+0xb6
kcs_read_byte() at kcs_read_byte+0x7d
kcs_loop() at kcs_loop+0x372
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe

If I type cont in ddb, wait a while, and enter ddb again, the trace for
pid 113 is the same.

kcs_wait_for_obf() at kcs_wait_for_obf+0xb6 points to
/usr/src/sys/dev/ipmi/ipmi_kcs.c:94

91         while (ticks - start < MAX_TIMEOUT &&
92             !(status & KCS_STATUS_OBF)) {
93             DELAY(100);
94             status = INB(sc, KCS_CTL_STS);
95         }

--
Anton Yuzhaninov
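
For illustration, here is a minimal userland sketch of one possible way to make a wait loop like kcs_wait_for_obf() terminate even if ticks never appears to advance: bound it by the number of polls instead of by elapsed ticks. This is a model, not a patch to ipmi_kcs.c. The names wait_for_obf_bounded, fake_bmc, fake_read, fake_delay, POLL_DELAY_US and MAX_POLLS are hypothetical, the function pointers stand in for INB(sc, KCS_CTL_STS) and DELAY(), and the sketch assumes the delay routine itself always returns.

```c
#include <assert.h>
#include <stdint.h>

#define KCS_STATUS_OBF  0x01    /* OBF is bit 0 of the KCS status register */
#define POLL_DELAY_US   100     /* same per-iteration delay as the driver loop */
#define MAX_POLLS       30000   /* hypothetical budget: 30000 * 100 us = 3 s */

/* Hypothetical stand-ins for INB(sc, KCS_CTL_STS) and DELAY(). */
typedef uint8_t (*read_status_fn)(void *ctx);
typedef void (*delay_fn)(unsigned usec);

/*
 * Poll until OBF matches want_set (non-zero: wait for OBF = 1,
 * zero: wait for OBF = 0), giving up after max_polls delay/read cycles.
 * Returns the last status read and sets *timed_out to 1 on timeout.
 * The bound is a plain counter, so it does not depend on ticks.
 */
int
wait_for_obf_bounded(read_status_fn read_status, void *ctx, delay_fn delay_us,
    int want_set, unsigned max_polls, int *timed_out)
{
    unsigned polls = 0;
    int status;

    assert(max_polls > 0);
    *timed_out = 0;
    status = read_status(ctx);
    while (((status & KCS_STATUS_OBF) != 0) != (want_set != 0)) {
        if (polls >= max_polls) {
            *timed_out = 1;     /* budget exhausted, stop spinning */
            break;
        }
        delay_us(POLL_DELAY_US);
        status = read_status(ctx);
        polls++;
    }
    return (status);
}

/* Fake device for trying the loop out: OBF reads as set from read
 * number set_after onwards (counting from 0). */
struct fake_bmc {
    unsigned reads;
    unsigned set_after;
};

uint8_t
fake_read(void *ctx)
{
    struct fake_bmc *b = ctx;
    uint8_t status = (b->reads >= b->set_after) ? KCS_STATUS_OBF : 0;

    b->reads++;
    return (status);
}

void
fake_delay(unsigned usec)
{
    (void)usec;                 /* no real delay needed in the model */
}
```

With a fake device that never sets OBF, the call returns after the initial read plus max_polls further reads and reports the timeout to the caller, which can then treat it as an error instead of spinning.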