From: Andriy Gapon
Date: Tue, 20 Sep 2011 23:28:18 +0300
To: Alexander Motin
Cc: hackers@FreeBSD.org
Subject: Re: SW_WATCHDOG vs new eventtimer code
Message-ID: <4E78F762.5000906@FreeBSD.org>
In-Reply-To: <4E78F1E7.7020502@FreeBSD.org>

on 20/09/2011 23:04 Alexander Motin said the following:
> Hi.
>
> On 20.09.2011 22:19, Andriy Gapon wrote:
>> just want to check with you first if the following makes sense.
>> I use SW_WATCHDOG on one of the test machines, which was recently updated
>> from stable/8 to head.  Now it seems to get seemingly random watchdog events.
>> My theory is that this is because of the eventtimer logic.
>> If during an idle period we accumulate enough timer ticks and then run all those
>> ticks very rapidly, then the SW_WATCHDOG code may get the impression that it was
>> not patted for many real ticks.
>> Not sure what would be the best way to make SW_WATCHDOG happier/smarter.
>
> Eventtimer code is now set to generate interrupts at least 4 times per
> second for each CPU.  Since SW_WATCHDOG only handles periods longer
> than one second, I would say it should not be hurt.  I would try to add
> some debug there to see what's going on (how big the tick bursts are).

I'll try to do it tomorrow.
Just in case, here is a debugging snippet from a panic that I've got:
...
#12 0xffffffff80425d80 in watchdog_fire () at /usr/src/sys/kern/kern_clock.c:858
#13 0xffffffff8042603e in hardclock_anycpu (cnt=15761, usermode=Variable "usermode" is not available.
) at atomic.h:183
#14 0xffffffff80660ae5 in handleevents (now=0xffffff80e3e0b8b0, fake=0) at /usr/src/sys/kern/kern_clocksource.c:209
#15 0xffffffff80661b48 in timercb (et=Variable "et" is not available.
) at /usr/src/sys/kern/kern_clocksource.c:379
#16 0xffffffff802cc068 in hpet_intr_single (arg=Variable "arg" is not available.
) at /usr/src/sys/dev/acpica/acpi_hpet.c:258
#17 0xffffffff802cc71e in hpet_intr (arg=0xffffff80e3e0b5b0) at /usr/src/sys/dev/acpica/acpi_hpet.c:276
#18 0xffffffff80444b02 in intr_event_handle (ie=0xfffffe0002751500, frame=0xffffff80e3e0ba30) at /usr/src/sys/kern/kern_intr.c:1428
#19 0xffffffff8062f920 in intr_remove_handler (cookie=0xffffff80e3e0b5b0) at /usr/src/sys/amd64/amd64/intr_machdep.c:197
#20 0xffffffff8069cca9 in lapic_enable_pmc () at /usr/src/sys/x86/x86/local_apic.c:431
#21 0xffffffff8062cc70 in Xapic_isr2 () at apic_vector.S:87
#22 0xffffffff80443118 in intr_event_execute_handlers (p=0xfffffe0002758000, ie=0xfffffe0002a5eb00) at /usr/src/sys/kern/kern_intr.c:1244
#23 0xffffffff80444164 in ithread_loop (arg=0xfffffe0002758000) at /usr/src/sys/kern/kern_intr.c:1269
#24 0xffffffff8044053a in fork_exit (callout=0xffffffff80444024 , arg=0xfffffe0002b4f700, frame=0xffffff80e3e0bc50) at /usr/src/sys/kern/kern_fork.c:1024
#25 0xffffffff8062cb0e in Xint0x80_syscall () at ia32_exception.S:62
#26 0x0000000000000000 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) fr 14
#14 0xffffffff80660ae5 in handleevents (now=0xffffff80e3e0b8b0, fake=0) at /usr/src/sys/kern/kern_clocksource.c:209
209             while (bintime_cmp(now, &state->nextstat, >=)) {
(kgdb) list
204             }
205             if (runs && fake < 2) {
206                     hardclock_anycpu(runs, usermode);
207                     done = 1;
208             }
209             while (bintime_cmp(now, &state->nextstat, >=)) {
210                     if (fake < 2)
211                             statclock(usermode);
212                     bintime_add(&state->nextstat, &statperiod);
213                     done = 1;
(kgdb) p state->nextstat
$1 = {sec = 90, frac = 15986939599958264124}
(kgdb) p *now
$3 = {sec = 106, frac = 11494276814354478452}
(kgdb) p statperiod
$4 = {sec = 0, frac = 145249953336295682}
(kgdb) fr 13
#13 0xffffffff8042603e in hardclock_anycpu (cnt=15761, usermode=Variable "usermode" is not available.
) at atomic.h:183
183     atomic.h: No such file or directory.
        in atomic.h
(kgdb) p cnt
$5 = 15761
(kgdb) p newticks
$6 = 15000
(kgdb) p watchdog_ticks
$7 = 16000

Watchdog timeout was set to ~16 seconds.

-- 
Andriy Gapon
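
The kgdb values above can be cross-checked with a little arithmetic.  What
follows is only a minimal sketch, not kernel code: struct bt is a local
stand-in for the kernel's struct bintime (seconds plus a 64-bit binary
fraction), the helper names are made up for illustration, and hz = 1000 is
an assumption inferred from watchdog_ticks = 16000 for a ~16 second timeout.

```c
#include <stdint.h>

/*
 * Local stand-in for the kernel's struct bintime: whole seconds plus a
 * 64-bit binary fraction of a second (frac / 2^64).
 */
struct bt {
    int64_t  sec;
    uint64_t frac;
};

/* Convert to seconds as a double; rounding error is far below 1 ms here. */
static double
bt_to_seconds(struct bt b)
{
    return (double)b.sec + (double)b.frac * (1.0 / 18446744073709551616.0);
}

/* Values copied verbatim from the kgdb session. */
static const struct bt dbg_now        = { 106, 11494276814354478452ULL };
static const struct bt dbg_nextstat   = {  90, 15986939599958264124ULL };
static const struct bt dbg_statperiod = {   0,   145249953336295682ULL };

/* How far "now" is past the pending statclock deadline, in seconds. */
static double
lag_seconds(void)
{
    return bt_to_seconds(dbg_now) - bt_to_seconds(dbg_nextstat);
}

/* Convert a tick count to seconds; hz is an assumed value (1000 here). */
static double
ticks_to_seconds(int cnt, int hz)
{
    return (double)cnt / hz;
}
```

With these inputs lag_seconds() comes out at roughly 15.76 seconds, which
lines up with ticks_to_seconds(15761, 1000), i.e. the cnt passed to
hardclock_anycpu() under the assumed hz.  1.0 / bt_to_seconds(dbg_statperiod)
works out to about 127, which would be the statclock frequency.  Note that
nextstat is the statclock deadline, so it is only an approximate proxy for
the time elapsed since hardclock last ran.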