Date: Tue, 3 May 2011 11:22:57 +0200 From: Daniel Hartmeier <daniel@benzedrine.cx> To: Jeremy Chadwick <freebsd@jdc.parodius.com> Cc: freebsd-stable@freebsd.org, freebsd-pf@freebsd.org Subject: Re: RELENG_8 pf stack issue (state count spiraling out of control) Message-ID: <20110503092257.GC9657@insomnia.benzedrine.cx> In-Reply-To: <20110503015854.GA31444@icarus.home.lan> References: <20110503015854.GA31444@icarus.home.lan>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote: > Here's one piece of core.0.txt which makes no sense to me -- the "rate" > column. I have a very hard time believing that was the interrupt rate > of all the relevant devices at the time (way too high). Maybe this data > becomes wrong only during a coredump? The total column I could believe. > > ------------------------------------------------------------------------ > vmstat -i > > interrupt total rate > irq4: uart0 54768 912 > irq6: fdc0 1 0 > irq17: uhci1+ 172 2 > irq23: uhci3 ehci1+ 2367 39 > cpu0: timer 13183882632 219731377 > irq256: em0 260491055 4341517 > irq257: em1 127555036 2125917 > irq258: ahci0 225923164 3765386 > cpu2: timer 13183881837 219731363 > cpu1: timer 13002196469 216703274 > cpu3: timer 13183881783 219731363 > Total 53167869284 886131154 > ------------------------------------------------------------------------ > > Here's what a normal "vmstat -i" shows from the command-line: > > # vmstat -i > interrupt total rate > irq4: uart0 518 0 > irq6: fdc0 1 0 > irq23: uhci3 ehci1+ 145 0 > cpu0: timer 19041199 1999 > irq256: em0 614280 64 > irq257: em1 168529 17 > irq258: ahci0 355536 37 > cpu2: timer 19040462 1999 > cpu1: timer 19040458 1999 > cpu3: timer 19040454 1999 > Total 77301582 8119 The cpu0-3 timer totals seem consistent in the first output: 13183881783/1999/60/60/24 matches 76 days of uptime. The high rate in the first output comes from vmstat.c dointr()'s division of the total by the uptime: struct timespec sp; clock_gettime(CLOCK_MONOTONIC, &sp); uptime = sp.tv_sec; for (i = 0; i < nintr; i++) { printf("%-*s %20lu %10lu\n", istrnamlen, intrname, *intrcnt, *intrcnt / uptime); } >From this we can deduce that the value of uptime must have been 13183881783/219731363 = 60 (seconds). Since the uptime was 76 days (and not just 60 seconds), the CLOCK_MONOTONIC clock must have reset, wrapped, or been overwritten. I don't know how that's possible, but if this means that the kernel variable time_second was possibly going back, that could very well have messed up pf's state purging. Daniel
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110503092257.GC9657>