Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 3 May 2011 11:22:57 +0200
From:      Daniel Hartmeier <daniel@benzedrine.cx>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-stable@freebsd.org, freebsd-pf@freebsd.org
Subject:   Re: RELENG_8 pf stack issue (state count spiraling out of control)
Message-ID:  <20110503092257.GC9657@insomnia.benzedrine.cx>
In-Reply-To: <20110503015854.GA31444@icarus.home.lan>
References:  <20110503015854.GA31444@icarus.home.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> column.  I have a very hard time believing that was the interrupt rate
> of all the relevant devices at the time (way too high).  Maybe this data
> becomes wrong only during a coredump?  The total column I could believe.
> 
> ------------------------------------------------------------------------
> vmstat -i
> 
> interrupt                          total       rate
> irq4: uart0                        54768        912
> irq6: fdc0                             1          0
> irq17: uhci1+                        172          2
> irq23: uhci3 ehci1+                 2367         39
> cpu0: timer                  13183882632  219731377
> irq256: em0                    260491055    4341517
> irq257: em1                    127555036    2125917
> irq258: ahci0                  225923164    3765386
> cpu2: timer                  13183881837  219731363
> cpu1: timer                  13002196469  216703274
> cpu3: timer                  13183881783  219731363
> Total                        53167869284  886131154
> ------------------------------------------------------------------------
> 
> Here's what a normal "vmstat -i" shows from the command-line:
> 
> # vmstat -i
> interrupt                          total       rate
> irq4: uart0                          518          0
> irq6: fdc0                             1          0
> irq23: uhci3 ehci1+                  145          0
> cpu0: timer                     19041199       1999
> irq256: em0                       614280         64
> irq257: em1                       168529         17
> irq258: ahci0                     355536         37
> cpu2: timer                     19040462       1999
> cpu1: timer                     19040458       1999
> cpu3: timer                     19040454       1999
> Total                           77301582       8119

The cpu0-3 timer totals seem consistent in the first output: 
13183881783/1999/60/60/24 matches 76 days of uptime.

The high rate in the first output comes from vmstat.c dointr()'s
division of the total by the uptime:

        struct timespec sp;
        clock_gettime(CLOCK_MONOTONIC, &sp);
        uptime = sp.tv_sec;
	for (i = 0; i < nintr; i++) {
		printf("%-*s %20lu %10lu\n", istrnamlen, intrname,
		    *intrcnt, *intrcnt / uptime);
	}

>From this we can deduce that the value of uptime must have been
13183881783/219731363 = 60 (seconds).

Since the uptime was 76 days (and not just 60 seconds), the
CLOCK_MONOTONIC clock must have reset, wrapped, or been overwritten.

I don't know how that's possible, but if this means that the kernel
variable time_second was possibly going back, that could very well
have messed up pf's state purging.

Daniel



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110503092257.GC9657>