Date:      Sun, 20 Jun 2010 20:27:34 +1000
From:      Lawrence Stewart <lstewart@freebsd.org>
To:        Fabian Keil <freebsd-listen@fabiankeil.de>
Cc:        freebsd-current@freebsd.org
Subject:   Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!
Message-ID:  <4C1DED16.8020209@freebsd.org>
In-Reply-To: <20100619195823.53a7baaa@r500.local>
References:  <4C1492D0.6020704@freebsd.org> <4C1C3922.2050102@freebsd.org> <20100619195823.53a7baaa@r500.local>

Hi Fabian,

On 06/20/10 03:58, Fabian Keil wrote:
> Lawrence Stewart<lstewart@freebsd.org>  wrote:
>
>> On 06/13/10 18:12, Lawrence Stewart wrote:
>
>>> The time has come to solicit some external testing for my SIFTR tool.
>>> I'm hoping to commit it within a week or so unless problems are discovered.
>
>>> I'm interested in all feedback and reports of success/failure, along
>>> with details of the architecture tested and number of CPUs if you would
>>> be so kind.
>
> I got the following hand-transcribed panic maybe a second after
> sysctl net.inet.siftr.enabled=1
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> [...]
> current process = 12 (swi4: clock)
> [ thread pid 12 tid 100006 ]
> Stopped at	siftr_chkpkt+0xd0:	addq	$0x1,0x8(%r14)
> db>  where
> Tracing pid 12 tid 100006 td 0xffffff00034037e0
> siftr_chkpkt() at siftr_chkpkt+0xd0
> pfil_run_hooks() at pfil_run_hooks+0xb4
> ip_output() at ip_output+0x382
> tcp_output() at tcp_output+0xa41
> tcp_timer_rexmt() at tcp_timer_rexmt+0x251
> softclock() at softclock+0x291
> intr_event_execute_handlers() at intr_event_execute_handlers+0x66
> ithread_loop() at ithread_loop+0x8e
> fork_exit() at fork_exit+0x112
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff800003ad30, rbp = 0 ---

So I've tracked down the line of code where the page fault is occurring:

         if (dir == PFIL_IN)
                 ss->n_in++;
         else
                 ss->n_out++;

ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats 
per-cpu and is initialised at the start of the function like so:

         ss = DPCPU_PTR(ss);

For ss to be NULL, DPCPU_PTR() must be returning NULL on your machine.
I know very little about the inner workings of the DPCPU_* macros, but
I'm pretty sure the way I use them in SIFTR is correct, or at least as
intended.
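
To make the reasoning above concrete, here is a minimal userspace sketch of
the per-CPU counter pattern. It is a model only, not SIFTR or kernel code:
every model_* name, the two-CPU array and the struct layout are assumptions
made for illustration (the message shows only the n_in and n_out field
names). If the real struct also begins with two 64-bit counters in that
order, then n_out sits at offset 8, which would be consistent with the
faulting instruction "addq $0x1,0x8(%r14)" being the n_out increment on the
outbound path with r14 holding a bad ss pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NCPU 2	/* hypothetical CPU count for this model */

/* Stand-ins for PFIL_IN / PFIL_OUT; not the kernel's definitions. */
enum model_dir { MODEL_IN, MODEL_OUT };

/* Assumed layout: two 64-bit counters, n_in first, then n_out. */
struct model_stats {
	uint64_t n_in;	/* offset 0 */
	uint64_t n_out;	/* offset 8 under this assumed layout */
};

/* Compile-time check of the offset the lead-in relies on. */
static_assert(offsetof(struct model_stats, n_out) == 8,
    "n_out expected at offset 8 in this model");

/* One stats block per CPU, so each CPU updates only its own counters. */
static struct model_stats model_percpu[MODEL_NCPU];

/*
 * Models the role DPCPU_PTR(ss) plays in the message. Returning NULL
 * for an out-of-range CPU is purely a way to model a bad pointer here.
 */
static struct model_stats *
model_percpu_ptr(unsigned int cpu)
{
	return (cpu < MODEL_NCPU ? &model_percpu[cpu] : NULL);
}

/* Count one packet; returns 0 on success, -1 if no per-CPU slot exists. */
static int
model_count_pkt(unsigned int cpu, enum model_dir dir)
{
	struct model_stats *ss = model_percpu_ptr(cpu);

	if (ss == NULL)
		return (-1);	/* an unchecked increment would fault here */
	if (dir == MODEL_IN)
		ss->n_in++;
	else
		ss->n_out++;
	return (0);
}

/* Sum the outbound counters across all modelled CPUs. */
static uint64_t
model_total_out(void)
{
	uint64_t total = 0;

	for (unsigned int cpu = 0; cpu < MODEL_NCPU; cpu++)
		total += model_percpu[cpu].n_out;
	return (total);
}
```

The NULL check in model_count_pkt() exists only to keep the model safe to
run; the snippet quoted from SIFTR increments through ss directly, which is
why a bad pointer shows up as a page fault at the increment.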

Could you please retest using a GENERIC kernel and see if you can
reproduce the panic? There could be something in your custom kernel
causing the offsets or linker set magic used by the DPCPU bits to break,
which in turn is triggering this panic in SIFTR.

Whether it's your custom changes breaking DPCPU or DPCPU being fragile
remains to be seen, but the good news for me is that it looks like SIFTR
is off the hook :)

Cheers,
Lawrence


