From: Lawrence Stewart
Date: Sun, 20 Jun 2010 20:27:34 +1000
To: Fabian Keil
Cc: freebsd-current@freebsd.org
Subject: Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

Hi Fabian,

On 06/20/10 03:58, Fabian Keil wrote:
> Lawrence Stewart wrote:
>
>> On 06/13/10 18:12, Lawrence Stewart wrote:
>
>>> The time has come to solicit some external testing for my SIFTR tool.
>>> I'm hoping to commit it within a week or so unless problems are discovered.
>
>>> I'm interested in all feedback and reports of success/failure, along
>>> with details of the architecture tested and number of CPUs if you would
>>> be so kind.
>
> I got the following hand-transcribed panic maybe a second after
> sysctl net.inet.siftr.enabled=1
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> [...]
> current process = 12 (swi4: clock)
> [ thread pid 12 tid 100006 ]
> Stopped at siftr_chkpkt+0xd0: addq $0x1,0x8(%r14)
> db> where
> Tracing pid 12 tid 100006 td 0xffffff00034037e0
> siftr_chkpt() at siftr_chkpkt+0xd0
> pfil_run_hooks() at pfil_run_hooks+0xb4
> ip_output() at ip_output+0x382
> tcp_output() tcp_output+0xa41
> tcp_timer_rexmt() at tcp_timer_rexmt+0x251
> softclock() at softclock+0x291
> intr_event_execute_handlers() at intr_event_execute_handlers+0x66
> ithread_loop at ithread_loop+0x8e
> fork_exit() at fork_exit+0x112
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff800003ad30, rbp = 0 ---

So I've tracked down the line of code where the page fault is occurring:

	if (dir == PFIL_IN)
		ss->n_in++;
	else
		ss->n_out++;

ss is a DPCPU (dynamic per-CPU) variable used to keep a set of stats
per CPU, and it is initialised at the start of the function like so:

	ss = DPCPU_PTR(ss);

For ss to be NULL, DPCPU_PTR() must be returning NULL on your machine.
I know very little about the inner workings of the DPCPU_* macros, but
I'm pretty sure the way I use them in SIFTR is correct, or at least as
intended.

Could you please retest using a GENERIC kernel and see if you can
reproduce the panic? There could be something in your custom kernel
that breaks the offsets or linker set magic used by the DPCPU bits,
which in turn triggers this panic in SIFTR.

Whether it's your custom changes breaking DPCPU or DPCPU being fragile
remains to be seen, but the good news for me is that it looks like SIFTR
is off the hook :)

Cheers,
Lawrence
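To make the failure mode concrete, here is a minimal user-space sketch in C of the per-CPU statistics pattern described above. It is only an analogue and not FreeBSD's DPCPU API: the array pcpu_stats, the lookup stats_ptr() (standing in for DPCPU_PTR()), count_pkt(), sum_stats(), NCPU_MAX and the DIR_IN/DIR_OUT constants (standing in for PFIL_IN/PFIL_OUT) are all hypothetical names. The NULL check in count_pkt() exists only so that a failed lookup shows up as an error return instead of the page fault seen in the trace. The sketch is single-threaded and ignores concurrency and CPU migration, which real kernel code has to deal with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PFIL_IN / PFIL_OUT; not the kernel's definitions. */
enum pkt_dir { DIR_IN, DIR_OUT };

/* Hypothetical per-CPU stats record, loosely modelled on the n_in/n_out counters. */
struct pkt_stats {
	uint64_t n_in;
	uint64_t n_out;
};

#define NCPU_MAX 4 /* hypothetical CPU count for this sketch */

/* One stats slot per CPU; in the kernel, DPCPU gives each CPU its own copy. */
static struct pkt_stats pcpu_stats[NCPU_MAX];

/*
 * Hypothetical analogue of DPCPU_PTR(ss): return the given CPU's slot,
 * or NULL if the CPU id is out of range (the "broken lookup" case).
 */
static struct pkt_stats *
stats_ptr(int cpuid)
{
	if (cpuid < 0 || cpuid >= NCPU_MAX)
		return (NULL);
	return (&pcpu_stats[cpuid]);
}

/*
 * Count one packet on the given CPU. Returns 0 on success, or -1 if the
 * per-CPU lookup failed, where dereferencing ss would otherwise fault.
 */
static int
count_pkt(int cpuid, enum pkt_dir dir)
{
	struct pkt_stats *ss = stats_ptr(cpuid);

	if (ss == NULL)
		return (-1);
	if (dir == DIR_IN)
		ss->n_in++;
	else
		ss->n_out++;
	return (0);
}

/* Sum every CPU's counters into a single total. */
static struct pkt_stats
sum_stats(void)
{
	struct pkt_stats total = { 0, 0 };
	int i;

	for (i = 0; i < NCPU_MAX; i++) {
		total.n_in += pcpu_stats[i].n_in;
		total.n_out += pcpu_stats[i].n_out;
	}
	return (total);
}
```

For example, counting one inbound packet on CPU 0 and two outbound packets on CPU 1 makes sum_stats() report n_in == 1 and n_out == 2, while count_pkt(7, DIR_IN) returns -1 because CPU id 7 is out of range in this sketch.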