Date: Mon, 8 Jan 2007 16:06:44 +1100 (EST)
From: Bruce Evans
To: Sven Willenberger
Cc: stable@FreeBSD.org, freebsd-amd64@FreeBSD.org
Subject: Re: Panic in 6.2-PRERELEASE with bge on amd64
Message-ID: <20070108154433.C75042@delplex.bde.org>
In-Reply-To: <1168211205.22629.6.camel@lanshark.dmv.com>
References: <1168211205.22629.6.camel@lanshark.dmv.com>

On Sun, 7 Jan 2007, Sven Willenberger wrote:

> I am starting a new thread on this as what I had assumed was a panic in
> nfsd turns out to be an issue with the bge driver. This is an amd64 box,
> dual processor (SMP kernel) that happens to be running nfsd. About every
> 3-5 days the kernel panics and I have finally managed to get a core
> dump.
> The system: FreeBSD 6.2-PRERELEASE #8: Tue Jan 2 10:57:39 EST 2007

Like most NIC drivers, bge unlocks and re-locks around its call to
ether_input() in its interrupt handler. This isn't very safe, and it
certainly causes panics for bge. I often see it panic when bringing
the interface down and up while input is arriving, on a non-SMP
non-amd64 (actually i386) non-6.x (actually -current) system. Bringing
the interface down is probably the worst case. It creates a null
pointer for bge_intr() to follow.

> The short and dirty of the dump:
> ...
> --- trap 0xc, rip = 0xffffffff801d5f17, rsp = 0xffffffffb371ab50, rbp = 0xffffffffb371aba0 ---
> bge_rxeof() at bge_rxeof+0x3b7

What is the instruction here?

> bge_intr() at bge_intr+0x1c8
> ithread_loop() at ithread_loop+0x14c
> fork_exit() at fork_exit+0xbb
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffffffb371ad00, rbp = 0 ---

> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address = 0x28

Looks like a null pointer panic anyway. I guess the instruction is
movl to/from 0x28(%reg) where %reg is a null pointer.

> ...
> #8 0xffffffff801db818 in bge_intr (xsc=0x0) at /usr/src/sys/dev/bge/if_bge.c:2707

What is the statement here? It presumably follows a null pointer, and
only the expression for the pointer is interesting. xsc is already
null, but that is probably a bug in gdb or the result of excessive
optimization. Compiling kernels with -O2 has little effect except to
break debugging.

I rarely use gdb on kernels and haven't looked closely enough using
ddb to see where the null pointer for the panic on down/up came from.

BTW, the sbdrop panic in -current isn't bge-only or SMP-only. I saw it
once for sk on a non-SMP system. It rarely happens for non-SMP (much
more rarely than the panic in bge_intr()).
Under -current, on an SMP amd64 system with bge, it happens almost
every time on close of the socket for a ttcp server if input is
arriving at the time of the close. I haven't seen it for 6.x.

Bruce