From owner-freebsd-stable@FreeBSD.ORG  Mon Jan  8 05:06:48 2007
Date: Mon, 8 Jan 2007 16:06:44 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@delplex.bde.org
To: Sven Willenberger <sven@dmv.com>
In-Reply-To: <1168211205.22629.6.camel@lanshark.dmv.com>
Message-ID: <20070108154433.C75042@delplex.bde.org>
References: <1168211205.22629.6.camel@lanshark.dmv.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: stable@FreeBSD.org, freebsd-amd64@FreeBSD.org
Subject: Re: Panic in 6.2-PRERELEASE with bge on amd64

On Sun, 7 Jan 2007, Sven Willenberger wrote:

> I am starting a new thread on this as what I had assumed was a panic in
> nfsd turns out to be an issue with the bge driver. This is an amd64 box,
> dual processor (SMP kernel) that happens to be running nfsd. About every
> 3-5 days the kernel panics and I have finally managed to get a core
> dump.
> The system: FreeBSD 6.2-PRERELEASE #8: Tue Jan  2 10:57:39 EST 2007

Like most NIC drivers, bge unlocks and re-locks around its call to
ether_input() in its interrupt handler.  This isn't very safe, and it
certainly causes panics for bge.  I often see it panic when bringing
the interface down and up while input is arriving, on a non-SMP non-amd64
(actually i386) non-6.x (actually -current) system.  Bringing the
interface down is probably the worst case.  It creates a null pointer
for bge_intr() to follow.
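
The hazard described above can be sketched in userspace C. This is only a toy model under stated assumptions, not code from if_bge.c: `struct softc`, `sc_lock()`, `if_stop()`, `stack_input()` and `rx_loop()` are all hypothetical names, the "lock" is a plain flag, and the racing "ifconfig down" is simulated by calling `if_stop()` from inside the upcall while the lock is dropped. The point it tries to show is that any driver state read before the unlock may be gone after the re-lock, so it has to be re-checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver softc; not the real bge_softc. */
struct softc {
    int  locked;    /* toy lock: 1 while held */
    int *rx_ring;   /* becomes NULL once the interface is stopped */
    int  rx_count;  /* number of entries in rx_ring */
};

static void sc_lock(struct softc *sc)   { assert(!sc->locked); sc->locked = 1; }
static void sc_unlock(struct softc *sc) { assert(sc->locked);  sc->locked = 0; }

/* Models "interface down": tears down the ring under the lock. */
static void if_stop(struct softc *sc)
{
    sc_lock(sc);
    free(sc->rx_ring);
    sc->rx_ring = NULL;     /* rx_count is deliberately left stale */
    sc_unlock(sc);
}

/*
 * Models the upcall into the network stack.  When stop_now is set it
 * simulates another thread stopping the interface while the driver
 * lock is dropped.
 */
static void stack_input(struct softc *sc, int pkt, int stop_now)
{
    (void)pkt;
    if (stop_now)
        if_stop(sc);
}

/* Receive loop that drops the lock around the upcall. */
static int rx_loop(struct softc *sc, int stop_at)
{
    int delivered = 0;

    sc_lock(sc);
    /* The ring pointer is re-checked on every pass, after the re-lock. */
    for (int i = 0; sc->rx_ring != NULL && i < sc->rx_count; i++) {
        int pkt = sc->rx_ring[i];

        sc_unlock(sc);
        stack_input(sc, pkt, i == stop_at);
        sc_lock(sc);
        delivered++;
    }
    sc_unlock(sc);
    return delivered;
}

/* Small driver for the sketch: returns how many packets were delivered. */
static int run_demo(int npkts, int stop_at)
{
    struct softc sc = { 0, calloc((size_t)npkts, sizeof(int)), npkts };
    int n = rx_loop(&sc, stop_at);

    free(sc.rx_ring);       /* free(NULL) is harmless if already stopped */
    return n;
}
```

Dropping the `sc->rx_ring != NULL` test from the loop condition would make the toy dereference a freed, null ring on the pass after the stop, which is the same shape of failure as the panic discussed here.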

> The short and dirty of the dump:
> ...
> --- trap 0xc, rip = 0xffffffff801d5f17, rsp = 0xffffffffb371ab50, rbp = 0xffffffffb371aba0 ---
> bge_rxeof() at bge_rxeof+0x3b7

What is the instruction here?

> bge_intr() at bge_intr+0x1c8
> ithread_loop() at ithread_loop+0x14c
> fork_exit() at fork_exit+0xbb
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffffffb371ad00, rbp = 0 ---

> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address   = 0x28

Looks like a null pointer panic anyway.  I guess the instruction is
movl to/from 0x28(%reg) where %reg is a null pointer.
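
To illustrate why a small fault address such as 0x28 suggests a member access through a null pointer: the address touched is the base pointer plus the member's offset, so a null base leaves only the offset. The sketch below uses a made-up `struct example` whose layout is chosen so that `field` sits at offset 0x28; it is an assumption for illustration and does not claim to match any real kernel structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, chosen only so that 'field' lands at 0x28. */
struct example {
    uint64_t a, b, c, d, e;   /* 5 * 8 = 40 bytes */
    uint32_t field;           /* offset 40 == 0x28 */
};

static_assert(offsetof(struct example, field) == 0x28,
    "field is expected at offset 0x28");

/* Address that a load or store of p->field would touch for a given base. */
static uintptr_t access_address(uintptr_t base)
{
    return base + offsetof(struct example, field);
}
```

With a base of 0 this yields 0x28, which is the form the reported fault virtual address takes.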

> ...
> #8  0xffffffff801db818 in bge_intr (xsc=0x0) at /usr/src/sys/dev/bge/if_bge.c:2707

What is the statement here?  It presumably follows a null pointer and only
the expression for the pointer is interesting.  xsc is already null but
that is probably a bug in gdb, or the result of excessive optimization.
Compiling kernels with -O2 has little effect except to break debugging.

I rarely use gdb on kernels and haven't looked closely enough using ddb
to see where the null pointer for the panic on down/up came from.

BTW, the sbdrop panic in -current isn't bge-only or SMP-only.  I saw
it once for sk on a non-SMP system.  It rarely happens for non-SMP
(much more rarely than the panic in bge_intr()).  Under -current, on
an SMP amd64 system with bge, it happens almost every time on close
of the socket for a ttcp server if input is arriving at the time of
the close.  I haven't seen it for 6.x.

Bruce