From owner-freebsd-stable@FreeBSD.ORG  Tue Oct 16 00:33:50 2007
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A30E716A421
	for <stable@freebsd.org>; Tue, 16 Oct 2007 00:33:50 +0000 (UTC)
	(envelope-from kris@FreeBSD.org)
Received: from weak.local (hub.freebsd.org [IPv6:2001:4f8:fff6::36])
	by mx1.freebsd.org (Postfix) with ESMTP id DDDF313C480;
	Tue, 16 Oct 2007 00:33:49 +0000 (UTC)
	(envelope-from kris@FreeBSD.org)
Message-ID: <471406ED.7000307@FreeBSD.org>
Date: Tue, 16 Oct 2007 02:33:49 +0200
From: Kris Kennaway <kris@FreeBSD.org>
User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728)
MIME-Version: 1.0
To: Esa Karkkainen <ejk@iki.fi>,  stable@freebsd.org
References: <20071004165755.GA1049@pp.htv.fi> <47120D83.1010703@FreeBSD.org>
	<20071015203202.GA17964@pp.htv.fi>
In-Reply-To: <20071015203202.GA17964@pp.htv.fi>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Re: Reproducible, possibly NFS related,
	fatal double fault in 6.2-R-p7

Esa Karkkainen wrote:
> On Sun, Oct 14, 2007 at 02:37:23PM +0200, Kris Kennaway wrote:
>> Esa Karkkainen wrote:
>>> 	I get "Fatal double fault" error when writing to a filesystem
>>> mounted from NFS server.
> 
> I got an off-list reply suggesting that the problem might be in the
> nve driver.
> 
> I installed an additional Intel NIC; the relevant lines from dmesg are
> as follows:
> 
> fxp0: <Intel 82559 Pro/100 Ethernet> port 0xb000-0xb03f mem
> 0xe7200000-0xe7200fff,0xe7000000-0xe70fffff irq 11 at device 6.0 on pci1
> miibus1: <MII bus> on fxp0
> inphy0: <i82555 10/100 media interface> on miibus1
> inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> 
> Since I started using fxp0, I can dump(8) all the necessary filesystems
> to the NFS mount without a panic.
> 
> When I used nve0, dump(8) or cp(1) managed to write less than a megabyte
> to the NFS mount and then the machine panicked.
> 
> It didn't matter whether I made dump(8) write to the NFS mount directly
> or to a local filesystem and then copied the file to the NFS mount; the
> end result was a panic.
> 
>>> 	Both NFS server and client are running 6.2-RELEASE-p7.
> 
> Both machines have been updated to -p8.
> 
>>> # kgdb kernel.debug /home/crash/vmcore.2 
>>> Fatal double fault:
>>> eip = 0xc063242a
>> Can you look up these instruction pointers (the eip values) in the 
>> kernel symbol table (see the Developers' Handbook)?  This might give 
>> at least one clue, although I'm not sure it is relevant.
> 
> I'm sorry, but I need to learn a lot more about gdb and debugging in
> general before I can find that information. IIRC I have written about
> ten or twenty lines of C in this millennium.

Well, it's explained in explicit detail in that document.  C code is not 
involved.
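
For what it's worth, the lookup amounts to finding the last symbol whose 
start address is at or below the faulting eip.  A minimal sketch in Python, 
assuming input in the style of `nm -n kernel.debug` (address, type, name, 
sorted by address).  The symbol names and addresses in SAMPLE are made up 
for illustration and are not taken from this kernel; only the eip value 
comes from the panic message quoted above.

```python
import bisect


def parse_nm(text):
    """Parse `nm -n`-style lines ("c0632400 T name") into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols (no address)
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def lookup(symbols, eip):
    """Return (name, offset) for the symbol at or below eip, or None."""
    addrs = [addr for addr, _name in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None  # eip lies below the first known symbol
    addr, name = symbols[i]
    return name, eip - addr


# Hypothetical symbol table, NOT real symbols from the 6.2 kernel.
SAMPLE = """\
c0632000 T example_func_a
c0632400 T example_func_b
c0632800 T example_func_c
"""

symbols = parse_nm(SAMPLE)
result = lookup(symbols, 0xc063242a)  # eip from the double fault message
print(result)
```

With a real symbol listing in place of SAMPLE, the name that comes back is 
the function the fault landed in, and the offset says how far into it.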

> I do have matching kernel.debug and vmcore files, but the kernel modules
> etc. were removed before I built the new kernel and world.

OK, most likely too late then.

>> You might also update to RELENG_6, I think there was at least one bug 
>> fixed that might have caused such a thing.
> 
> At the moment I don't have any stability problems with this machine, but
> I can upgrade to RELENG_6 before RELENG_6_3 is branched if that is
> necessary.
> 
>> Also try to rule out memory failure etc.
> 
> This machine has two 512 MB DDR333 DIMMs.
> 
> I installed sysutils/memtest and ran three instances simultaneously; the
> first two allocated 326 MB each and the last one allocated 150 MB of
> memory, so I'd start to swap. No errors.

Well, as you say, such a limited test doesn't mean much.  Anyway, it may 
well have been nve, so see how you go without it.
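
To put a rough number on that, here is a quick Python sketch of how much of 
the installed RAM the three memtest runs could have touched at most, using 
only the sizes quoted above.  It assumes none of the allocations were 
swapped out during the run, which is optimistic.

```python
# Sizes as quoted in the message: two 512 MB DIMMs, three memtest runs.
installed_mb = 2 * 512
allocations_mb = [326, 326, 150]

tested_mb = sum(allocations_mb)          # upper bound on memory exercised
untested_mb = installed_mb - tested_mb   # kernel, other processes, free pages
coverage = tested_mb / installed_mb

print(f"tested at most {tested_mb} MB of {installed_mb} MB "
      f"({coverage:.0%}); {untested_mb} MB never touched")
```

A userland tester also cannot choose which physical pages it is given, so 
memory held by the kernel is never exercised by this kind of run.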

kris