From owner-freebsd-net@FreeBSD.ORG  Fri Jul  6 19:40:29 2007
Message-ID: <468E9AAA.6010307@lissyara.su>
Date: Fri, 06 Jul 2007 23:40:26 +0400
From: Alex Keda <admin@lissyara.su>
User-Agent: Thunderbird 2.0.0.4 (X11/20070630)
MIME-Version: 1.0
CC: freebsd-net@freebsd.org
References: <468E5A94.3030509@lissyara.su>	<20070706154247.GH2200@deviant.kiev.zoral.com.ua>	<468E88BB.2020009@lissyara.su>
	<20070706192300.GI2200@deviant.kiev.zoral.com.ua>
In-Reply-To: <20070706192300.GI2200@deviant.kiev.zoral.com.ua>
Content-Type: text/plain; charset=KOI8-R; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: Fatal double fault while copy to NFS filesystems

Kostik Belousov wrote:
> On Fri, Jul 06, 2007 at 10:23:55PM +0400, Alex Keda wrote:
>   
>> Kostik Belousov wrote:
>>     
>>> On Fri, Jul 06, 2007 at 07:07:00PM +0400, Alex Keda wrote:
>>>  
>>>       
>>>> When I copy files to an NFS share on another host, the kernel crashes:
>>>> Fatal double fault:
>>>> eip = 0xc07e9e29
>>>> esp = 0xe31a3000
>>>> ebp = 0xe31a3000
>>>> cpuid = 1; apic id = 01
>>>> panic: double fault
>>>> cpuid = 1
>>>> =======================
>>>> before this, I see in /var/log/messages:
>>>> nve0: device timeout
>>>> =======================
>>>> how to reproduce the problem:
>>>> ussr# df -h
>>>> Filesystem     Size    Used   Avail Capacity  Mounted on
>>>> /dev/ad0s1a     72G    6.1G     60G     9%    /
>>>> devfs          1.0K    1.0K      0B   100%    /dev
>>>> ussr# dd if=/dev/zero of=file_20mb bs=1m count=20
>>>> ussr# mount 192.168.254.254:/shares /mnt/
>>>> ussr# df -h
>>>> Filesystem                 Size    Used   Avail Capacity  Mounted on
>>>> /dev/ad0s1a                 72G    6.1G     60G     9%    /
>>>> devfs                      1.0K    1.0K      0B   100%    /dev
>>>> 192.168.254.254:/shares    271G    179G     89G    67%    /mnt
>>>> ussr# cp file_20mb /mnt/
>>>> then, after 3-5 seconds I see "device timeout", and 5-7 seconds
>>>> later the system crashes
>>>> =====================
>>>> more information: this problem appeared after I upgraded the remote
>>>> machine (6.2-RELEASE-p5): I changed the CPU from a Celeron 466 to a PIII 800.
>>>> interface on the remote machine - 3com509b
>>>> if I copy to the remote machine slowly (~100kb/s - 10% interface usage),
>>>> all is fine. The system does not crash...
>>>> if I copy from the remote machine, all is fine; the system does not crash...
>>>> the logs on the remote machine are clean.
>>>> =====================
>>>> 3 days ago I upgraded my system to 6.2-RELEASE-p5, but the problem still exists...
>>>>    
>>>>         
>>> The double fault issue might be the problem that is fixed in CURRENT/RELENG_6.
>>> To confirm this, a ddb backtrace after the panic will be helpful. You will
>>> need to compile DDB into the kernel, obtain the DDB prompt after the panic
>>> and issue the "bt" command.
>>>  
>>>       
>> Fatal double fault:
>> eip = 0xc07e8bd9
>> esp = 0xe3793000
>> ebp = 0xe3793020
>> cpuid = 0; apic id = 00
>> panic:double fault
>> cpuid = 0
>> KDB: enter: panic
>> [thread pid 25 tid 100019]
>> Stopped at kdb_enter+0x2b:nop
>>
>> Tracing pid 25 tid 100019 td 0xc527b600
>> kdb_enter(c090f266) at kdb_enter+0x2b
>> panic(c092d4c9,c092d671,0,0,0,...) at panic+0x127
>> dblfault_handler() at dblfault_handler+0x7a
>> --- trap 0x17, eip = 0xc07e8bd9, esp = 0xe3793000, ebp = 0xe3793020 ---
>> uma_zfree_arg(c1857960,c5718900,0) at uma_zfree_arg+0x21
>> m_freem(c5718900,e54ad000,e52ac65c,c543e810,1,...) at m_freem+0x2e
>> nve_ospackettx(c543e800,e52ac65c,1,e54ad000,0,...) at nve_ospackettx+0x57
>> UpdateTransmitDescRingData() at UpdateTransmitDescRingData+0xd3
>>     
> Is this the full trace? It seems unlikely that this is the problem I
> thought of.
>   
Yes, this is the full output of the 'bt' command:

Tracing pid 25 tid 100019 td 0xc527b600
kdb_enter(c090f266) at kdb_enter+0x2b
panic(c092d4c9,c092d671,0,0,0,...) at panic+0x127
dblfault_handler() at dblfault_handler+0x7a
--- trap 0x17, eip = 0xc07e8bd9, esp = 0xe3793000, ebp = 0xe3793020 ---
uma_zfree_arg(c1857960,c5718900,0) at uma_zfree_arg+0x21
m_freem(c5718900,e54ad000,e52ac65c,c543e810,1,...) at m_freem+0x2e
nve_ospackettx(c543e800,e52ac65c,1,e54ad000,0,...) at nve_ospackettx+0x57
UpdateTransmitDescRingData() at UpdateTransmitDescRingData+0xd3
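
For anyone who wants to read such a trace mechanically, here is a minimal sketch in Python. It assumes that the frames listed after the "--- trap" marker are the code that was running when the fault hit, innermost first. The helper names (frames_after_trap, first_frame_with_prefix) and the "nve_" prefix heuristic are my own illustration and not part of DDB or any FreeBSD tool.

```python
import re

# A frame line in the 'bt' output looks like:  name(args...) at name+0xOFFSET
FRAME_RE = re.compile(r"^(?P<func>\w+)\(.*\) at (?P=func)\+0x(?P<off>[0-9a-fA-F]+)$")
TRAP_RE = re.compile(r"^--- trap 0x[0-9a-fA-F]+,")


def frames_after_trap(bt_text):
    """Return (function, offset) pairs for the frames listed after the trap marker."""
    frames = []
    seen_trap = False
    for raw in bt_text.splitlines():
        line = raw.strip()
        if TRAP_RE.match(line):
            seen_trap = True
            continue
        m = FRAME_RE.match(line)
        if seen_trap and m:
            frames.append((m.group("func"), int(m.group("off"), 16)))
    return frames


def first_frame_with_prefix(frames, prefix):
    """Hypothetical heuristic: first frame whose name starts with a driver prefix."""
    for func, _off in frames:
        if func.startswith(prefix):
            return func
    return None


# Sample frames from the 'bt' output in this message; eip is written with
# 8 hex digits, matching the register dump at the top of the panic output.
BT = """\
kdb_enter(c090f266) at kdb_enter+0x2b
panic(c092d4c9,c092d671,0,0,0,...) at panic+0x127
dblfault_handler() at dblfault_handler+0x7a
--- trap 0x17, eip = 0xc07e8bd9, esp = 0xe3793000, ebp = 0xe3793020 ---
uma_zfree_arg(c1857960,c5718900,0) at uma_zfree_arg+0x21
m_freem(c5718900,e54ad000,e52ac65c,c543e810,1,...) at m_freem+0x2e
nve_ospackettx(c543e800,e52ac65c,1,e54ad000,0,...) at nve_ospackettx+0x57
UpdateTransmitDescRingData() at UpdateTransmitDescRingData+0xd3
"""

if __name__ == "__main__":
    frames = frames_after_trap(BT)
    for func, off in frames:
        print(f"{func}+{off:#x}")
    print("first nve frame:", first_frame_with_prefix(frames, "nve_"))
```

The prefix match only points at a suspect frame; it does not prove that the driver is at fault.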

============
But here I see a path to solving my problem (nve_ospackettx: a driver
problem, I think?). Tomorrow I will install an fxp card and test again.
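
The slow-copy observation quoted above (about 100 kb/s, no crash) could be repeated with a rate-limited copy. Below is a minimal sketch in Python, assuming the figure means roughly 100 kilobytes per second; the function name throttled_copy and its parameters are hypothetical, and this is not necessarily how the original slow copy was done. The operating system buffers writes to an NFS mount, so the rate on the wire may be burstier than the average this loop enforces.

```python
import time


def throttled_copy(src_path, dst_path, rate_bytes_per_s=100 * 1024, chunk_size=16 * 1024):
    """Copy src_path to dst_path, sleeping so that the average rate stays
    at or below rate_bytes_per_s. Returns the number of bytes copied."""
    copied = 0
    start = time.monotonic()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
            # Earliest moment at which `copied` bytes may have been written.
            earliest = start + copied / rate_bytes_per_s
            delay = earliest - time.monotonic()
            if delay > 0:
                time.sleep(delay)
    return copied
```

For the test file from the original report, the call would be throttled_copy("file_20mb", "/mnt/file_20mb"). Raising rate_bytes_per_s step by step may show at what load the "device timeout" starts to appear.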