Date:      Sun, 26 Nov 2006 12:39:12 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Max Laier <max@love2party.net>, freebsd-net@freebsd.org, Nicolae Namolovan <adrenalinup@gmail.com>
Subject:   Re: ping -f panic [Re: Marvell Yukon 88E8056 FreeBsd Drivers]
Message-ID:  <20061126123700.G2108@fledge.watson.org>
In-Reply-To: <20061126165353.Y47830@delplex.bde.org>
References:  <f027bef40611240533k453e90dfve6f662794bba3b84@mail.gmail.com> <20061125015223.GA51565@cdnetworks.co.kr> <f027bef40611251420s6e898e4iba9cf5928266fb5@mail.gmail.com> <200611260331.28847.max@love2party.net> <20061126165353.Y47830@delplex.bde.org>


On Sun, 26 Nov 2006, Bruce Evans wrote:

> On Sun, 26 Nov 2006, Max Laier wrote:
>
>> On Saturday 25 November 2006 23:20, Nicolae Namolovan wrote:
>>> But I need to use it on a production server and the CURRENT one is too 
>>> unstable, without too much thinking I just run ping -f 127.0.0.1 and after 
>>> some minutes I got kernel panic, heh.
>> 
>> could you please be more specific about this?  My rather recent current box 
>> is running for over 45min doing "ping -f 127.0.0.1" with no panic or other 
>> ill behavior so far.  After about 10min I disabled the icmp limiting which 
>> obviously didn't trigger it either.  Could you provide a back trace or at 
>> least a panic message?  Thanks.
>
> I haven't seen any problems with ping, but ttcp -u causes the panic in 
> sbdrop_internal() about half the time when the client ttcp is killed by ^C. 
> There is apparently a race in close when packets are arriving. The stack 
> trace on the panicking CPU is (always?):
>
>    ... sigexit exit1 ... closef ... soclose ...
>    sbflush_internal sbdrop_internal panic
>
> and on the other CPU, with net.isr.direct=1 it was:
>
>    bge_rxeof ... netisr_dispatch ip_input ...
>    sbappendaddr_locked mb_ctor_mbuf --- trap (NMI IPI for cpustop).
>
> and with net.isr.direct=0, the other CPU was just running "idle: cpuN" and 
> the bge thread was in ithread_loop.

Historically, sbflush panics have been a sign of a driver<->stack race, in 
which the driver touches the mbuf [chain] after injecting it into the stack 
and so corrupts the socket buffer state: for example, by freeing the mbuf, 
appending another mbuf to the chain, or changing its length.  The panic often 
shows up in sbflush because we only notice the inconsistency later, when the 
socket is closed and the buffer is flushed.  I wouldn't preclude a network 
stack bug, but I would definitely take a close look at the driver first, 
making sure all error cases are handled properly, and so on.
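
To make the ownership rule concrete, here is a minimal, hypothetical sketch 
(the xx_rxeof/xx_next_rx_mbuf names and the softc layout are invented for 
illustration, not taken from any real driver) of a receive path that hands an 
mbuf chain to the stack, together with the kinds of post-hand-off accesses 
that produce exactly this sort of socket buffer corruption:

    #include <sys/param.h>
    #include <sys/mbuf.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <net/if_var.h>

    struct xx_softc {
            struct ifnet *xx_ifp;           /* hypothetical softc layout */
    };

    /* Hypothetical helper that pulls the next completed RX mbuf chain. */
    static struct mbuf *xx_next_rx_mbuf(struct xx_softc *);

    static void
    xx_rxeof(struct xx_softc *sc)
    {
            struct ifnet *ifp = sc->xx_ifp;
            struct mbuf *m;

            while ((m = xx_next_rx_mbuf(sc)) != NULL) {
                    m->m_pkthdr.rcvif = ifp;

                    /*
                     * Correct: once the chain is handed to the stack via
                     * if_input, the stack owns it and the driver must not
                     * touch it again.
                     */
                    (*ifp->if_input)(ifp, m);

                    /*
                     * Buggy patterns that corrupt socket buffer state,
                     * because ip_input()/sbappendaddr_locked() may already
                     * be working on the chain (possibly on another CPU):
                     *
                     *      m_freem(m);          use after hand-off/double free
                     *      m->m_len = 0;        length changed under the stack
                     *      m->m_next = NULL;    chain modified after hand-off
                     */
            }
    }

The corruption typically goes unnoticed until soclose()/sbflush() walks the 
buffer and finds that the recorded counts no longer match the chain, which is 
why the panic tends to surface at close time rather than at the point of the 
bad access.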

Robert N M Watson
Computer Laboratory
University of Cambridge


