From owner-freebsd-current@FreeBSD.ORG  Mon Nov 30 07:21:03 2009
Mime-Version: 1.0 (Apple Message framework v1077)
Content-Type: text/plain; charset=iso-8859-1
From: Eirik Øverby <ltning@anduin.net>
In-Reply-To: <20091130005236.GC1123@michelle.cdnetworks.com>
Date: Mon, 30 Nov 2009 08:20:57 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <7A7E32A2-9320-4F39-B495-70E547D23B82@anduin.net>
References: <A1648B95-F36D-459D-BBC4-FFCA63FC1E4C@anduin.net>
	<20091129013026.GA1355@michelle.cdnetworks.com>
	<74BFE523-4BB3-4748-98BA-71FBD9829CD5@anduin.net>
	<alpine.BSF.2.00.0911291427240.80654@fledge.watson.org>
	<E9B13DDC-1B51-4EFD-95D2-544238BDF3A4@anduin.net>
	<20091130005236.GC1123@michelle.cdnetworks.com>
To: pyunyh@gmail.com
X-Mailer: Apple Mail (2.1077)
Cc: weldon@excelsusphoto.com, Gavin Atkinson <gavin@freebsd.org>,
	Robert Watson <rwatson@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: FreeBSD 8.0 - network stack crashes?
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>

On 30. nov. 2009, at 01.52, Pyun YongHyeon wrote:

> On Mon, Nov 30, 2009 at 12:21:16AM +0100, Eirik Øverby wrote:
>> On 29. nov. 2009, at 15.29, Robert Watson wrote:
>>
>>> On Sun, 29 Nov 2009, Eirik Øverby wrote:
>>>
>>>> I just did that (-rxcsum -txcsum -tso), but the numbers still keep
>>>> rising. I'll wait and see if it goes down again, then reboot with
>>>> those values to see how it behaves. But right away it doesn't look
>>>> too good ..
>>>
>>> It would be interesting to know if any of the counters in the output
>>> of netstat -s grow linearly with the allocation count in netstat -m.
>>> Often times leaks are associated with edge cases in the stack
>>> (typically because if they are in common cases the bug is detected
>>> really quickly!) -- usually error handling, where in some error case
>>> the unwinding fails to free an mbuf that it should free.  These are
>>> notoriously hard to track down, unfortunately, but the stats output
>>> (especially where delta alloc is linear to delta stat) may inform the
>>> situation some more.
>>
>> From what I can tell, all that goes up with mbuf usage is
>> traffic/packet counts. I can't say I see anything fishy in there.
>>
>
> If system exhausted all available mbufs it still should not crash
> the box. Use -d option of netstat(1) to see whether packet drop
> counter still goes up when you know system can't receive any
> frames. AFAIK em(4) was carefully written to recover from Rx
> resource shortage such that it just drops incoming frames when it
> can't get new mbuf. This may result in dropping incoming connection
> request but it means it still tries to recover from the resource
> exhaustion.
> It's not clear where mbuf leak comes from, though.

The box does not crash; connecting to the console (via IP-KVM) shows the
box is just fine, except that no networking works. I can up the
kern.ipc.nmbclusters value from the command line, and after a few seconds
things start moving again.

The em(4) debug output shows that it fails to allocate mbuf clusters.
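
A rough sketch of the comparison Robert describes above (does any
netstat -s counter grow in step with the mbuf count from netstat -m?).
Everything here is an illustrative assumption rather than a tool from
this thread: the function names, the 0.95 threshold, and the assumed
line format ("<number> <description>" under "proto:" section headers).
The mbuf counts are expected to be extracted from netstat -m separately,
one per sample.

```python
import re


def parse_counters(text):
    """Parse 'netstat -s'-style text into {'section: description': value}.

    Assumed format: section headers such as 'tcp:' followed by lines of
    the form '<number> <description>'. Digits inside the description are
    replaced by 'N' so that keys stay stable between samples.
    """
    counters = {}
    section = ""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and stripped.endswith(":") and not stripped[0].isdigit():
            section = stripped[:-1]
            continue
        m = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if m:
            desc = re.sub(r"\d+", "N", m.group(2))
            counters[f"{section}: {desc}"] = int(m.group(1))
    return counters


def deltas(series):
    """Differences between consecutive samples."""
    return [b - a for a, b in zip(series, series[1:])]


def correlation(xs, ys):
    """Pearson correlation; 0.0 if there is too little data or no variation."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def suspects(mbuf_counts, snapshots, threshold=0.95):
    """Counters whose per-interval growth tracks the mbuf growth.

    mbuf_counts: one mbuf allocation count per sample.
    snapshots:   one parse_counters() dict per sample, same order.
    """
    if len(mbuf_counts) != len(snapshots):
        raise ValueError("need one mbuf count per snapshot")
    dm = deltas(mbuf_counts)
    # Only compare counters that are present in every snapshot.
    names = set(snapshots[0]).intersection(*snapshots[1:])
    found = []
    for name in sorted(names):
        r = correlation(dm, deltas([s[name] for s in snapshots]))
        if r >= threshold:
            found.append((name, r))
    return found
```

If the leak is simply proportional to traffic, plain packet counters
will show up as well, so a hit is a hint about where to look rather
than proof of where the mbufs go.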


>> From the last few samples in
>> http://anduin.net/~ltning/netstat.log
>
> 404

Uh? Unpossible :)
The file is there, and I can view it here ...


>> you can see the host stops receiving any packets, but does a few
>> retransmits before the session where this script ran timed out.
>>
>
> By chance do you use pf/ipfw/ipf?

No... Unfortunately ;)

/Eirik