From owner-freebsd-net@FreeBSD.ORG Mon Feb 15 19:32:30 2010
Date: Mon, 15 Feb 2010 11:32:28 -0800
From: Jack Vogel <jfvogel@gmail.com>
To: Maxim Sobolev
Cc: freebsd-net@freebsd.org, FreeBSD Hackers
Subject: Re: Sudden mbuf demand increase and shortage under the load
Message-ID: <2a41acea1002151132p3e58d4bu7adbbed527d5a81f@mail.gmail.com>
In-Reply-To: <4B79297D.9080403@FreeBSD.org>
List-Id: Networking and TCP/IP with FreeBSD

Can you tell me more about the system with the problem? Does it have
both em- and igb-driven interfaces?

Jack

2010/2/15 Maxim Sobolev

> Hi,
>
> Our company has a FreeBSD-based product that consists of numerous
> interconnected processes and does some high-PPS UDP processing (30-50K
> PPS is not uncommon). We are seeing strange periodic failures under
> load on several such systems, which usually show up as IPC (even over
> unix domain sockets) suddenly either breaking down or stalling and
> recovering only some time later (5-10 minutes). The only sign of
> failure I have managed to find is an increase in "requests for mbufs
> denied" in netstat -m, with the total number of mbuf clusters rising
> up to the nmbclusters limit.
>
> I have tried raising some network-related limits (most notably
> maxusers and nmbclusters), but it has not helped; the issue still
> happens to us from time to time. Below you can find netstat -m output
> from a few minutes right after such a shortage period; you can see
> that the system has somehow allocated a huge amount of memory to the
> network (700MB), with only a tiny fraction of it actually in use. This
> is with kern.ipc.nmbclusters: 302400. Eventually the system reclaims
> all that memory and goes back to its normal use of 30-70MB.
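For reference, on 7.x the cluster limit mentioned here is raised either
with a loader tunable or, on releases where the sysctl is writable, at
runtime. A minimal sketch; the value is simply the one from this report,
not a recommendation:

    # /boot/loader.conf -- applied at next boot
    kern.ipc.nmbclusters="302400"

    # or at runtime, where the sysctl is read/write:
    # sysctl kern.ipc.nmbclusters=302400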
>
> This problem is killing us, so any suggestions are greatly
> appreciated. My current hypothesis is that, due to some issue either
> in the network driver or in the network subsystem itself, the system
> goes insane and "eats up" all mbufs, right up to the nmbclusters
> limit. And since mbufs are shared between the network and local IPC,
> IPC goes down as well.
>
> We observe this issue on systems using both the em(4) and the igb(4)
> driver. I believe both drivers share the same design; however, I am
> not sure whether this is some kind of design flaw in the driver or
> part of a larger problem in the network subsystem.
>
> This happens on amd64 7.2-RELEASE and 7.3-PRERELEASE alike, with 8GB
> of memory. I have not tried upgrading to 8.0; this is a production
> system, so upgrading will not be easy. I don't believe there are
> differences that would let us hope the problem goes away after an
> upgrade, but I can try it as a last resort.
>
> As I said, this is a very critical issue, so I can provide any
> additional debug information upon request. We are even ready to pay
> somebody a reasonable amount of money for tracking down and resolving
> the issue.
>
> Regards,
> --
> Maksym Sobolyev
> Sippy Software, Inc.
> Internet Telephony (VoIP) Experts
> T/F: +1-646-651-1110
> Web: http://www.sippysoft.com
> MSN: sales@sippysoft.com
> Skype: SippySoft
>
>
> [ssp-root@ds-467 /usr/src]$ netstat -m
> 17061/417669/434730 mbufs in use (current/cache/total)
> 10420/291980/302400/302400 mbuf clusters in use (current/cache/total/max)
> 10420/0 mbuf+clusters out of packet secondary zone in use (current/cache)
> 19/1262/1281/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/25600 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/12800 16k jumbo clusters in use (current/cache/total/max)
> 25181K/693425K/718606K bytes allocated to network (current/cache/total)
> 1246681/129567494/67681640 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> [FEW MINUTES LATER]
>
> [ssp-root@ds-467 /usr/src]$ netstat -m
> 10001/84574/94575 mbufs in use (current/cache/total)
> 6899/6931/13830/302400 mbuf clusters in use (current/cache/total/max)
> 6899/6267 mbuf+clusters out of packet secondary zone in use (current/cache)
> 2/1151/1153/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/25600 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/12800 16k jumbo clusters in use (current/cache/total/max)
> 16306K/39609K/55915K bytes allocated to network (current/cache/total)
> 1246681/129567494/67681640 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
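One low-impact way to catch the onset of a shortage like the one above
is to sample the counters continuously with timestamps, so a jump in
"requests for mbufs denied" can later be correlated with application
events. A minimal sh sketch; the log path and interval are illustrative,
not from the report:

    #!/bin/sh
    # Poll mbuf statistics so a sudden spike in denied requests can be
    # matched against application logs afterwards.
    LOG=/var/log/mbuf-watch.log    # illustrative path
    while :; do
        date >> "$LOG"
        netstat -m >> "$LOG"
        sleep 5                    # illustrative interval
    done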