From: Ivan Voras
Date: Mon, 15 Feb 2010 14:05:02 +0100
To: freebsd-stable@freebsd.org
Cc: freebsd-net@freebsd.org
Subject: Re: Sudden mbuf demand increase and shortage under the load
In-Reply-To: <4B793D1D.1000108@FreeBSD.org>

On 02/15/10 13:25, Maxim Sobolev wrote:
> Hi,
>
> Our company have a FreeBSD based product that consists of the numerous
> interconnected processes and it does some high-PPS
> UDP processing (30-50K PPS is not uncommon). We are seeing some strange
> periodic

I have nothing very useful to offer, but maybe you can check whether
this is an em/igb issue by buying a cheap Realtek gigabit (re) card and
trying it out. Those can be bought for a few dollars now (e.g. from
D-Link and many others), and I can confirm that at least the one I tried
can carry around 50K pps, but not much more (I can tell you the exact
chip later today if you are interested).

> failures under the load in several such systems, which usually evidences
> itself in IPC (even through unix domain sockets) suddenly either
> breaking down or pausing and restoring only some time later (like 5-10
> minutes). The only sign of failure I managed to find was the increase of
> the "requests for mbufs denied" in the netstat -m and number of total
> mbuf clusters (nmbclusters) raising up to the limit.
>
> I have tried to raise some network-related limits (most notably maxusers
> and nmbclusters), but it has not helped with the issue - it's still
> happening from time to time to us. Below you can find output from the
> netstat -m few minutes right after that shortage period - you see that
> somehow the system has allocated huge amount of memory for the network
> (700MB), with only tiny amount of that being actually in use. This is
> for the kern.ipc.nmbclusters: 302400. Eventually the system reclaims all
> that memory and goes back to its normal use of 30-70MB.
>
> This problem is killing us, so any suggestions are greatly appreciated.
> My current hypothesis is that due to some issues either with the network
> driver or network subsystem itself, the system goes insane and "eats" up
> all mbufs up to nmbclusters limit. But since mbufs are shared between
> network and local IPC, IPC goes down as well.
>
> We observe this issue with systems using both em(4) driver and igb(4)
> driver.
> I believe both drivers share the same design, however I am not
> sure if this is some kind of design flaw in the driver or part of a
> larger problem with the network subsystem.
>
> This happens on amd64 7.2-RELEASE and 7.3-PRERELEASE alike, with 8GB of
> memory. I have not tried upgrading to 8.0, this is production system so
> upgrading will not be easy. I don't believe there are some differences
> that let us hope that this problem will go away after upgrade, but I can
> try it as the last resort.
>
> As I said, this is very critical issue, so I can provide any additional
> debug information upon request. We are ready to go as far as paying
> somebody reasonable amount of money for tracking down and resolving the
> issue.
>
> Regards,
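
For anyone who wants to line up the start of a shortage with other
events, here is a minimal sketch of logging the two counters mentioned
above ("requests for mbufs denied" and mbuf cluster usage) over time. It
is not from the original report. It assumes the netstat -m line formats
noted in the comments, the names parse_netstat_m and watch are
hypothetical, and the polling interval is arbitrary.

```python
import re
import subprocess
import time

# Assumed netstat -m line format:
#   0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
DENIED_RE = re.compile(
    r"^(\d+)/(\d+)/(\d+) requests for mbufs denied", re.MULTILINE)
# Assumed netstat -m line format:
#   1024/1974/2998/25600 mbuf clusters in use (current/cache/total/max)
CLUSTERS_RE = re.compile(
    r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.MULTILINE)


def parse_netstat_m(text):
    """Extract denied-request and cluster counters from netstat -m text."""
    stats = {}
    m = DENIED_RE.search(text)
    if m:
        # Order: mbufs, clusters, mbuf+clusters
        stats["denied"] = tuple(int(x) for x in m.groups())
    m = CLUSTERS_RE.search(text)
    if m:
        current, cache, total, limit = (int(x) for x in m.groups())
        stats["clusters"] = {
            "current": current, "cache": cache,
            "total": total, "max": limit,
        }
    return stats


def watch(interval=10):
    """Print a timestamped sample every `interval` seconds until interrupted."""
    while True:
        out = subprocess.run(
            ["netstat", "-m"], capture_output=True, text=True).stdout
        print(time.strftime("%Y-%m-%d %H:%M:%S"), parse_netstat_m(out))
        time.sleep(interval)
```

Running watch() from a terminal (or redirecting its output to a file)
gives a timeline of when the total cluster count starts climbing toward
the limit and when denials begin. If the line formats differ on a given
release, the two regular expressions are the only parts that need
adjusting.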