Date: Mon, 15 Feb 2010 11:32:28 -0800
From: Jack Vogel <jfvogel@gmail.com>
To: Maxim Sobolev <sobomax@freebsd.org>
Cc: freebsd-net@freebsd.org, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Sudden mbuf demand increase and shortage under the load
Message-ID: <2a41acea1002151132p3e58d4bu7adbbed527d5a81f@mail.gmail.com>
In-Reply-To: <4B79297D.9080403@FreeBSD.org>
References: <4B79297D.9080403@FreeBSD.org>
Can you tell me more about the system with the problem? Does it have
both em- and igb-driven interfaces?

Jack

2010/2/15 Maxim Sobolev <sobomax@freebsd.org>

> Hi,
>
> Our company has a FreeBSD-based product that consists of numerous
> interconnected processes and does some high-PPS UDP processing (30-50K
> PPS is not uncommon). We are seeing strange periodic failures under
> load on several such systems, which usually show up as IPC (even
> through unix domain sockets) suddenly either breaking down or pausing and
> recovering only some time later (5-10 minutes). The only sign of failure
> I managed to find was an increase in "requests for mbufs denied" in
> netstat -m, with the total number of mbuf clusters rising to the
> nmbclusters limit.
>
> I have tried raising some network-related limits (most notably maxusers
> and nmbclusters), but it has not helped with the issue; it still
> happens to us from time to time. Below you can find the output of
> netstat -m a few minutes after the shortage period. You can see that
> the system has somehow allocated a huge amount of memory for the network
> (700MB), with only a tiny amount of it actually in use. This is with
> kern.ipc.nmbclusters: 302400. Eventually the system reclaims all that
> memory and goes back to its normal use of 30-70MB.
>
> This problem is killing us, so any suggestions are greatly appreciated. My
> current hypothesis is that, due to some issue either with the network driver
> or with the network subsystem itself, the system goes insane and "eats" all
> mbufs up to the nmbclusters limit. Since mbufs are shared between the
> network and local IPC, IPC goes down as well.
>
> We observe this issue on systems using both the em(4) driver and the igb(4)
> driver. I believe both drivers share the same design; however, I am not sure
> whether this is some kind of design flaw in the driver or part of a larger
> problem with the network subsystem.
>
> This happens on amd64 7.2-RELEASE and 7.3-PRERELEASE alike, with 8GB of
> memory. I have not tried upgrading to 8.0; this is a production system, so
> upgrading will not be easy. I don't believe there are any differences that
> give us hope that this problem will go away after an upgrade, but I can try
> it as a last resort.
>
> As I said, this is a very critical issue, so I can provide any additional
> debug information upon request. We are ready to go as far as paying somebody
> a reasonable amount of money for tracking down and resolving the issue.
>
> Regards,
> --
> Maksym Sobolyev
> Sippy Software, Inc.
> Internet Telephony (VoIP) Experts
> T/F: +1-646-651-1110
> Web: http://www.sippysoft.com
> MSN: sales@sippysoft.com
> Skype: SippySoft
>
>
> [ssp-root@ds-467 /usr/src]$ netstat -m
> 17061/417669/434730 mbufs in use (current/cache/total)
> 10420/291980/302400/302400 mbuf clusters in use (current/cache/total/max)
> 10420/0 mbuf+clusters out of packet secondary zone in use (current/cache)
> 19/1262/1281/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/25600 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/12800 16k jumbo clusters in use (current/cache/total/max)
> 25181K/693425K/718606K bytes allocated to network (current/cache/total)
> 1246681/129567494/67681640 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> [FEW MINUTES LATER]
>
> [ssp-root@ds-467 /usr/src]$ netstat -m
> 10001/84574/94575 mbufs in use (current/cache/total)
> 6899/6931/13830/302400 mbuf clusters in use (current/cache/total/max)
> 6899/6267 mbuf+clusters out of packet secondary zone in use (current/cache)
> 2/1151/1153/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/25600 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/12800 16k jumbo clusters in use (current/cache/total/max)
> 16306K/39609K/55915K bytes allocated to network (current/cache/total)
> 1246681/129567494/67681640 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
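
The "bytes allocated to network" figures in the quoted netstat -m output can be cross-checked against the object counts. What follows is a minimal Python sketch, not part of the original thread. It assumes 256-byte mbufs, 2048-byte clusters and 4096-byte page-size jumbo clusters (the usual MSIZE, MCLBYTES and page size on FreeBSD amd64; the thread does not state them). The helper names parse_counters, estimate_kbytes and cluster_fill are hypothetical.

```python
import re

# Assumed object sizes in bytes (the usual MSIZE / MCLBYTES / page size on
# FreeBSD amd64); the thread itself does not state them.
MBUF_SIZE = 256
CLUSTER_SIZE = 2048
JUMBO_4K_SIZE = 4096

# Counter lines copied from the first snapshot in the quoted message.
SAMPLE = """\
17061/417669/434730 mbufs in use (current/cache/total)
10420/291980/302400/302400 mbuf clusters in use (current/cache/total/max)
19/1262/1281/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
25181K/693425K/718606K bytes allocated to network (current/cache/total)
"""

# One regex per zone, paired with the assumed per-object size.
PATTERNS = {
    "mbufs": (r"(\d+)/(\d+)/(\d+) mbufs in use", MBUF_SIZE),
    "clusters": (r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", CLUSTER_SIZE),
    "jumbo4k": (
        r"(\d+)/(\d+)/(\d+)/(\d+) 4k \(page size\) jumbo clusters in use",
        JUMBO_4K_SIZE,
    ),
}


def parse_counters(text):
    """Return {zone: (current, cache, total[, max])} from netstat -m text."""
    counters = {}
    for name, (pattern, _size) in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counters[name] = tuple(int(g) for g in match.groups())
    return counters


def estimate_kbytes(counters, column):
    """Estimate kilobytes for a column: 0 = current, 1 = cache, 2 = total."""
    total = sum(counters[name][column] * PATTERNS[name][1] for name in counters)
    return total // 1024


def cluster_fill(counters):
    """Fraction of the nmbclusters limit that is currently allocated."""
    _current, _cache, total, limit = counters["clusters"]
    return total / limit


if __name__ == "__main__":
    c = parse_counters(SAMPLE)
    for label, col in (("current", 0), ("cache", 1), ("total", 2)):
        print(label, estimate_kbytes(c, col), "K")
    print("cluster zone fill:", cluster_fill(c))
```

Under these size assumptions the estimates reproduce the 25181K/693425K/718606K line of the first snapshot. Of the cached total, 291980 clusters x 2K = 583960K sits in the cluster cache, so the cluster zone is at its 302400 limit while only 10420 clusters are actually in use.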