Date: Tue, 16 Feb 2010 12:10:36 +0100
From: Miroslav Lachman <000.fbsd@quip.cz>
To: Maxim Sobolev <sobomax@FreeBSD.org>
Cc: Alfred Perlstein <alfred@FreeBSD.org>, freebsd-net@FreeBSD.org, Sergey Babkin <babkin@verizon.net>, "David G. Lawrence" <dg@dglawrence.com>, Jack Vogel <jfvogel@gmail.com>
Subject: Re: Sudden mbuf demand increase and shortage under the load
Message-ID: <4B7A7D2C.9040200@quip.cz>
In-Reply-To: <4B7A38F5.3090404@FreeBSD.org>
References: <4B79297D.9080403@FreeBSD.org> <4B79205B.619A0A1A@verizon.net> <4B7A38F5.3090404@FreeBSD.org>
Maxim Sobolev wrote:
> Sergey Babkin wrote:
>> Maxim Sobolev wrote:
>>> Hi,
>>>
>>> Our company has a FreeBSD-based product that consists of numerous
>>> interconnected processes and does some high-PPS UDP processing
>>> (30-50K PPS is not uncommon). We are seeing strange periodic
>>> failures under load on several such systems, which usually show up
>>> as IPC (even over Unix domain sockets) suddenly either breaking
>>> down or pausing, and recovering only some time later (5-10
>>> minutes). The only sign of failure I have managed to find is an
>>> increase in "requests for mbufs denied" in netstat -m, with the
>>> total number of mbuf clusters rising up to the nmbclusters limit.
>>
>> As a simple idea: UDP is not flow-controlled, so potentially
>> nothing stops an application from sending packets as fast as it
>> can. If that is faster than the network card can process, they
>> would start accumulating. This might be worth a try as a way to
>> reproduce the problem and see whether the system has a safeguard
>> against it or not.
>>
>> Another possibility: what happens if a process is bound to a UDP
>> socket but doesn't actually read the data from it? FreeBSD used to
>> be pretty good at this, simply throwing away data beyond a certain
>> limit, whereas SVR4 would run out of network memory. But that
>> might have changed, so it could be worth a look too.
>
> Thanks. Yes, the latter could actually be the case. The former is
> less likely, since the system doesn't generate much traffic by
> itself but rather relays what it receives from the network in
> pretty much a 1:1 ratio. It could happen, though, if the output
> path somehow stalled. However, netstat -I igb0 shows zero Oerrs,
> which I guess means we can rule that out too, unless there is some
> bug in the driver.
>
> So we are looking for potential issues that could cause a UDP
> forwarding application to stall and not dequeue packets on time.
> So far we have identified some culprits in the application logic
> that can cause such stalls in the unlikely event of gettimeofday()
> time going backwards. I have seen some messages from ntpd around
> the time of the problem, although it is unclear whether those are
> a result of that mbuf shortage or point to the root issue. We have
> also added some debug output to catch any abnormalities in the
> processing times.
>
> In any case, I am a little surprised at how easily FreeBSD can let
> mbuf storage overflow. I would expect it to be more aggressive in
> dropping things received from the network once one application
> stalls. Combined with the fact that we apparently use shared
> storage for different kinds of network activity, and perhaps IPC
> too, this gives an easy opportunity for DoS attacks. To me,
> separate limits for separate protocols, or even for classes of
> traffic (i.e. local/remote), would make much more sense.

Can it be related to this issue somehow?

http://lists.freebsd.org/pipermail/freebsd-current/2009-August/011013.html
http://lists.freebsd.org/pipermail/freebsd-current/2009-August/010740.html

It was tested on FreeBSD 8: high UDP traffic on igb interfaces emits
"GET BUF: dmamap load failure - 12" messages and later results in a
kernel panic. We have not received any response to this report.

Miroslav Lachman
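P.S. Sergey's bound-but-unread socket theory is easy to try directly.
Below is a minimal sketch (an illustration only, not anybody's
production code; the port number is an arbitrary choice): bind a UDP
socket, never read from it, flood the port from another host, and
watch netstat -m to see whether the kernel caps the backlog at the
socket buffer limit or lets mbuf clusters pile up.

#include <sys/socket.h>
#include <netinet/in.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
		perror("socket");
		exit(1);
	}
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(9999);	/* arbitrary test port */
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
		perror("bind");
		exit(1);
	}
	/* Deliberately never recvfrom(): let the datagrams queue up. */
	for (;;)
		sleep(3600);
	/* NOTREACHED */
}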
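On the gettimeofday() stalls, one guard is to drive the application's
timing from CLOCK_MONOTONIC, which ntpd cannot step backwards, instead
of the wall clock. Roughly like this (a sketch under that assumption,
not the actual application code):

#include <stdio.h>
#include <time.h>

/*
 * Monotonic timestamp: unlike gettimeofday(), CLOCK_MONOTONIC cannot
 * jump backwards when ntpd steps the wall clock.
 */
static double
mono_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((double)ts.tv_sec + (double)ts.tv_nsec / 1e9);
}

int
main(void)
{
	double t0 = mono_now();

	/* ... the packet-relaying loop would take its deadlines here ... */
	printf("elapsed: %.9f s\n", mono_now() - t0);
	return (0);
}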
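And for keeping an eye on the cluster limit without parsing netstat
output, kern.ipc.nmbclusters can be read directly via sysctl. Again
just a sketch; netstat -m itself gets the per-zone usage counters from
the UMA statistics:

#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdio.h>

int
main(void)
{
	int nmbclusters;
	size_t len = sizeof(nmbclusters);

	/* The global cluster limit that netstat -m reports as the cap. */
	if (sysctlbyname("kern.ipc.nmbclusters", &nmbclusters, &len,
	    NULL, 0) == -1) {
		perror("sysctlbyname");
		return (1);
	}
	printf("nmbclusters limit: %d\n", nmbclusters);
	return (0);
}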