From: Miroslav Lachman <000.fbsd@quip.cz>
Date: Tue, 16 Feb 2010 12:10:36 +0100 (CET)
Message-ID: <4B7A7D2C.9040200@quip.cz>
In-Reply-To: <4B7A38F5.3090404@FreeBSD.org>
To: Maxim Sobolev
Cc: Alfred Perlstein, freebsd-net@FreeBSD.org, Sergey Babkin, "David G. Lawrence", Jack Vogel
Subject: Re: Sudden mbuf demand increase and shortage under the load
List-Id: Networking and TCP/IP with FreeBSD

Maxim Sobolev wrote:
> Sergey Babkin wrote:
>> Maxim Sobolev wrote:
>>> Hi,
>>>
>>> Our company has a FreeBSD-based product that consists of numerous
>>> interconnected processes and does some high-PPS UDP processing
>>> (30-50K PPS is not uncommon). We are seeing strange periodic
>>> failures under load on several such systems, which usually show up
>>> as IPC (even over Unix domain sockets) suddenly either breaking
>>> down or pausing and recovering only some time later (5-10 minutes).
>>> The only sign of failure I managed to find was an increase in
>>> "requests for mbufs denied" in netstat -m, with the total number of
>>> mbuf clusters (nmbclusters) rising up to the limit.
>>
>> As a simple idea: UDP is not flow-controlled, so potentially nothing
>> stops an application from sending packets as fast as it can. If that
>> is faster than the network card can process, they would start
>> accumulating. So this might be worth a try as a way to reproduce the
>> problem and see whether the system has a safeguard against it.
>>
>> Another possibility: what happens if a process is bound to a UDP
>> socket but doesn't actually read the data from it? FreeBSD used to
>> be pretty good at this, just throwing away data beyond a certain
>> limit, while SVR4 would run out of network memory. But that might
>> have changed, so it might be worth a look too.
>
> Thanks. Yes, the latter could actually be the case. The former is
> less likely, since the system doesn't generate much traffic by
> itself, but rather relays what it receives from the network pretty
> much in 1:1
> ratio. It could happen, though, if somehow the output path has
> stalled. However, netstat -I igb0 shows zero Oerrs, which I guess
> means we can rule that out too, unless there is some bug in the
> driver.
>
> So we are looking for potential issues that could cause the UDP
> forwarding application to stall and not dequeue packets on time. So
> far we have identified some culprits in the application logic that
> can cause such stalls in the unlikely event of gettimeofday() going
> backwards. I've seen some messages from ntpd around the time of the
> problem, although it's unclear whether those are a result of the mbuf
> shortage or an indication of the root issue. We've also added some
> debug output to catch any abnormalities in the processing times.
>
> In any case, I am a little surprised at how easily FreeBSD can let
> the mbuf storage overflow. I'd expect it to be more aggressive in
> dropping things received from the network once one application
> stalls. Combined with the fact that we apparently use shared storage
> for different kinds of network activity, and perhaps IPC too, this
> gives an easy opportunity for DoS attacks. To me, separate limits for
> separate protocols, or even classes of traffic (i.e. local/remote),
> would make much sense.

Can it be related to this issue somehow?

http://lists.freebsd.org/pipermail/freebsd-current/2009-August/011013.html
http://lists.freebsd.org/pipermail/freebsd-current/2009-August/010740.html

It was tested on FreeBSD 8: high UDP traffic on igb interfaces emits the
message "GET BUF: dmamap load failure - 12" and later results in a kernel
panic. We have not received any response to this report.

Miroslav Lachman
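Sergey's point that UDP has no flow control (a stalled reader simply causes the kernel to drop datagrams once the socket receive buffer fills, rather than pushing back on the sender) can be illustrated with a small self-contained sketch. Python is used here only for brevity; the buffer size, packet count, and function name are illustrative assumptions, not anything from this thread:

```python
import socket

# Sketch: send many UDP datagrams to a receiver that is not draining its
# socket. The sender never blocks (UDP has no flow control); the kernel
# queues datagrams up to the receive buffer limit and silently drops the
# rest, which is the "throw away data beyond a certain limit" behavior
# discussed above.

def count_surviving_datagrams(n_sent=1000, payload=b"x" * 1024):
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a small receive buffer so the overflow is easy to trigger.
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 16 * 1024)
    rx.bind(("127.0.0.1", 0))
    rx.settimeout(0.2)
    addr = rx.getsockname()

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for _ in range(n_sent):
        # sendto() succeeds even though nobody is reading on the other end.
        tx.sendto(payload, addr)
    tx.close()

    # Now drain the receive buffer; only the datagrams that fit survived.
    received = 0
    try:
        while True:
            rx.recv(2048)
            received += 1
    except socket.timeout:
        pass
    rx.close()
    return received

if __name__ == "__main__":
    got = count_surviving_datagrams()
    print("received %d of 1000 datagrams; the rest were dropped" % got)
```

On a FreeBSD box the per-socket drops show up in netstat -s under the UDP "dropped due to full socket buffers" counter, while the system-wide mbuf pressure described in the thread is visible in netstat -m.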