From: Miroslav Lachman <000.fbsd@quip.cz>
Date: Tue, 16 Feb 2010 12:10:36 +0100 (CET)
Message-ID: <4B7A7D2C.9040200@quip.cz>
In-Reply-To: <4B7A38F5.3090404@FreeBSD.org>
To: Maxim Sobolev
Cc: Alfred Perlstein, freebsd-net@FreeBSD.org, Sergey Babkin, "David G. Lawrence", Jack Vogel
Subject: Re: Sudden mbuf demand increase and shortage under the load
List-Id: Networking and TCP/IP with FreeBSD

Maxim Sobolev wrote:
> Sergey Babkin wrote:
>> Maxim Sobolev wrote:
>>> Hi,
>>>
>>> Our company has a FreeBSD-based product that consists of numerous
>>> interconnected processes and does some high-PPS UDP processing
>>> (30-50K PPS is not uncommon). We are seeing strange periodic
>>> failures under load on several such systems, which usually show up
>>> as IPC (even over Unix domain sockets) suddenly either breaking
>>> down or pausing and recovering only some time later (5-10 minutes).
>>> The only sign of failure I managed to find was an increase in
>>> "requests for mbufs denied" in netstat -m, with the total number of
>>> mbuf clusters (nmbclusters) rising up to the limit.
>>
>> As a simple idea: UDP is not flow-controlled, so potentially nothing
>> stops an application from sending packets as fast as it can. If that
>> is faster than the network card can process, they would start
>> accumulating. So this might be worth a try as a way to reproduce the
>> problem and see whether the system has a safeguard against it.
>>
>> Another possibility: what happens if a process is bound to a UDP
>> socket but doesn't actually read the data from it? FreeBSD used to
>> be pretty good at this, just throwing away data beyond a certain
>> limit, while SVR4 would run out of network memory. But that might
>> have changed, so it might be worth a look too.
>
> Thanks. Yes, the latter could actually be the case. The former is
> less likely, since the system doesn't generate much traffic by
> itself, but rather relays what it receives from the network pretty
> much in 1:1
> ratio. It could happen, though, if somehow the output path has
> stalled. However, netstat -I igb0 shows zero Oerrs, which I guess
> means we can rule that out too, unless there is some bug in the
> driver.
>
> So we are looking for potential issues that could cause the UDP
> forwarding application to stall and not dequeue packets on time. So
> far we have identified some culprits in the application logic that
> can cause such stalls in the unlikely event of gettimeofday() going
> backwards. I've seen some messages from ntpd around the time of the
> problem, although it's unclear whether those are a result of the mbuf
> shortage or an indication of the root issue. We've also added some
> debug output to catch any abnormalities in the processing times.
>
> In any case, I am a little surprised at how easily FreeBSD can let
> the mbuf storage overflow. I'd expect it to be more aggressive in
> dropping things received from the network once one application
> stalls. Combined with the fact that we apparently use shared storage
> for different kinds of network activity, and perhaps IPC too, this
> gives an easy opportunity for DoS attacks. To me, separate limits for
> separate protocols, or even classes of traffic (i.e. local/remote),
> would make much sense.

Can it be related to this issue somehow?

http://lists.freebsd.org/pipermail/freebsd-current/2009-August/011013.html
http://lists.freebsd.org/pipermail/freebsd-current/2009-August/010740.html

It was tested on FreeBSD 8: high UDP traffic on igb interfaces emits the
message "GET BUF: dmamap load failure - 12" and later results in a kernel
panic. We have not received any response to this report.

Miroslav Lachman
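Sergey's point that UDP has no flow control (a stalled reader simply causes the kernel to drop datagrams once the socket receive buffer fills, rather than pushing back on the sender) can be illustrated with a small self-contained sketch. Python is used here only for brevity; the buffer size, packet count, and function name are illustrative assumptions, not anything from this thread:

```python
import socket

# Sketch: send many UDP datagrams to a receiver that is not draining its
# socket. The sender never blocks (UDP has no flow control); the kernel
# queues datagrams up to the receive buffer limit and silently drops the
# rest, which is the "throw away data beyond a certain limit" behavior
# discussed above.

def count_surviving_datagrams(n_sent=1000, payload=b"x" * 1024):
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a small receive buffer so the overflow is easy to trigger.
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 16 * 1024)
    rx.bind(("127.0.0.1", 0))
    rx.settimeout(0.2)
    addr = rx.getsockname()

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for _ in range(n_sent):
        # sendto() succeeds even though nobody is reading on the other end.
        tx.sendto(payload, addr)
    tx.close()

    # Now drain the receive buffer; only the datagrams that fit survived.
    received = 0
    try:
        while True:
            rx.recv(2048)
            received += 1
    except socket.timeout:
        pass
    rx.close()
    return received

if __name__ == "__main__":
    got = count_surviving_datagrams()
    print("received %d of 1000 datagrams; the rest were dropped" % got)
```

On a FreeBSD box the per-socket drops show up in netstat -s under the UDP "dropped due to full socket buffers" counter, while the system-wide mbuf pressure described in the thread is visible in netstat -m.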