From owner-freebsd-stable@FreeBSD.ORG Tue Feb 16 00:26:15 2010
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E3B00106566B;
	Tue, 16 Feb 2010 00:26:15 +0000 (UTC) (envelope-from bright@elvis.mu.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.freebsd.org (Postfix) with ESMTP id D46F88FC13;
	Tue, 16 Feb 2010 00:26:15 +0000 (UTC)
Received: by elvis.mu.org (Postfix, from userid 1192)
	id D94A91A3D7C; Mon, 15 Feb 2010 16:08:50 -0800 (PST)
Date: Mon, 15 Feb 2010 16:08:50 -0800
From: Alfred Perlstein
To: Maxim Sobolev
Message-ID: <20100216000850.GC96165@elvis.mu.org>
References: <4B793D1D.1000108@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4B793D1D.1000108@FreeBSD.org>
User-Agent: Mutt/1.4.2.3i
Cc: FreeBSD Hackers
Subject: Re: Sudden mbuf demand increase and shortage under the load
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 16 Feb 2010 00:26:16 -0000

* Maxim Sobolev [100215 04:49] wrote:
> Hi,
>
> Our company has a FreeBSD-based product that consists of numerous
> interconnected processes and does some high-PPS UDP processing
> (30-50K PPS is not uncommon). We are seeing strange periodic
> failures under load on several such systems, which usually manifest
> as IPC (even over Unix domain sockets) suddenly either breaking down
> or pausing and recovering only some time later (like 5-10 minutes).
> The only sign of failure I have managed to find is an increase in
> "requests for mbufs denied" in netstat -m, with the total number of
> mbuf clusters (nmbclusters) rising up to the limit.
Hey Maxim,

Can you run a process to dump sysctl -a every second or so, and mark the
time when you do it? Other monitoring would probably be helpful as well
(netstat -m) in a timed log format. vmstat -i? (interrupt storm?)
Perhaps ps output (showing interrupt threads, etc.) would be good to
know; perhaps some ithreads went off into the weeds... Any console
messages of note?

A few people have suggested that there may be too many packets on the
outgoing interface. I think there should be a limit on the number of
packets queued for output, and probably counters showing how many were
dropped due to overflow of the outgoing queue. You should be able to
check those counters to see what is going on. If the driver is broken
and never drops outgoing packets when the card's queue is full, then
those counters will be 0.

I hope this helps.

-Alfred
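[A minimal sketch of the timed-log idea above, not from the original
thread: snapshot the suggested commands once per second with a timestamp
so the output can later be correlated with the failure window. The log
path, iteration count, and exact command set are my assumptions; the
`command -v` guards just skip tools that are absent on a given system.
On FreeBSD, `netstat -i -d` adds a Drop column, which is one place to
look for the outgoing-queue overflow counters mentioned above.]

```shell
#!/bin/sh
# Hypothetical monitoring sketch: append timestamped snapshots of the
# commands suggested in the mail to a single log file.
LOG=${1:-/var/tmp/mbuf-monitor.log}   # log path is an assumption
ITERATIONS=${2:-3}                    # small default so the sketch terminates

i=0
while [ "$i" -lt "$ITERATIONS" ]; do
    {
        echo "==== $(date '+%Y-%m-%d %H:%M:%S') ===="
        # Each tool may be absent (or lack these flags) off FreeBSD;
        # skip or silence it quietly in that case.
        command -v netstat >/dev/null && netstat -m 2>/dev/null
        command -v netstat >/dev/null && netstat -i -d 2>/dev/null
        command -v vmstat  >/dev/null && vmstat -i 2>/dev/null
        command -v sysctl  >/dev/null && sysctl -a 2>/dev/null
        # Process listing; on FreeBSD, ps can also show kernel/ithread state.
        ps aux 2>/dev/null | head -20
    } >> "$LOG"
    i=$((i + 1))
    sleep 1
done
echo "wrote $ITERATIONS snapshots to $LOG"
```

Running it in the background during the high-PPS load, then diffing the
snapshots around the time "requests for mbufs denied" starts climbing,
should show whether the mbuf exhaustion coincides with an interrupt
storm or an output-queue backlog.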