From owner-freebsd-stable@FreeBSD.ORG Tue Feb 16 00:26:15 2010
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E3B00106566B;
	Tue, 16 Feb 2010 00:26:15 +0000 (UTC) (envelope-from bright@elvis.mu.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.freebsd.org (Postfix) with ESMTP id D46F88FC13;
	Tue, 16 Feb 2010 00:26:15 +0000 (UTC)
Received: by elvis.mu.org (Postfix, from userid 1192)
	id D94A91A3D7C; Mon, 15 Feb 2010 16:08:50 -0800 (PST)
Date: Mon, 15 Feb 2010 16:08:50 -0800
From: Alfred Perlstein
To: Maxim Sobolev
Message-ID: <20100216000850.GC96165@elvis.mu.org>
References: <4B793D1D.1000108@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4B793D1D.1000108@FreeBSD.org>
User-Agent: Mutt/1.4.2.3i
Cc: FreeBSD Hackers
Subject: Re: Sudden mbuf demand increase and shortage under the load
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 16 Feb 2010 00:26:16 -0000

* Maxim Sobolev [100215 04:49] wrote:
> Hi,
>
> Our company has a FreeBSD-based product that consists of numerous
> interconnected processes and does some high-PPS UDP processing
> (30-50K PPS is not uncommon). We are seeing strange periodic
> failures under load on several such systems, which usually manifest
> as IPC (even over Unix domain sockets) suddenly either breaking down
> or pausing and recovering only some time later (like 5-10 minutes).
> The only sign of failure I have managed to find is an increase in
> "requests for mbufs denied" in netstat -m, with the total number of
> mbuf clusters (nmbclusters) rising up to the limit.
Hey Maxim,

Can you run a process to dump sysctl -a every second or so, and mark the
time when you do it? Other monitoring would probably be helpful as well
(netstat -m) in a timed log format. vmstat -i? (interrupt storm?)
Perhaps ps output (showing interrupt threads, etc.) would be good to
know; perhaps some ithreads went off into the weeds... Any console
messages of note?

A few people have suggested that there may be too many packets on the
outgoing interface. I think there should be a limit on the number of
packets queued for output, and probably counters showing how many were
dropped due to overflow of the outgoing queue. You should be able to
check those counters to see what is going on. If the driver is broken
and never drops outgoing packets when the card's queue is full, then
those counters will be 0.

I hope this helps.

-Alfred
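[A minimal sketch of the timed-log idea above, not from the original
thread: snapshot the suggested commands once per second with a timestamp
so the output can later be correlated with the failure window. The log
path, iteration count, and exact command set are my assumptions; the
`command -v` guards just skip tools that are absent on a given system.
On FreeBSD, `netstat -i -d` adds a Drop column, which is one place to
look for the outgoing-queue overflow counters mentioned above.]

```shell
#!/bin/sh
# Hypothetical monitoring sketch: append timestamped snapshots of the
# commands suggested in the mail to a single log file.
LOG=${1:-/var/tmp/mbuf-monitor.log}   # log path is an assumption
ITERATIONS=${2:-3}                    # small default so the sketch terminates

i=0
while [ "$i" -lt "$ITERATIONS" ]; do
    {
        echo "==== $(date '+%Y-%m-%d %H:%M:%S') ===="
        # Each tool may be absent (or lack these flags) off FreeBSD;
        # skip or silence it quietly in that case.
        command -v netstat >/dev/null && netstat -m 2>/dev/null
        command -v netstat >/dev/null && netstat -i -d 2>/dev/null
        command -v vmstat  >/dev/null && vmstat -i 2>/dev/null
        command -v sysctl  >/dev/null && sysctl -a 2>/dev/null
        # Process listing; on FreeBSD, ps can also show kernel/ithread state.
        ps aux 2>/dev/null | head -20
    } >> "$LOG"
    i=$((i + 1))
    sleep 1
done
echo "wrote $ITERATIONS snapshots to $LOG"
```

Running it in the background during the high-PPS load, then diffing the
snapshots around the time "requests for mbufs denied" starts climbing,
should show whether the mbuf exhaustion coincides with an interrupt
storm or an output-queue backlog.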