From: Pyun YongHyeon <pyunyh@gmail.com>
Date: Tue, 16 Feb 2010 10:53:41 -0800
To: Maxim Sobolev
Cc: Sergey Babkin, freebsd-net@freebsd.org, Alfred Perlstein, Jack Vogel,
    FreeBSD Hackers, "David G. Lawrence"
Subject: Re: Sudden mbuf demand increase and shortage under the load (igb issue?)

On Tue, Feb 16, 2010 at 10:11:18AM -0800, Maxim Sobolev wrote:
> OK, here is some new data that I think rules out any issues with the
> applications. Following Alfred's suggestion, I have made a script that
> runs every second and outputs some system statistics:
>
> date
> netstat -m
> vmstat -i
> ps -axl
> pstat -T
> vmstat -z
> sysctl -a
>
> The problem hit us again several times today, and upon investigating
> the log I found that the increase in mbuf usage happened in one step,
> going from the normal 10% to 100% between two script runs. What is
> more interesting is that those two subsequent runs were about 2
> minutes apart (instead of 1 second, as they should be), and when
> inspecting the cron logs I noticed the same time gap there. I ruled
> out VM starvation as the cause of the delay because the system has
> plenty of free memory. The incoming network traffic was not sufficient
> to starve the VM that quickly either: it was about 7MB/sec at the
> time, so even if all receivers had stopped draining their buffers, it
> should have taken at least 1-2 seconds to fill up the mbuf cache and
> create demand for additional kernel memory. The failure would likely
> have been more gradual, and I should have seen it build up in the
> debug log.
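The telling symptom here is that two consecutive runs of a once-per-second collector were about 2 minutes apart. A minimal sketch of how such stalls could be flagged automatically, assuming each sample starts with a marker line carrying the epoch time from `date +%s`. The helper names `collect` and `find_gaps`, the `=== ` marker, and the threshold are hypothetical choices, not part of the original script, and `collect` runs only a subset of the commands listed above:

```shell
#!/bin/sh
# Sketch only: collect() and find_gaps() are hypothetical helpers, not part
# of FreeBSD. collect() prefixes every sample with an epoch marker so that
# stalls between runs can be measured afterwards.

# collect LOGFILE: append one sample per iteration until killed.
collect() {
    log=$1
    while :; do
        echo "=== $(date +%s)" >> "$log"
        { netstat -m; vmstat -i; vmstat -z; } >> "$log" 2>&1
        sleep 1
    done
}

# find_gaps MAX_SECONDS: read epoch timestamps (one per line) on stdin and
# report every pair of consecutive samples more than MAX_SECONDS apart.
find_gaps() {
    awk -v max="$1" '
        NR > 1 && ($1 - prev) > max {
            print "gap: " ($1 - prev) "s between " prev " and " $1
        }
        { prev = $1 }
    '
}
```

Afterwards, something like `grep '^=== ' stats.log | cut -d' ' -f2 | find_gaps 5` would list the stalls (the log name is a placeholder). Each iteration takes the commands' run time plus the 1-second sleep, so intervals slightly above 1 second are normal; the threshold leaves room for that while still catching a 2-minute freeze.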
>
> So it looks like a kernel issue of some sort, which causes all
> userland activity to cease for 2 minutes when the system reaches a
> certain load. The mbuf build-up is only a by-product of this, not
> really the cause. igb(4) is the primary suspect now, since we have
> other, more heavily loaded machines that don't have this problem, and
> none of them use this driver. The chip is the following:
>
> igb0@pci0:5:0:0: class=0x020000 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     class      = network
>     subclass   = ethernet
> igb1@pci0:5:0:1: class=0x020000 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     class      = network
>     subclass   = ethernet
>
> The hardware in question is a new HP DL160G6. I have also checked the
> IPMI logs and sensors and found no issues there either. No sensor
> reported out-of-range values, and the chassis temperature is within
> normal limits.
>
> I am not sure how to debug this problem further. We are now looking
> into installing an external non-igb card in the server to see if that
> solves the issue.
>

I know there were a couple of igb(4) issues in 7.x, but it seems the
recent fixes didn't make it into stable/7 or 7.3-RELEASE. However, the
issues I know of do not explain the symptoms you are seeing. One issue
that could be related is that igb(4) consumed a lot of CPU cycles
because it incorrectly rescheduled itself to fetch more frames instead
of dropping some frames when heavy UDP traffic hit the controller
(e.g. a 64-byte UDP torture test).

> I have the whole log if anyone wants to take a closer peek.
>
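Each ID field in the pciconf(8) lines quoted above packs two 16-bit PCI IDs: chip= holds the device ID in the upper half and the vendor ID in the lower half, and card= holds the subsystem device ID and subsystem vendor ID the same way. A small sketch for splitting them; `pci_id_split` is a hypothetical helper, not a FreeBSD tool:

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): split a 32-bit pciconf(8)
# ID field into its high and low 16-bit halves.
#   chip=0xDDDDVVVV -> device ID, vendor ID
#   card=0xSSSSVVVV -> subsystem device ID, subsystem vendor ID
pci_id_split() {
    # POSIX arithmetic expansion accepts hexadecimal constants such as
    # 0x10c98086, so the value can be shifted and masked directly.
    printf '0x%04x 0x%04x\n' $(( $1 >> 16 )) $(( $1 & 0xffff ))
}
```

For example, `pci_id_split 0x10c98086` prints `0x10c9 0x8086` (device 0x10c9 from vendor 0x8086, Intel), and `pci_id_split 0x323f103c` prints `0x323f 0x103c` (subsystem 0x323f from subvendor 0x103c, Hewlett-Packard), which is consistent with the HP DL160G6 mentioned above.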