FreeBSD Mail Archives

Date:      Tue, 16 Feb 2010 10:53:41 -0800
From:      Pyun YongHyeon <pyunyh@gmail.com>
To:        Maxim Sobolev <sobomax@freebsd.org>
Cc:        Sergey Babkin <babkin@verizon.net>, freebsd-net@freebsd.org, Alfred Perlstein <alfred@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, "David G. Lawrence" <dg@dglawrence.com>
Subject:   Re: Sudden mbuf demand increase and shortage under the load (igb issue?)
Message-ID:  <20100216185341.GE1394@michelle.cdnetworks.com>
In-Reply-To: <4B7ADFC6.7020202@FreeBSD.org>
References:  <4B79297D.9080403@FreeBSD.org> <4B79205B.619A0A1A@verizon.net> <4B7ADFC6.7020202@FreeBSD.org>

On Tue, Feb 16, 2010 at 10:11:18AM -0800, Maxim Sobolev wrote:
> OK, here is some new data that I think rules out any issues with the 
> applications. Following Alfred's suggestion I have made a script to run 
> every second and output some system statistics:
> 
> date
> netstat -m
> vmstat -i
> ps -axl
> pstat -T
> vmstat -z
> sysctl -a
> 
> The problem hit us again several times today, and upon investigating 
> the log I found that the increase in mbuf usage happened in one step, 
> going from the normal 10% to 100% between two script runs. What is more 
> interesting is that those two subsequent runs were about 2 minutes 
> apart (instead of 1 second, as they should be), and when inspecting the 
> cron logs I noticed the same time gap there. I ruled out VM 
> starvation as a cause of the delay because the system has plenty of free 
> memory. The incoming network traffic was not sufficient to starve VM so 
> quickly either: it was about 7MB/sec at that time, so even if all 
> receivers stopped draining their buffers it should have taken at least 
> 1-2 seconds to fill up the mbuf cache and create demand for additional 
> kernel memory. The failure would likely have been more gradual, and I 
> should have seen it build up in the debug log.
> 
> So it looks like a kernel issue of some sort, which causes all userland 
> activity to cease for 2 minutes when the system reaches a certain load. 
> The mbuf build-up is only a by-product of this, not really the cause. 
> igb(4) is the primary suspect now, since we have other machines under 
> more load that do not have this problem, and none of them uses this 
> driver.  The chip is the following:
> 
> igb0@pci0:5:0:0:        class=0x020000 card=0x323f103c chip=0x10c98086 
> rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     class      = network
>     subclass   = ethernet
> igb1@pci0:5:0:1:        class=0x020000 card=0x323f103c chip=0x10c98086 
> rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     class      = network
>     subclass   = ethernet
> 
> The hardware in question is a new HP DL160G6. I have also checked the 
> IPMI logs and sensors and have not found any issue there either. No 
> sensor reported out-of-range values, and the chassis temperature is 
> within normal limits.
> 
> I am not sure how to debug this problem further. We are now looking 
> into installing an external non-igb card in the server to see if that 
> solves the issue.
> 

I know there were a couple of igb(4) issues in 7.x, but it seems the
recent fixes didn't make it into stable/7 and 7.3-RELEASE. However,
the issues I know of do not explain your symptom. One issue that
could be related is that igb(4) consumed a lot of CPU cycles because
it incorrectly rescheduled itself to fetch more frames instead of
dropping some frames when heavy UDP traffic hit the controller
(e.g. a 64-byte UDP torture test).

> I have the whole log if anyone wants to take a closer peek.
>
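
For anyone collecting a similar log, here is a minimal sketch of a
once-per-second sampler that also records when the gap between two
runs is larger than expected, which is the symptom described above.
It uses the same commands as the quoted script. The names MAX_GAP,
LOG, check_gap and sample, and the log path, are assumptions made
for illustration; they are not taken from the original script.

```shell
#!/bin/sh
# Sketch of a once-per-second sampler that flags stalls between runs.
# MAX_GAP and LOG are illustrative names, not from the original script.

MAX_GAP=${MAX_GAP:-5}                 # seconds; a larger gap is flagged
LOG=${LOG:-/var/log/mbuf-sample.log}  # hypothetical log location

# Print a STALL line when the time between two samples exceeds the limit.
# Arguments: previous epoch seconds, current epoch seconds, maximum gap.
check_gap() {
    gap=$(( $2 - $1 ))
    if [ "$gap" -gt "$3" ]; then
        echo "STALL gap=${gap}s"
    fi
}

# Collect one sample using the same commands as the quoted script.
sample() {
    date
    netstat -m
    vmstat -i
    ps -axl
    pstat -T
    vmstat -z
    sysctl -a
}

# Run the loop only when invoked as: sh sampler.sh run
if [ "${1:-}" = "run" ]; then
    prev=$(date +%s)
    while :; do
        now=$(date +%s)
        check_gap "$prev" "$now" "$MAX_GAP" >> "$LOG"
        sample >> "$LOG" 2>&1
        prev=$now
        sleep 1
    done
fi
```

The measured gap includes the time the sample itself takes, so MAX_GAP
should stay well above one second. Searching the log for STALL then
shows when userland stopped being scheduled, independent of cron.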

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100216185341.GE1394>
