From: Kostik Belousov
To: Maxim Sobolev
Date: Tue, 16 Feb 2010 21:58:55 +0200
Cc: freebsd-net@freebsd.org, FreeBSD Hackers
Subject: Re: Sudden mbuf demand increase and shortage under the load (igb issue?)

[Trimmed Cc: list]

On Tue, Feb 16, 2010 at 10:11:18AM -0800, Maxim Sobolev wrote:
> OK, here is some new data that I think rules out any issues with the
> applications. Following Alfred's suggestion I have made a script to run
> every second and output some system statistics:
>
> date
> netstat -m
> vmstat -i
> ps -axl
> pstat -T
> vmstat -z
> sysctl -a
>
> The problem hit us again today several times, and upon investigating
> the log I found that the increase in mbuf usage happened in one step,
> going from the normal 10% to 100% between two script runs. What is more
> interesting is that two such subsequent runs were about 2 minutes
> apart (instead of 1 second as it should be), and when inspecting the
> cron logs I noticed the same time gap there. I ruled out VM
> starvation as a cause of the delay because the system has plenty of free
> memory. The incoming network traffic was not sufficient to starve VM so
> quickly either: it was about 7MB/sec at that time, so even if all
> receivers stopped draining their buffers it should have taken at least
> 1-2 seconds to fill up the mbuf cache and create demand for additional
> kernel memory. The failure would likely be more gradual and I should
> have seen how it builds up in the debug log.
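For reference, the once-per-second sampling described above could be sketched roughly as follows. This is an illustration, not the script actually used: the function names (snapshot, sample, find_gaps), the "===" marker line and the log path are all hypothetical. Each snapshot is prefixed with an epoch timestamp, so a stall like the 2-minute gap shows up as a jump between consecutive markers.

```shell
#!/bin/sh
# Sketch only: function names and the "===" marker format are hypothetical.

# Print one snapshot, led by a marker line carrying the epoch time.
snapshot() {
    echo "=== $(date +%s)"
    date
    netstat -m
    vmstat -i
    ps -axl
    pstat -T
    vmstat -z
    sysctl -a
}

# Take COUNT snapshots, sleeping INTERVAL seconds between them,
# and append everything (including errors) to LOG.
sample() {
    count=$1 interval=$2 log=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        snapshot >> "$log" 2>&1
        sleep "$interval"
        i=$((i + 1))
    done
}

# Read a log on stdin and report consecutive markers that are
# more than MAX seconds apart.
find_gaps() {
    awk -v max="$1" '
        $1 == "===" {
            if (seen && $2 - prev > max)
                printf "gap %d s: %d -> %d\n", $2 - prev, prev, $2
            prev = $2; seen = 1
        }'
}
```

For example, `sample 3600 1 /var/tmp/sysstat.log` followed by `find_gaps 5 < /var/tmp/sysstat.log` would list every pair of samples taken more than 5 seconds apart.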
>
> So it looks like a kernel issue of some sort, which causes all userland
> activity to cease for 2 minutes when the system reaches a certain load.
> Mbuf build-up is only a by-product of this, not really a cause. igb(4)
> is the primary suspect now, since we have other machines with more
> load not having this problem and we don't have anybody else using this
> driver. The chip is the following:
>
> igb0@pci0:5:0:0: class=0x020000 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     class = network
>     subclass = ethernet
> igb1@pci0:5:0:1: class=0x020000 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     class = network
>     subclass = ethernet
>
> The hardware in question is a new HP DL160G6. I have also checked the IPMI
> logs and sensors and have not found any issue there either. No sensors
> reported off-range values and chassis temperature is within normal limits.
>
> I am not sure how to debug this problem further. We are now
> investigating the possibility of installing an external non-igb card in
> the server to see if that solves the issue.
>
> I have the whole log if anyone wants to take a closer peek.

How much physical memory do you have installed in the machine?
If it is more than 16GB, try to remove some.
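
The memory question can be checked from the shell. A minimal sketch, assuming FreeBSD's hw.physmem sysctl (physical memory in bytes as seen by the kernel); the helper names bytes_to_gib and over_16g are made up for illustration:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Convert a byte count to whole GiB (rounded down).
bytes_to_gib() {
    echo $(( $1 / 1073741824 ))
}

# Succeed if the given byte count is above 16 GiB.
over_16g() {
    [ "$1" -gt 17179869184 ]
}

# On the affected FreeBSD host one might then run:
#   mem=$(sysctl -n hw.physmem)
#   echo "$(bytes_to_gib "$mem") GiB visible to the kernel"
#   over_16g "$mem" && echo "more than 16 GB: try removing some"
```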