From: Barney Cordoba
Date: Sat, 2 Mar 2013 06:24:54 -0800 (PST)
Subject: Re: igb network lockups
To: "Christopher D. Harrison"
Cc: freebsd-net@freebsd.org

--- On Mon, 2/25/13, Christopher D. Harrison wrote:

> From: Christopher D. Harrison
> Subject: Re: igb network lockups
> To: "Jack Vogel"
> Cc: freebsd-net@freebsd.org
> Date: Monday, February 25, 2013, 1:38 PM
>
> Sure,
> The problem appears on both systems, running with ALTQ and
> vanilla.
>     -C
>
> On 02/25/13 12:29, Jack Vogel wrote:
> > I've not heard of this problem, but I think most users do not use
> > ALTQ, and we (Intel) do not test using it. Can it be eliminated from
> > the equation?
> >
> > Jack
> >
> >
> > On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison
> > wrote:
> >
> >     I recently have been experiencing network "freezes" and network
> >     "lockups" on our FreeBSD 9.1 systems, which are running ZFS and
> >     NFS file servers.
> >     I upgraded from 9.0 to 9.1 about 2 months ago and we have been
> >     having issues almost bi-monthly. The issue manifests as the
> >     system becoming unresponsive to any/all NFS clients. The system
> >     is not resource bound, as our I/O to disk is low and our network
> >     is usually in the 20mbit/40mbit range. We do notice a correlation
> >     between temporary I/O spikes and network freezes, but not enough
> >     to send our system into "lockup" mode for the next 5 min.
> >     Currently we have 4 igb NICs in 2 aggrs with 8 queues per NIC,
> >     and our dev.igb reports:
> >
> >     dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4
> >
> >     I am almost certain the problem is with the igb driver, as a
> >     friend is also experiencing the same problem with the same Intel
> >     igb NIC.
> >     He has addressed the issue by restarting the network using netif
> >     on his systems. According to my friend, once the network
> >     interfaces get cleared, everything comes back and starts working
> >     as expected.
> >
> >     I have noticed an issue with the igb driver and I was looking for
> >     thoughts on how to help address this problem.
> >     http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html
> >
> >     Thoughts/Ideas are greatly appreciated!!!
> >
> >         -C

Do you have 32 CPUs in the system? Four NICs with 8 queues each is 32
queues. You've created a lock contention nightmare; frankly I'm surprised
that the system runs at all. Try running with 1 queue per NIC. The point
of using queues is to spread the load; the fact that you're even using
queues with such a minuscule load is a commentary on the blind use of
"features" without any explanation or understanding of what they do.

Does igb still bind to CPUs without regard to whether it's a real CPU or
a hyperthread? This needs to be removed. I wish that someone who
understood this stuff would have a beer with Jack and explain to him why
this design is defective. The "default" for this driver is almost always
the wrong configuration.

You don't need to spread the load with 40Mb/s throughput, and using
multiple queues will use a lot more CPU than using just 1. Do you really
want 4 CPUs using 10% each (40% of a CPU in total) instead of 1 using 14%?

You also should consider increasing your TX buffers; a property of
applications like ALTQ is that they tend to send out big bursts of
packets, and those bursts can overflow the rings. I'm not specifically
familiar with ALTQ, so I'm not sure how it handles such things; nor am I
sure how it handles multiple TX queues, if at all.

BC
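The point about bursts overflowing the TX rings can be illustrated with a toy model. This is a hypothetical sketch, not igb or ALTQ code: `TxRing` and `simulate` are invented names, packets arrive once per tick, the NIC frees a fixed number of descriptors per tick, and any packet that finds the ring full is dropped. All ring sizes, burst sizes, and drain rates are made-up illustrative numbers. On a real system the transmit ring size is set by the driver's descriptor tunable (igb(4) documents hw.igb.txd); check the man page for your release's limits.

```python
from dataclasses import dataclass


@dataclass
class TxRing:
    """Toy fixed-size transmit descriptor ring (hypothetical model, not driver code)."""
    size: int         # total descriptors in the ring
    used: int = 0     # descriptors holding packets not yet transmitted
    dropped: int = 0  # packets refused because no descriptor was free

    def enqueue(self, n: int) -> None:
        """Place up to n packets on the ring; whatever exceeds free space is dropped."""
        free = self.size - self.used
        accepted = min(n, free)
        self.used += accepted
        self.dropped += n - accepted

    def drain(self, n: int) -> None:
        """The NIC transmits up to n queued packets and frees their descriptors."""
        self.used -= min(n, self.used)


def simulate(ring_size: int, arrivals: list[int], drain_per_tick: int) -> int:
    """Enqueue one arrival batch per tick, drain after each, return total drops."""
    ring = TxRing(ring_size)
    for batch in arrivals:
        ring.enqueue(batch)
        ring.drain(drain_per_tick)
    return ring.dropped


if __name__ == "__main__":
    smooth = [375] * 16              # steady stream: 375 packets every tick
    bursty = [0, 0, 0, 1500] * 4     # same average (375/tick), sent in bursts
    print(simulate(1024, smooth, 400))  # 0
    print(simulate(1024, bursty, 400))  # 1904
    print(simulate(4096, bursty, 400))  # 0
```

Both traffic patterns carry the same average load, which is below the drain rate, so average throughput alone says nothing about drops. The smooth stream never fills the ring, while each 1500-packet burst loses 476 packets on a 1024-descriptor ring and none on a 4096-descriptor ring. That is the sense in which a larger ring can absorb bursty senders even at low average throughput.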