From: Charles Owens
Date: Fri, 14 Jan 2011 16:47:31 -0500
To: Jack Vogel
Cc: freebsd-net
Subject: Re: igb watchdog timeouts
Message-ID: <4D30C473.7060900@greatbaysoftware.com>
References: <20100729215649.GB2615@icir.org> <20110103210209.GA13091@icir.org> <4D2E66C4.5090607@greatbaysoftware.com> <4D2F20BB.5080204@greatbaysoftware.com> <4D2F71BE.2080801@greatbaysoftware.com>
List-Id: Networking and TCP/IP with FreeBSD

Thanks for all the feedback on polling, Jack and others.  Very helpful.

We are working to merge the latest RELENG_8 em/igb driver into our custom
build that's based on RELENG_8_1.  I've been able to create a patch using
the following command:

    cvs di -N -up -jRELENG_8_1 -jRELENG_8 sys/dev/e1000 sys/dev/ixgb sys/dev/ixgbe sys/conf/files > /tmp/e1000.diff

... by hand trimming sys/conf/files down to only the relevant bits.
It compiled and seems to be functioning, but I wouldn't mind a sanity
check on my methodology.  In particular:

  * Some of the patches overlapped with sys/dev/ixgb and sys/dev/ixgbe...
    so I included them.  Should I have?
  * Is there anything else I should have included?

Thanks very much,
Charles


On 1/13/11 4:49 PM, Jack Vogel wrote:
> Polling has seemed to me to be a way around other problems, problems
> that these days no longer exist.  I remember back in the FreeBSD 6 days
> having interrupt problems which of course also led to watchdogs.
> Polling got rid of that.  But now there are dedicated MULTIPLE
> interrupts by using MSIX, so that reason for polling is gone.
>
> Of course there can still be advantages, reducing interrupts and hence
> context switches, which is why the Linux approach does what it does.
>
> I have not spent time with that issue, it's good to know that there
> could be problems lurking with it.  But if you can simply go with MSIX
> I would do that for now.
>
> Jack
>
>
> On Thu, Jan 13, 2011 at 1:42 PM, Charles Owens wrote:
>
>     So we went back to basics (stock 8.1-RELEASE) and found no
>     issue!  We then added in our kernel mods one by one and
>     ultimately discovered that device-polling is the culprit (the
>     kernel config was simply GENERIC + PAE + polling).
>
>     Immediately upon running "ifconfig igb0 polling" the symptoms appear.
>
>     This is very good news overall, in that we can certainly disable
>     polling for igb.  This begs the question, though, as to whether
>     polling is recommended these days at all for em/igb NICs... or
>     even in general.  From other conversations we've seen there seems
>     to be some general debate about this.  In testing we've done in
>     the past (circa 7.0) there certainly seemed to be benefit to using
>     this feature.  What are your thoughts about this?
>
>     For our product releases we'd like to stay with RELENG_8_1.  Would
>     you recommend the driver in 8.2 as being preferable?
>
>     In case it's of interest:
>
>     igb0@pci0:1:0:0: class=0x020000 card=0x34de8086 chip=0x10a78086 rev=0x02 hdr=0x00
>         vendor     = 'Intel Corporation'
>         device     = '82575EB Gigabit Network Connection'
>         class      = network
>         subclass   = ethernet
>
>     Thanks,
>     Charles
>
>
>     On 1/13/11 1:27 PM, Jack Vogel wrote:
>> The 8.2 latest does have the latest igb, so using that should be
>> indicative...
>>
>> Jack
>>
>>
>> On Thu, Jan 13, 2011 at 7:56 AM, Charles Owens wrote:
>>
>>     Ok... I got my wires crossed: our first time testing 8.1 on
>>     this particular platform was with a kernel that had ichwd
>>     enabled (a new thing for us) and so when igb started
>>     complaining about "watchdog" we thought it was related.
>>
>>     We've tested again and clearly the real story is that we're
>>     simply seeing igb issues, symptoms similar to those described.
>>
>>     Does 8.2-RC1 have sufficiently "latest" code, or should I be
>>     looking to load up something else?  (8-stable, maybe?)
>>
>>     Thanks,
>>     Charles
>>
>>
>>     On 1/13/11 12:07 AM, Jack Vogel wrote:
>>> The problem that Robin saw was due to having MSIX interrupts
>>> disabled on the system, I doubt that is going to be the "issue"
>>> for others.
>>>
>>> Get the latest version of the igb code and see if that helps
>>> you as a first step.
>>>
>>> Jack
>>>
>>>
>>> On Wed, Jan 12, 2011 at 6:43 PM, Charles Owens wrote:
>>>
>>>     I'd like to report that we're running into this issue
>>>     also, in our case on systems that are based on the Intel
>>>     S5520UR Server Board, running 8.1-RELEASE.  If the ichwd
>>>     driver is loaded we see the same messages, and network
>>>     communication via the igb nics is non-functional.
>>>
>>>     Have you had any luck?
>>>
>>>     Thanks,
>>>     Charles
>>>
>>>     Charles Owens
>>>     Great Bay Software, Inc.
>>>
>>>
>>>     On 1/3/11 4:02 PM, Robin Sommer wrote:
>>>
>>>         Hello all,
>>>
>>>         quite a while ago I asked about the problem below.
>>>         Unfortunately, I haven't found a solution yet and I'm
>>>         actually still seeing these timeouts after just upgrading
>>>         to 8.2-RC1.  Any further ideas on what could be triggering
>>>         them, or how I could track down the cause?
>>>
>>>         Thanks,
>>>
>>>         Robin
>>>
>>>         On Thu, Jul 29, 2010 at 14:56 -0700, I wrote:
>>>
>>>             Since upgrading from 8.0 to 8.1-RELEASE, I'm seeing
>>>             lots of messages like those below on all my SuperMicro
>>>             SBI-7425C-T3 blades.  There's almost no traffic on
>>>             those interfaces.
>>>
>>>             Any idea?
>>>
>>>             Thanks,
>>>
>>>             Robin
>>>
>>>             Jul 29 13:01:18 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>             Jul 29 13:01:18 blade0 kernel: igb1: Queue(0) tdh = 256, hw tdt = 266
>>>             Jul 29 13:01:18 blade0 kernel: igb1: TX(0) desc avail = 1013,Next TX to Clean = 255
>>>             Jul 29 13:01:18 blade0 kernel: igb1: link state changed to DOWN
>>>             Jul 29 13:01:18 blade0 kernel: igb1: link state changed to UP
>>>             Jul 29 13:01:29 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>             Jul 29 13:01:29 blade0 kernel: igb1: Queue(0) tdh = 0, hw tdt = 10
>>>             Jul 29 13:01:29 blade0 kernel: igb1: TX(0) desc avail = 1014,Next TX to Clean = 0
>>>             Jul 29 13:01:29 blade0 kernel: igb1: link state changed to DOWN
>>>             Jul 29 13:01:29 blade0 kernel: igb1: link state changed to UP
>>>             Jul 29 13:01:46 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>             Jul 29 13:01:46 blade0 kernel: igb1: Queue(0) tdh = 32, hw tdt = 33
>>>             Jul 29 13:01:46 blade0 kernel: igb1: TX(0) desc avail = 1022,Next TX to Clean = 31
>>>             Jul 29 13:01:46 blade0 kernel: igb1: link state changed to DOWN
>>>             Jul 29 13:01:46 blade0 kernel: igb1: link state changed to UP
>>>             Jul 29 13:01:57 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>             Jul 29 13:01:57 blade0 kernel: igb1: Queue(0) tdh = 0, hw tdt = 10
>>>             Jul 29 13:01:57 blade0 kernel: igb1: TX(0) desc avail = 1014,Next TX to Clean = 0
>>>             Jul 29 13:01:57 blade0 kernel: igb1: link state changed to DOWN
>>>             Jul 29 13:01:58 blade0 kernel: igb1: link state changed to UP
>>>             Jul 29 13:02:13 blade0 kernel: igb1: Watchdog timeout -- resetting
>>>
>>>             grep igb /var/run/dmesg.boot
>>>
>>>             igb0: version - 1.9.5> port 0x2000-0x201f mem 0xfc940000-0xfc95ffff,0xfc920000-0xfc93ffff,0xfc900000-0xfc903fff irq 16 at device 0.0 on pci4
>>>             igb0: [FILTER]
>>>             igb0: Ethernet address: 00:30:48:9e:22:00
>>>             igb1: version - 1.9.5> port 0x2020-0x203f mem 0xfc980000-0xfc99ffff,0xfc960000-0xfc97ffff,0xfc904000-0xfc907fff irq 17 at device 0.1 on pci4
>>>             igb1: [FILTER]
>>>             igb1: Ethernet address: 00:30:48:9e:22:01
>>>
>>>             pciconf -lv
>>>
>>>             [...]
>>>             igb0@pci0:4:0:0: class=0x020000 card=0x10a915d9 chip=0x10a98086 rev=0x02 hdr=0x00
>>>                 vendor     = 'Intel Corporation'
>>>                 device     = '82575EB Gigabit Backplane Connection'
>>>                 class      = network
>>>                 subclass   = ethernet
>>>             igb1@pci0:4:0:1: class=0x020000 card=0x10a915d9 chip=0x10a98086 rev=0x02 hdr=0x00
>>>                 vendor     = 'Intel Corporation'
>>>                 device     = '82575EB Gigabit Backplane Connection'
>>>                 class      = network
>>>                 subclass   = ethernet
>>>             [...]
>>>
>>>         _______________________________________________
>>>         freebsd-net@freebsd.org mailing list
>>>         http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>         To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
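
The watchdog messages quoted in Robin's report lend themselves to a quick tally of how often each interface resets. The following is a minimal Python sketch, assuming syslog lines in exactly the format shown in that report and an English (C) locale for month names. The function name `watchdog_gaps` and the hard-coded year are illustrative assumptions; syslog timestamps carry no year, and nothing here comes from the igb driver or FreeBSD tooling.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Jul 29 13:01:18 blade0 kernel: igb1: Watchdog timeout -- resetting
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<nic>igb\d+): Watchdog timeout -- resetting"
)


def watchdog_gaps(lines, year=2010):
    """Return {nic: [seconds between consecutive watchdog resets]}.

    `year` is an assumption supplied by the caller, because syslog
    timestamps do not include one.
    """
    last_seen = {}
    gaps = {}
    for line in lines:
        m = WATCHDOG_RE.match(line.strip())
        if not m:
            continue  # not a watchdog reset line
        when = datetime.strptime(f"{year} {m.group('stamp')}",
                                 "%Y %b %d %H:%M:%S")
        nic = m.group("nic")
        gaps.setdefault(nic, [])
        if nic in last_seen:
            delta = when - last_seen[nic]
            gaps[nic].append(int(delta.total_seconds()))
        last_seen[nic] = when
    return gaps


# Sample input: a subset of the log lines quoted in the thread.
sample = [
    "Jul 29 13:01:18 blade0 kernel: igb1: Watchdog timeout -- resetting",
    "Jul 29 13:01:18 blade0 kernel: igb1: link state changed to DOWN",
    "Jul 29 13:01:29 blade0 kernel: igb1: Watchdog timeout -- resetting",
    "Jul 29 13:01:46 blade0 kernel: igb1: Watchdog timeout -- resetting",
    "Jul 29 13:01:57 blade0 kernel: igb1: Watchdog timeout -- resetting",
    "Jul 29 13:02:13 blade0 kernel: igb1: Watchdog timeout -- resetting",
]

if __name__ == "__main__":
    for nic, seconds in watchdog_gaps(sample).items():
        print(nic, seconds)
```

Feeding it /var/log/messages from before and after "ifconfig igb0 -polling" would be one way to compare reset frequency with and without polling, though the sketch only counts resets and says nothing about their cause.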