From owner-freebsd-stable@FreeBSD.ORG Tue Sep 14 17:59:46 2010
Message-Id: <201009141759.o8EHxcZ0013539@lava.sentex.ca>
Date: Tue, 14 Sep 2010 13:59:36 -0400
To: pyunyh@gmail.com
From: Mike Tancsa
Cc: freebsd-stable@freebsd.org, jfvogel@gmail.com
Subject: Re: RELENG_7 em problems (and RELENG_8)
In-Reply-To: <20100817200020.GE6482@michelle.cdnetworks.com>
References: <201006102031.o5AKVCH2016467@lava.sentex.ca>
 <201007021739.o62HdMOU092319@lava.sentex.ca>
 <20100702193654.GD10862@michelle.cdnetworks.com>
 <201008162107.o7GL76pA080191@lava.sentex.ca>
 <20100817185208.GA6482@michelle.cdnetworks.com>
 <201008171955.o7HJt67T087902@lava.sentex.ca>
 <20100817200020.GE6482@michelle.cdnetworks.com>
List-Id: Production branch of FreeBSD source code

Hi Jack,
        Any plans to commit the patch below ? I have been running it on a
number of boxes and it works as expected with no side effects.

        ---Mike

At 04:00 PM 8/17/2010, Pyun YongHyeon wrote:
>On Tue, Aug 17, 2010 at 03:55:12PM -0400, Mike Tancsa wrote:
> > At 02:52 PM 8/17/2010, Pyun YongHyeon wrote:
> >
> > >Here is updated patch for HEAD and stable/8.
> > >http://people.freebsd.org/~yongari/em.csum_tso.20100817.patch
> > >
> > >It seems to work as expected under my limited environments. If
> >
> > Thanks! The patch applies cleanly and all works as expected now! I am
> > no longer able to trigger the bug. I just use the stock unmodified
> > driver normally, so no multi queues
> >
>
>Glad to hear that. Thanks for testing!
>
> > # vmstat -i
> > interrupt                          total       rate
> > irq256: em0                          149          0
> > irq257: em1                            3          0
> > irq259: em3                          971          2
> > irq260: ahci0                       1520          3
> >
> >
> >
> > em3: flags=8843 metric 0 mtu 1500
> > > options=219b
> >     ether 00:15:17:xx:xx:xx
> >     inet6 fe80::215:17ff:fexx:xxxx%em3 prefixlen 64 scopeid 0x4
> >     inet 192.168.xx.xx netmask 0xffffff00 broadcast 192.168.xx.xx
> >     nd6 options=3
> >     media: Ethernet autoselect (100baseTX )
> >     status: active
> >
> >
> > em3@pci0:3:0:0: class=0x020000 card=0x34ec8086 chip=0x10d38086 rev=0x00 hdr=0x00
> >     vendor = 'Intel Corporation'
> >     device = 'Intel 82574L Gigabit Ethernet Controller (82574L)'
> >     class = network
> >     subclass = ethernet
> >     cap 01[c8] = powerspec 2 supports D0 D3 current D0
> >     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
> >     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
> >     cap 11[a0] = MSI-X supports 5 messages in map 0x1c
> >
> >
> >
> > patch < em.csum_tso.20100817.patch
> > Hmm... Looks like a unified diff to me...
> > The text leading up to this was:
> > --------------------------
> > |Index: sys/dev/e1000/if_em.c
> > |===================================================================
> > |--- sys/dev/e1000/if_em.c (revision 211398)
> > |+++ sys/dev/e1000/if_em.c (working copy)
> > --------------------------
> > Patching file sys/dev/e1000/if_em.c using Plan A...
> > Hunk #1 succeeded at 237.
> > Hunk #2 succeeded at 1730.
> > Hunk #3 succeeded at 1759.
> > Hunk #4 succeeded at 1930.
> > Hunk #5 succeeded at 3148.
> > Hunk #6 succeeded at 3351.
> > Hunk #7 succeeded at 3533.
> > Hunk #8 succeeded at 3590.
> > Hunk #9 succeeded at 3603.
> > Hmm... The next patch looks like a unified diff to me...
> > The text leading up to this was:
> > --------------------------
> > |Index: sys/dev/e1000/if_em.h
> > |===================================================================
> > |--- sys/dev/e1000/if_em.h (revision 211398)
> > |+++ sys/dev/e1000/if_em.h (working copy)
> > --------------------------
> > Patching file sys/dev/e1000/if_em.h using Plan A...
> > Hunk #1 succeeded at 284.
> > done
> >
> > ---Mike
> >
> >
> > >you're using multiple Tx queues with em(4) it would be better to
> > >disable Tx checksum offloading as driver always have to create a
> > >new checksum context for each frame. This will effectively disable
> > >pipelined Tx data DMA which in turn greatly slows down Tx
> > >performance for small sized frames. The reason driver have to
> > >create a new checksum context when it uses multiple Tx queues comes
> > >from hardware limitation. The controller tracks only for the last
> > >context descriptor that was written such that driver does not know
> > >the state of checksum context configured in other Tx queue.
> > >Hope this helps.
> > >
> > >>
> > >>
> > >> ---Mike
> > >>
> > >>
> > >> At 03:36 PM 7/2/2010, Pyun YongHyeon wrote:
> > >> >On Fri, Jul 02, 2010 at 01:39:22PM -0400, Mike Tancsa wrote:
> > >> >> Hi Jack,
> > >> >> Just a followup to the email below.
> > >> >> I now saw what appears
> > >> >> to be the same problem on RELENG_8, but on a different nic and with
> > >> >> VLANs. So not sure if this is a general em problem, a problem
> > >> >> specific to some em NICs, or a TSO problem in general. The issue
> > >> >> seemed to be triggered when I added a new vlan based on
> > >> >>
> > >> >> em3@pci0:14:0:0: class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
> > >> >>     vendor = 'Intel Corporation'
> > >> >>     device = 'Intel PRO/1000 PL Network Adaptor (82573L)'
> > >> >>     class = network
> > >> >>     subclass = ethernet
> > >> >>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
> > >> >>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
> > >> >>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
> > >> >>
> > >> >> pci14: on pcib5
> > >> >> em3: port 0x6000-0x601f
> > >> >> mem 0xe8300000-0xe831ffff irq 17 at device 0.0 on pci14
> > >> >> em3: Using MSI interrupt
> > >> >> em3: [FILTER]
> > >> >> em3: Ethernet address: 00:30:48:9f:eb:81
> > >> >>
> > >> >> em3: flags=8943 metric 0 mtu 1500
> > >> >>     options=2098
> > >> >>     ether 00:30:48:9f:eb:81
> > >> >>     inet 10.255.255.254 netmask 0xfffffffc broadcast 10.255.255.255
> > >> >>     media: Ethernet autoselect (1000baseT )
> > >> >>     status: active
> > >> >>
> > >> >> I had to disable tso, rxcsum and txsum in order to see the devices on
> > >> >> the other side of the two vlans trunked off em3. Unfortunately, the
> > >> >> other sides were switches 100km and 500km away so I didnt have any
> > >> >> tcpdump capabilities to diagnose the issue. I had already created
> > >> >> one vlan off this NIC and all was fine. A few weeks later, I added a
> > >> >> new one and I could no longer telnet into the remote switches from
> > >> >> the local machine.... But, I could telnet into the switches from
> > >> >> machines not on the problem box. Hence, it would appear to be a
> > >> >> general TSO issue no ?
> > >> >> I disabled tso on the nic (I didnt disable
> > >> >> net.inet.tcp.tso as I forgot about that).. Still nothing. I could
> > >> >> always ping the remote devices, but no tcp services. I then
> > >> >> remembered this issue from before, so I tried disabling tso on the
> > >> >> NIC. Still nothing. Then I disabled rxcsum and txcsum and I could
> > >> >> then telnet into the remote devices.
> > >> >>
> > >> >> This newly observed issue was from a buildworld on Mon Jun 14
> > >> >> 11:29:12 EDT 2010.
> > >> >>
> > >> >> I will try and recreate the issue locally again to see if I can
> > >> >> trigger the problem on demand. Any thoughts on what it might be ?
> > >> >> Perhaps an issue specific to certain em nics ?
> > >> >>
> > >> >
> > >> >http://www.freebsd.org/cgi/query-pr.cgi?pr=141843
> > >> >I'm not sure whether you're seeing the same issue though.
> > >> >I didn't have chance to try latest em(4) on stable/7.
> > >>
> > >> --------------------------------------------------------------------
> > >> Mike Tancsa,                                      tel +1 519 651 3400
> > >> Sentex Communications,                            mike@sentex.net
> > >> Providing Internet since 1994                     www.sentex.net
> > >> Cambridge, Ontario Canada                         www.sentex.net/mike
> > >>
> >
> > --------------------------------------------------------------------
> > Mike Tancsa,                                      tel +1 519 651 3400
> > Sentex Communications,                            mike@sentex.net
> > Providing Internet since 1994                     www.sentex.net
> > Cambridge, Ontario Canada                         www.sentex.net/mike
> >

--------------------------------------------------------------------
Mike Tancsa,                                      tel +1 519 651 3400
Sentex Communications,                            mike@sentex.net
Providing Internet since 1994                     www.sentex.net
Cambridge, Ontario Canada                         www.sentex.net/mike
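
For readers following the thread, the checksum-context limitation that Pyun YongHyeon describes in the quoted text can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the em(4) driver logic: the names ToyController and count_context_writes, and the (protocol, header length) tuple standing in for a context descriptor, are all hypothetical. The model assumes a controller that remembers only the last context written, so a driver using several Tx queues rewrites the context for every frame.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for the offload parameters in a context descriptor.
Context = Tuple[str, int]  # (protocol, header_length)


@dataclass
class ToyController:
    """Toy model of a controller that tracks only the last context written."""
    last_context: Optional[Context] = None
    context_writes: int = 0

    def write_context(self, ctx: Context) -> None:
        self.last_context = ctx
        self.context_writes += 1


def count_context_writes(frames: List[Tuple[int, Context]], num_queues: int) -> int:
    """Count context descriptors written for frames given in transmit order.

    Each frame is a (queue_index, context) pair. The queue index is not
    inspected: in this model it is the number of queues alone that decides
    whether the driver can trust the previously written context.
    """
    hw = ToyController()
    for _queue, ctx in frames:
        if num_queues > 1:
            # Another queue may have replaced the context, so rewrite it
            # for every frame (the slow path described in the thread).
            hw.write_context(ctx)
        elif hw.last_context != ctx:
            # Single queue: the last written context is known, so a new
            # descriptor is needed only when the parameters change.
            hw.write_context(ctx)
    return hw.context_writes
```

In this model, four frames with identical offload parameters cost one context write on a single queue and four context writes when spread across two queues, which matches the per-frame overhead the quoted explanation attributes to multi-queue operation.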