From: Naujikas Rolandas
Date: Sat, 20 Nov 2010 21:18:00 +0200
To: Jeremy Chadwick
Cc: freebsd-stable@freebsd.org, jfvogel@gmail.com
Subject: Re: problems with network on em

I'm testing with the newest version of /sys/dev/e1000 from FreeBSD 8-STABLE.
I'm building it as a loadable module, because that is easier to do with minimal changes to the kernel source.
Only /sys/dev/e1000 and /sys/modules/em need to be updated.
Without changes to /sys/modules/em/Makefile the module compiles but has a missing symbol; compiling the driver statically into the kernel gives the same problem.
Now I'm testing, and it looks promising (except that I see slightly higher CPU load in the netisr kernel thread, but it's acceptable).

Regards, Rolandas Naujikas

On 2010.11.20, at 19:05, Jeremy Chadwick wrote:

> On Sat, Nov 20, 2010 at 06:38:19PM +0200, Naujikas Rolandas wrote:
>> I just got another lockup.
>> It looks like in the time of lockup the number of Ierrs is increasing:
>> Name    Mtu Network       Address               Ipkts  Ierrs Idrop     Opkts Oerrs  Coll
>> em2    1500               00:14:4f:XX:XX:XX  13060395  18438     0   6579984     1     0
>>
>> After "ifconfig em2 down;ifconfig em2 up" Ierrs stays at 0 rate for long time.
>> Without DEVICE_POLLING it was similar situation.
>>
>> Regards, Rolandas Naujikas
>>
>> On 2010.11.20, at 18:24, rolnas@gmail.com wrote:
>>
>>> On 2010.11.20, at 17:54, Jeremy Chadwick wrote:
>>>
>>>> On Sat, Nov 20, 2010 at 05:09:28PM +0200, rolnas@gmail.com wrote:
>>>>> I'm experiencing network interface stalls on em in FreeBSD 8.1-RELEASE (-p1).
>>>>> It looks like the problem could be solved in 8-STABLE, but should I upgrade to it ?
>>>>> Is it OK to try to get only em driver code and recompile as module and try to run it ?
>>>>>
>>>>> sysctl dev.em.2.stats=1:
>>>>> ...
>>>>> em2: Missed Packets = 101334
>>>>> em2: Receive No Buffers = 488
>>>>> ...
>>>>> em2: RX overruns = 1356
>>>>> em2: watchdog timeouts = 1
>>>>> ...
>>>>>
>>>>> Only "ifconfig em2 down;ifconfig em2 up" helps for some time.
>>>>> The same happens on em0 interface only, but not in the same time.
>>>>> It is production (NAT) router with pf+pfsync+carp and failover over another router.
>>>>> They are old "SunFire X4100" boxes (4GB RAM, 2*2 AMD Opteron 2.2GHz).
>>>>
>>>> You're going to need to provide output from the following, run as root.
>>>> For the pciconf command, please only include the entry that's relevant
>>>> to the device in question (em2). You can also XXX-out the MAC address
>>>> and/or IP addresses if you're worried about security.
>>>>
>>>> $ pciconf -lvc
>>>
>>> em2@pci0:1:2:0: class=0x020000 card=0x10118086 chip=0x10108086 rev=0x03 hdr=0x00
>>>     vendor     = 'Intel Corporation'
>>>     device     = 'Dual Port Gigabit Ethernet Controller (Copper) (82546EB)'
>>>     class      = network
>>>     subclass   = ethernet
>>>     cap 01[dc] = powerspec 2 supports D0 D3 current D0
>>>     cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
>>>     cap 05[f0] = MSI supports 1 message, 64 bit
>>>
>>>> $ dmesg | grep em2
>>>
>>> em2: port 0x9400-0x943f mem 0xfbfa0000-0xfbfbffff irq 24 at device 2.0 on pci1
>>> em2: [FILTER]
>>> em2: Ethernet address: 00:14:4f:XX:XX:XX
>>>
>>>> $ sysctl dev.em.2
>>>
>>> dev.em.2.%desc: Intel(R) PRO/1000 Legacy Network Connection 1.0.1
>>> dev.em.2.%driver: em
>>> dev.em.2.%location: slot=2 function=0
>>> dev.em.2.%pnpinfo: vendor=0x8086 device=0x1010 subvendor=0x8086 subdevice=0x1011 class=0x020000
>>> dev.em.2.%parent: pci1
>>> dev.em.2.debug: -1
>>> dev.em.2.stats: -1
>>> dev.em.2.rx_int_delay: 0
>>> dev.em.2.tx_int_delay: 66
>>> dev.em.2.rx_abs_int_delay: 66
>>> dev.em.2.tx_abs_int_delay: 66
>>> dev.em.2.rx_processing_limit: 100
>>>
>>>> $ uname -a
>>>
>>> FreeBSD sunfire1.mif 8.1-RELEASE-p1 FreeBSD 8.1-RELEASE-p1 #2: Thu Nov 18 10:39:07 EET 2010 root@sunfire1.mif:/home/local/obj/usr/src/sys/SUNFIRE amd64
>>>
>>> Recompiled with DEVICE_POLLING and HZ=2000, carp and many not used devices removed.
>>>
>>>> $ netstat -ind -I em2
>>>
>>> Name    Mtu Network       Address               Ipkts  Ierrs Idrop     Opkts Oerrs  Coll Drop
>>> em2    1500               00:14:4f:XX:XX:XX  66430440 101334     0  59339619     1     0    0
>>> em2    1500 192.168.0.0/1 192.168.XX.XXX       633845      -     -   3815946     -     -    -
>>> ...
>>> em0    1500               00:14:4f:XX:XX:XX 167143400 152726     0 143900328     0     0    0
>>>
>>> Regards, Rolandas Naujikas
>>>
>>>> Thanks.
>
> Oops, I forgot requesting output from one other command:
>
> $ vmstat -i
>
> Adding Jack Vogel to the thread, who might have ideas/comments. Jack,
> here's the thread:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-November/060183.html
>
> As for my comments:
>
> Unidirectional errors (input or output) often indicates a duplex
> mismatch or some sort of weird "quirk" between one link partner and the
> other. I *have* seen cases where both sides are auto-neg and one side
> acts like it has the wrong duplex selection despite ifconfig reporting
> full-duplex and the switch reporting full. Forcing speed and duplex on
> both ends (requires a managed switch; please don't try this with a
> generic consumer switch) resolved the problem.
>
> It could be that there's a driver bug causing this to happen -- down/up
> seems to indicate that could be the case -- but every situation needs to
> be addressed individually.
>
> --
> | Jeremy Chadwick                                   jdc@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
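
P.S. Since the Ierrs counter was climbing at the time of the lockups, a small helper for pulling that one number out of the netstat output might be handy while testing the new driver. This is only a rough sketch: the function name ierrs_of is made up for illustration, and it assumes the link-level row of "netstat -ind" shows the MAC address in the Address column, followed by Ipkts and then Ierrs.

```sh
# ierrs_of IFACE
# Hypothetical helper (not part of FreeBSD): reads `netstat -ind` style
# output on stdin and prints the Ierrs value from the link-level row of
# IFACE. Assumption: in that row the MAC address is followed by Ipkts
# and then Ierrs, so Ierrs is two fields after the MAC.
ierrs_of() {
    awk -v ifc="$1" '
        $1 == ifc {
            for (i = 2; i <= NF; i++) {
                # Six colon-separated pairs: treat the field as the MAC address.
                if ($i ~ /^..:..:..:..:..:..$/) {
                    print $(i + 2)
                    exit
                }
            }
        }'
}
```

It could then be polled on the router, for example with "while :; do netstat -ind -I em2 | ierrs_of em2; sleep 10; done", to see whether the counter starts moving before the interface stalls.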