From: Naujikas Rolandas
To: Jack Vogel
Cc: freebsd-stable@freebsd.org, Jeremy Chadwick
Date: Sun, 21 Nov 2010 11:21:25 +0200
Subject: Re: problems with network on em
Message-Id: <6CD8D3F3-DB2B-49C2-BF24-5FB86B43685B@mif.vu.lt>
In-Reply-To: <65980530-3981-4C6B-B5CC-6309C678EDDF@mif.vu.lt>

I just tested with the code from RELENG_8 and HEAD.

http://install.mif.vu.lt/FreeBSD/net1/

Testing until 10:50 used the version from RELENG_8.
After that, testing used the em driver from HEAD.
At 10:50 testing stopped because the machine became unresponsive over the
network. Luckily I have serial console access, and the machine was rebooted
(finally with "call cpu_reset()").
You can see a smaller CPU load, but also a 50% reduction in bandwidth usage.

Regards, Rolandas Naujikas

On 2010.11.21, at 09:09, Naujikas Rolandas wrote:

> When comparing there (in HEAD) I found many changes, most of them are not related with my hardware.
> Would it compile on FreeBSD 8.1-RELEASE-p1 ?
> I could try on secondary router and test it again with 1Gbs traffic.
>
> Regards, Rolandas Naujikas
>
> On 2010.11.21, at 00:13, Jack Vogel wrote:
>
>> I'd appreciate it if you could try and get the driver from HEAD, I will
>> be putting it into STABLE next week, and it would be nice to see if it
>> fixed your problem. It will build in your STABLE environment just fine,
>> do you know how to do this, if not just say so and I can give you
>> further details.
>>
>> Regards,
>>
>> Jack
>>
>>
>> On Sat, Nov 20, 2010 at 1:53 PM, Naujikas Rolandas <Rolandas.Naujikas@mif.vu.lt> wrote:
>>
>>> I don't know about version, but I'm using RELENG_8 branch only. It is
>>> FreeBSD 8-STABLE also.
>>>
>>> Regards, Rolandas Naujikas
>>>
>>> P.S. I just got ~1Gbit/s (125MB/s,115Kpps) forwarding traffic in testing
>>> (24 nodes was downloading a file with wget from server from another side of
>>> router), but finally there was some deadlock. I'm recovering the data on it.
>>>
>>> On 2010.11.20, at 22:37, Jack Vogel wrote:
>>>
>>>> Did you mean the 7.1.7 version from HEAD ?
>>>>
>>>> Jack
>>>>
>>>>
>>>> On Sat, Nov 20, 2010 at 11:18 AM, Naujikas Rolandas <Rolandas.Naujikas@mif.vu.lt> wrote:
>>>>
>>>>> I'm trying to test with newest version of /sys/dev/e1000 from FreeBSD
>>>>> 8-STABLE.
>>>>> For that I'm using loadable module option, because it is easier to build
>>>>> with minimal changes in kernel source.
>>>>> Only /sys/dev/e1000 and /sys/modules/em need to be updated.
>>>>> Without changes in /sys/modules/em/Makefile it compiles, but have missing
>>>>> symbol or if you compile static kernel - the same problem.
>>>>> Now I'm testing and it looks promising (except I see a little bigger kernel
>>>>> thread netisr cpu load, but it's acceptable).
>>>>>
>>>>> Regards, Rolandas Naujikas
>>>>>
>>>>> On 2010.11.20, at 19:05, Jeremy Chadwick wrote:
>>>>>
>>>>>> On Sat, Nov 20, 2010 at 06:38:19PM +0200, Naujikas Rolandas wrote:
>>>>>>> I just got another lockup.
>>>>>>> It looks like in the time of lockup the number of Ierrs is increasing:
>>>>>>> Name   Mtu Network       Address               Ipkts  Ierrs Idrop     Opkts Oerrs Coll
>>>>>>> em2   1500               00:14:4f:XX:XX:XX  13060395  18438     0   6579984     1    0
>>>>>>>
>>>>>>> After "ifconfig em2 down;ifconfig em2 up" Ierrs stays at 0 rate for long time.
>>>>>>> Without DEVICE_POLLING it was similar situation.
>>>>>>>
>>>>>>> Regards, Rolandas Naujikas
>>>>>>>
>>>>>>> On 2010.11.20, at 18:24, rolnas@gmail.com wrote:
>>>>>>>
>>>>>>>> On 2010.11.20, at 17:54, Jeremy Chadwick wrote:
>>>>>>>>
>>>>>>>>> On Sat, Nov 20, 2010 at 05:09:28PM +0200, rolnas@gmail.com wrote:
>>>>>>>>>> I'm experiencing network interface stalls on em in FreeBSD 8.1-RELEASE (-p1).
>>>>>>>>>> It looks like the problem could be solved in 8-STABLE, but should I upgrade to it ?
>>>>>>>>>> Is it OK to try to get only em driver code and recompile as module and try to run it ?
>>>>>>>>>>
>>>>>>>>>> sysctl dev.em.2.stats=1:
>>>>>>>>>> ...
>>>>>>>>>> em2: Missed Packets = 101334
>>>>>>>>>> em2: Receive No Buffers = 488
>>>>>>>>>> ...
>>>>>>>>>> em2: RX overruns = 1356
>>>>>>>>>> em2: watchdog timeouts = 1
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> Only "ifconfig em2 down;ifconfig em2 up" helps for some time.
>>>>>>>>>> The same happens on em0 interface only, but not in the same time.
>>>>>>>>>> It is production (NAT) router with pf+pfsync+carp and failover over another router.
>>>>>>>>>> They are old "SunFire X4100" boxes (4GB RAM, 2*2 AMD Opteron 2.2GHz).
>>>>>>>>>
>>>>>>>>> You're going to need to provide output from the following, run as root.
>>>>>>>>> For the pciconf command, please only include the entry that's relevant
>>>>>>>>> to the device in question (em2). You can also XXX-out the MAC address
>>>>>>>>> and/or IP addresses if you're worried about security.
>>>>>>>>>
>>>>>>>>> $ pciconf -lvc
>>>>>>>>
>>>>>>>> em2@pci0:1:2:0: class=0x020000 card=0x10118086 chip=0x10108086 rev=0x03 hdr=0x00
>>>>>>>>     vendor     = 'Intel Corporation'
>>>>>>>>     device     = 'Dual Port Gigabit Ethernet Controller (Copper) (82546EB)'
>>>>>>>>     class      = network
>>>>>>>>     subclass   = ethernet
>>>>>>>>     cap 01[dc] = powerspec 2 supports D0 D3 current D0
>>>>>>>>     cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
>>>>>>>>     cap 05[f0] = MSI supports 1 message, 64 bit
>>>>>>>>
>>>>>>>>> $ dmesg | grep em2
>>>>>>>>
>>>>>>>> em2: port 0x9400-0x943f mem 0xfbfa0000-0xfbfbffff irq 24 at device 2.0 on pci1
>>>>>>>> em2: [FILTER]
>>>>>>>> em2: Ethernet address: 00:14:4f:XX:XX:XX
>>>>>>>>
>>>>>>>>> $ sysctl dev.em.2
>>>>>>>>
>>>>>>>> dev.em.2.%desc: Intel(R) PRO/1000 Legacy Network Connection 1.0.1
>>>>>>>> dev.em.2.%driver: em
>>>>>>>> dev.em.2.%location: slot=2 function=0
>>>>>>>> dev.em.2.%pnpinfo: vendor=0x8086 device=0x1010 subvendor=0x8086 subdevice=0x1011 class=0x020000
>>>>>>>> dev.em.2.%parent: pci1
>>>>>>>> dev.em.2.debug: -1
>>>>>>>> dev.em.2.stats: -1
>>>>>>>> dev.em.2.rx_int_delay: 0
>>>>>>>> dev.em.2.tx_int_delay: 66
>>>>>>>> dev.em.2.rx_abs_int_delay: 66
>>>>>>>> dev.em.2.tx_abs_int_delay: 66
>>>>>>>> dev.em.2.rx_processing_limit: 100
>>>>>>>>
>>>>>>>>> $ uname -a
>>>>>>>>
>>>>>>>> FreeBSD sunfire1.mif 8.1-RELEASE-p1 FreeBSD 8.1-RELEASE-p1 #2: Thu Nov 18 10:39:07 EET 2010 root@sunfire1.mif:/home/local/obj/usr/src/sys/SUNFIRE amd64
>>>>>>>>
>>>>>>>> Recompiled with DEVICE_POLLING and HZ=2000, carp and many not used devices removed.
>>>>>>>>
>>>>>>>>> $ netstat -ind -I em2
>>>>>>>>
>>>>>>>> Name   Mtu Network       Address               Ipkts  Ierrs Idrop     Opkts Oerrs Coll Drop
>>>>>>>> em2   1500               00:14:4f:XX:XX:XX  66430440 101334     0  59339619     1    0    0
>>>>>>>> em2   1500 192.168.0.0/1 192.168.XX.XXX       633845      -     -   3815946     -    -    -
>>>>>>>> ...
>>>>>>>> em0   1500               00:14:4f:XX:XX:XX 167143400 152726     0 143900328     0    0    0
>>>>>>>>
>>>>>>>> Regards, Rolandas Naujikas
>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>
>>>>>> Oops, I forgot requesting output from one other command:
>>>>>>
>>>>>> $ vmstat -i
>>>>>>
>>>>>> Adding Jack Vogel to the thread, who might have ideas/comments. Jack,
>>>>>> here's the thread:
>>>>>>
>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-November/060183.html
>>>>>>
>>>>>> As for my comments:
>>>>>>
>>>>>> Unidirectional errors (input or output) often indicates a duplex
>>>>>> mismatch or some sort of weird "quirk" between one link partner and the
>>>>>> other. I *have* seen cases where both sides are auto-neg and one side
>>>>>> acts like it has the wrong duplex selection despite ifconfig reporting
>>>>>> full-duplex and the switch reporting full. Forcing speed and duplex on
>>>>>> both ends (requires a managed switch; please don't try this with a
>>>>>> generic consumer switch) resolved the problem.
>>>>>>
>>>>>> It could be that there's a driver bug causing this to happen -- down/up
>>>>>> seems to indicate that could be the case -- but every situation needs to
>>>>>> be addressed individually.
>>>>>>
>>>>>> --
>>>>>> | Jeremy Chadwick                                    jdc@parodius.com |
>>>>>> | Parodius Networking                        http://www.parodius.com/ |
>>>>>> | UNIX Systems Administrator                   Mountain View, CA, USA |
>>>>>> | Making life hard for others since 1977.               PGP: 4BD6C0CB |
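
For anyone comparing the counters quoted in this thread, here is a minimal
Python sketch that turns a link-level netstat row into an input error ratio
(Ierrs divided by Ipkts). It is only an illustration and not part of the em
driver or of any FreeBSD tool. The function names are hypothetical, and the
parser assumes that the two fields after the MAC address are Ipkts and Ierrs,
as in the rows quoted in the thread.

```python
import re

# Link-level rows as quoted in this thread (MAC addresses already masked).
SAMPLES = {
    "em2 (first report)": "em2 1500 00:14:4f:XX:XX:XX 66430440 101334 0 59339619 1 0 0",
    "em2 (later report)": "em2 1500 00:14:4f:XX:XX:XX 13060395 18438 0 6579984 1 0",
    "em0": "em0 1500 00:14:4f:XX:XX:XX 167143400 152726 0 143900328 0 0 0",
}

# Matches a MAC-style address, allowing the masked "XX" octets used above.
MAC_RE = re.compile(r"^[0-9A-Za-z]{2}(:[0-9A-Za-z]{2}){5}$")


def parse_link_row(row):
    """Return (ipkts, ierrs) from one link-level netstat row.

    Assumption: the two fields after the MAC address are Ipkts and Ierrs.
    """
    fields = row.split()
    for i, field in enumerate(fields):
        if MAC_RE.match(field):
            return int(fields[i + 1]), int(fields[i + 2])
    raise ValueError("no MAC address column found in row")


def input_error_ratio(row):
    """Ierrs divided by Ipkts for one row (0.0 if no packets were counted)."""
    ipkts, ierrs = parse_link_row(row)
    return ierrs / ipkts if ipkts else 0.0


if __name__ == "__main__":
    for label, row in SAMPLES.items():
        print(f"{label}: {input_error_ratio(row):.3%} input errors")
```

On the figures quoted here, all three rows work out to roughly 0.09% to 0.15%.
The ratio alone does not show a stall, so it is more useful to sample it
repeatedly and watch how fast Ierrs grows around a lockup.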