Date:      Sat, 1 Aug 2009 06:05:37 -0700 (PDT)
From:      alexpalias-bsdnet@yahoo.com
To:        freebsd-net@freebsd.org
Subject:   em driver input errors
Message-ID:  <11420.28890.qm@web56404.mail.re3.yahoo.com>

Good day

I'm running a FreeBSD 7.2 router and I am seeing a lot of input errors on one of the em interfaces (em0), coupled with far fewer errors on em1 and em2 at approximately the same times. Monitoring is done with SNMP from another machine, and the CPU load as reported via SNMP is mostly below 30%, with a couple of spikes up to 35%.

Software description:

- FreeBSD 7.2-RELEASE-p2, amd64
- bsnmpd with modules: hostres and (from ports) snmp_ucd
- quagga 0.99.12 (running only zebra and bgpd)
- netgraph (ng_ether and ng_netflow)

Hardware description:

- Dell machine, dual Xeon 3.20 GHz, 4 GB RAM
- 2 x built-in gigabit interfaces (em0, em1)
- 1 x dual-port gigabit interface, PCI-X (em2, em3) [see pciconf near the end]

The machine receives the global routing table ("netstat -nr | wc -l" currently gives 289115).

All of the em interfaces are just configured "up", with various VLAN interfaces on them. Note that I use "kpps" to mean "thousands of packets per second"; sorry if that's the wrong shorthand.

- em0 sees 10...22 kpps in and 15...35 kpps out. In bits, it's 30...120 Mbit/s in and 100...210 Mbit/s out. The VLANs configured are vlan100 and vlan200, and most of the traffic is on vlan100 (vlan200 sees 4 kpps in / 0.5 kpps out maximum, with the average at about one third of this). em0 is the external interface, and its traffic corresponds to the sum of the traffic through em1 and em2.

- em1 has 5 VLANs, and sees about 22 kpps in / 11 kpps out (maximum).

- em2 has a single VLAN, and sees about 4...13 kpps both in and out (almost equal in/out during most of the day).

- em3 is a backup interface with 2 VLANs, and is the only one which has seen no errors.

Only the VLANs on em0 are analyzed by ng_netflow, and the errors started appearing days before netgraph was even loaded in the kernel.

Tuning done:

/boot/loader.conf:
hw.em.rxd=4096
hw.em.txd=4096

Without the above we were seeing far more errors. Now they are reduced, but they still come in bursts of over 1000 errors on em0.

/etc/sysctl.conf:
net.inet.ip.fastforwarding=1
dev.em.0.rx_processing_limit=300
dev.em.1.rx_processing_limit=300
dev.em.2.rx_processing_limit=300
dev.em.3.rx_processing_limit=300

Still seeing errors, so after searching the mailing lists we also added:

# the four lines below are repeated for em1, em2, em3
dev.em.0.rx_int_delay=0
dev.em.0.rx_abs_int_delay=0
dev.em.0.tx_int_delay=0
dev.em.0.tx_abs_int_delay=0

Still getting errors, so I also added:

net.inet.ip.intr_queue_maxlen=4096
net.route.netisr_maxqlen=1024

and

kern.ipc.nmbclusters=655360

I also tried setting rx_processing_limit to -1 on all em interfaces, and I'm still getting errors.

Looking at the shape of the error and packet graphs, there seems to be a correlation between the number of packets per second on em0 and the height of the error "spikes" on the error graph. These spikes are spread throughout the day, separated by gaps (periods with no errors) of various lengths (from 10 minutes to 2 hours within the last 24 hours), but sometimes there are errors even at the lowest-kpps times of the day.

em0 and em1 error times are correlated: every error spike on the em0 graph is matched by a smaller spike on em1 at the same time, and sometimes by one on em2.

The old router was seeing about the same traffic, had em0, em1, re0 and re1 network cards, and was only seeing errors on the em cards. It was running 7.2-PRERELEASE/i386.

Any suggestions would be greatly appreciated. Please note that this is a live router, and I can't reboot it (unless absolutely necessary). Tuning that can be applied without rebooting will be tried first.

Here are some more details:

Trimmed output of netstat -ni (sorry if there are line breaks):
Name   Mtu  Network   Address                  Ipkts     Ierrs        Opkts  Oerrs  Coll
em0    1500 <Link#1>  00:14:22:xx:xx:xx  19744458839  15494721  24284439443      0     0
em1    1500 <Link#2>  00:14:22:xx:xx:xx  12832245469    123181  10105031790      0     0
em2    1500 <Link#3>  00:04:23:xx:xx:xx  12082552403     10964  10339416865      0     0
em3    1500 <Link#4>  00:04:23:xx:xx:xx     79912337         0     48178737      0     0

Relevant part of pciconf -vl:

em0@pci0:6:7:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541EI Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
em1@pci0:7:8:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541EI Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
em2@pci0:9:4:0: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet
em3@pci0:9:4:1: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet

Kernel messages after sysctl dev.em.0.stats=1:
(note that I've removed the lines which only showed zeros in the second and third outputs)

em0: Excessive collisions = 0
em0: Sequence errors = 0
em0: Defer count = 0
em0: Missed Packets = 15435312
em0: Receive No Buffers = 16446113
em0: Receive Length Errors = 0
em0: Receive errors = 1
em0: Crc errors = 2
em0: Alignment errors = 0
em0: Collision/Carrier extension errors = 0
em0: RX overruns = 96826
em0: watchdog timeouts = 0
em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
em0: XON Rcvd = 0
em0: XON Xmtd = 0
em0: XOFF Rcvd = 0
em0: XOFF Xmtd = 0
em0: Good Packets Rcvd = 19002068797
em0: Good Packets Xmtd = 23168462599
em0: TSO Contexts Xmtd = 0
em0: TSO Contexts Failed = 0

[later]
em0: Excessive collisions = 0
em0: Missed Packets = 15459111
em0: Receive No Buffers = 16447082
em0: Receive errors = 1
em0: Crc errors = 2
em0: RX overruns = 96835
em0: Good Packets Rcvd = 19165047284
em0: Good Packets Xmtd = 23386976960

[later]
em0: Excessive collisions = 0
em0: Missed Packets = 15470583
em0: Receive No Buffers = 16447686
em0: Receive errors = 1
em0: Crc errors = 2
em0: RX overruns = 96840
em0: Good Packets Rcvd = 19255466068
em0: Good Packets Xmtd = 23519004546


Thank you for your time.
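
The sketch below is a rough sanity check on the hw.em.rxd=4096 setting. It estimates how many milliseconds of em0's peak input (about 22 kpps, from the traffic figures above) one full RX descriptor ring can hold while the driver is not draining it. It makes two assumptions: each received frame uses one descriptor, which is the usual case for 1500-byte frames, and the arrival rate is constant. Real traffic is burstier than that, so the result is an optimistic estimate. The 256-descriptor case is only a comparison point.

```python
# Rough RX ring headroom estimate (a sketch, not a model of the em driver).
# Assumptions: one RX descriptor per received frame and a constant
# arrival rate; real traffic arrives in bursts, so this is optimistic.


def ring_headroom_ms(descriptors: int, packets_per_second: float) -> float:
    """Milliseconds of input a full ring of `descriptors` entries can absorb."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return descriptors / packets_per_second * 1000.0


if __name__ == "__main__":
    peak_pps = 22_000  # em0 peak input rate quoted in the report (22 kpps)
    for rxd in (256, 4096):  # 256 is only a comparison point
        print(f"hw.em.rxd={rxd}: ~{ring_headroom_ms(rxd, peak_pps):.1f} ms at {peak_pps} pps")
```

At 22 kpps this gives roughly 12 ms of headroom for 256 descriptors and roughly 186 ms for 4096. If the ring still overflows at that size, the receive path is probably stalling for long periods, not just falling slightly behind.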

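To put the netstat -ni counters in proportion, this sketch divides Ierrs by Ipkts for each interface and reports the result as errors per million received packets. The figures are copied from the trimmed output above. Because the counters are cumulative since boot, this is an average over the whole uptime and says nothing about the bursts themselves.

```python
# Input errors per million received packets, per interface.
# Values are the Ipkts/Ierrs columns from the trimmed `netstat -ni` output.

NETSTAT = {
    # interface: (Ipkts, Ierrs)
    "em0": (19744458839, 15494721),
    "em1": (12832245469, 123181),
    "em2": (12082552403, 10964),
    "em3": (79912337, 0),
}


def errors_per_million(ipkts: int, ierrs: int) -> float:
    """Ierrs relative to Ipkts, scaled to errors per million packets."""
    return ierrs / ipkts * 1_000_000


if __name__ == "__main__":
    for name, (ipkts, ierrs) in NETSTAT.items():
        print(f"{name}: {errors_per_million(ipkts, ierrs):8.1f} errors per million packets")
```

This works out to roughly 785 per million (about 0.08%) on em0, under 10 per million on em1, and about 1 per million on em2. em0 therefore accounts for most of the loss, although em1 and em2 also show errors.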

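The card= and chip= fields in the pciconf -vl output above each pack two 16-bit PCI IDs into one 32-bit number:

- chip holds the device ID in the high half and the vendor ID in the low half.
- card holds the subsystem ID and subsystem vendor ID in the same layout.

The sketch below splits them. 0x8086 is Intel's PCI vendor ID and 0x1028 is Dell's. That is consistent with em0/em1 being the Dell board's built-in ports and em2/em3 sitting on a separate Intel card.

```python
# Split the 32-bit chip= / card= fields printed by `pciconf -l`.
# chip = (device ID << 16) | vendor ID
# card = (subsystem ID << 16) | subsystem vendor ID
from __future__ import annotations

PCICONF = {
    # interface: (card, chip), copied from the pciconf -vl output above
    "em0": (0x016D1028, 0x10768086),
    "em1": (0x016D1028, 0x10768086),
    "em2": (0x10128086, 0x10108086),
    "em3": (0x10128086, 0x10108086),
}


def split_id(value: int) -> tuple[int, int]:
    """Return (high 16 bits, low 16 bits) of a 32-bit PCI ID field."""
    return value >> 16, value & 0xFFFF


if __name__ == "__main__":
    for name, (card, chip) in PCICONF.items():
        device, vendor = split_id(chip)
        subdevice, subvendor = split_id(card)
        print(f"{name}: vendor=0x{vendor:04x} device=0x{device:04x} "
              f"subvendor=0x{subvendor:04x} subdevice=0x{subdevice:04x}")
```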

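The three sysctl dev.em.0.stats=1 dumps above are cumulative, so the informative numbers are the differences between consecutive dumps. This sketch computes them for the counters that changed. It then expresses Missed Packets per million good packets received in each interval. The time between dumps was not recorded, so only ratios can be derived, not rates per second.

```python
# Deltas between the consecutive `sysctl dev.em.0.stats=1` dumps of em0.
# Only counters that changed between dumps are listed.

SNAPSHOTS = [
    {"missed": 15435312, "no_buffers": 16446113, "rx_overruns": 96826, "good_rcvd": 19002068797},
    {"missed": 15459111, "no_buffers": 16447082, "rx_overruns": 96835, "good_rcvd": 19165047284},
    {"missed": 15470583, "no_buffers": 16447686, "rx_overruns": 96840, "good_rcvd": 19255466068},
]


def deltas(prev: dict, curr: dict) -> dict:
    """Per-counter increase from one dump to the next."""
    return {key: curr[key] - prev[key] for key in curr}


def missed_per_million(delta: dict) -> float:
    """Missed Packets per million good packets received in the interval."""
    return delta["missed"] / delta["good_rcvd"] * 1_000_000


if __name__ == "__main__":
    for i in range(1, len(SNAPSHOTS)):
        d = deltas(SNAPSHOTS[i - 1], SNAPSHOTS[i])
        print(f"interval {i}: {d} -> {missed_per_million(d):.0f} missed per million good")
```

For the two intervals this gives 23799 and 11472 missed packets against roughly 163 million and 90 million good packets, which is about 146 and 127 per million. Over the same intervals, Receive No Buffers grew by 969 and 604 and RX overruns by only 9 and 5. Missed Packets is therefore by far the fastest-growing error counter in these dumps.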