Date: Sat, 1 Aug 2009 06:05:37 -0700 (PDT)
From: alexpalias-bsdnet@yahoo.com
To: freebsd-net@freebsd.org
Subject: em driver input errors
Message-ID: <11420.28890.qm@web56404.mail.re3.yahoo.com>

Good day

I'm running a FreeBSD 7.2 router and I am seeing a lot of input errors on one of the em interfaces (em0), coupled with (at approximately the same times) far fewer errors on em1 and em2.  Monitoring is done with SNMP from another machine, and the CPU load as reported via SNMP is mostly below 30%, with a couple of spikes up to 35%.

Software description:

- FreeBSD 7.2-RELEASE-p2, amd64
- bsnmpd with modules: hostres and (from ports) snmp_ucd
- quagga 0.99.12 (running only zebra and bgpd)
- netgraph (ng_ether and ng_netflow)

Hardware description:

- Dell machine, dual Xeon 3.20 GHz, 4 GB RAM
- 2 x built-in gigabit interfaces (em0, em1)
- 1 x dual-port gigabit interface, PCI-X (em2, em3) [see pciconf near the end]


The machine receives the global routing table ("netstat -nr | wc -l" gives 289115 currently).

All of the em interfaces are just configured "up", with various vlan interfaces on them.  Note that I use "kpps" to mean "thousands of packets per second", sorry if that's the wrong shorthand.

- em0 sees traffic of 10...22 kpps in, and 15...35 kpps out.  In bits, it's 30...120Mbits/s in, and 100...210Mbits/s out.  Vlans configured are vlan100 and vlan200, and most of the traffic is on vlan100 (vlan200 sees 4kpps in / 0.5kpps out maximum, with the average at about one third of this).  em0 is the external interface, and its traffic corresponds to the sum of traffic through em1 and em2.

- em1 has 5 vlans, and sees about 22kpps in / 11kpps out (maximum)

- em2 has a single VLAN, and sees about 4...13kpps both in and out (almost equal in/out during most of the day)

- em3 is a backup interface, with 2 VLANs, and is the only one which has seen no errors.

Only the vlans on em0 are analyzed by ng_netflow, and the errors I'm seeing started appearing days before netgraph was even loaded in the kernel.

Tuning done:

/boot/loader.conf:
hw.em.rxd=4096
hw.em.txd=4096

Without the above we were seeing way more errors; now they are reduced, but still come in bursts of over 1000 errors on em0.

/etc/sysctl.conf:
net.inet.ip.fastforwarding=1
dev.em.0.rx_processing_limit=300
dev.em.1.rx_processing_limit=300
dev.em.2.rx_processing_limit=300
dev.em.3.rx_processing_limit=300

Still seeing errors; after some searching of the mailing lists we also added:

# the four lines below are repeated for em1, em2, em3
dev.em.0.rx_int_delay=0
dev.em.0.rx_abs_int_delay=0
dev.em.0.tx_int_delay=0
dev.em.0.tx_abs_int_delay=0

Still getting errors, so I also added:

net.inet.ip.intr_queue_maxlen=4096
net.route.netisr_maxqlen=1024

and

kern.ipc.nmbclusters=655360


Also tried with rx_processing_limit set to -1 on all em interfaces, still getting errors.

Looking at the shape of the error and packet graphs, there seems to be a correlation between the number of packets per second on em0 and the height of the error "spikes" on the error graph.  These spikes are spread throughout the day, with gaps (zones with no errors) of various lengths (10 minutes ... 2 hours within the last 24 hours), but sometimes there are errors even in the lowest kpps times of the day.

em0 and em1 error times are correlated, with all errors on the graph for em0 having a smaller corresponding error spike on em1 at the same time, and sometimes an error spike on em2.

The old router was seeing about the same traffic, and had em0, em1, re0 and re1 network cards, and was only seeing errors on the em cards.  It was running 7.2-PRERELEASE/i386.


Any suggestions would be greatly appreciated.  Please note that this is a live router, and I can't reboot it (unless absolutely necessary).  Tuning that can be applied without rebooting will be tried first.

Here are some more details:

Trimmed output of netstat -ni (sorry if there are line breaks):
Name    Mtu Network       Address                 Ipkts    Ierrs       Opkts Oerrs  Coll
em0    1500 <Link#1>      00:14:22:xx:xx:xx 19744458839 15494721 24284439443     0     0
em1    1500 <Link#2>      00:14:22:xx:xx:xx 12832245469   123181 10105031790     0     0
em2    1500 <Link#3>      00:04:23:xx:xx:xx 12082552403    10964 10339416865     0     0
em3    1500 <Link#4>      00:04:23:xx:xx:xx    79912337        0    48178737     0     0

Relevant part of pciconf -vl:

em0@pci0:6:7:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541EI Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
em1@pci0:7:8:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541EI Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
em2@pci0:9:4:0: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet
em3@pci0:9:4:1: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet

Kernel messages after sysctl dev.em.0.stats=1:
(note that I've removed the lines which only showed zeros in the second and third outputs)

em0: Excessive collisions = 0
em0: Sequence errors = 0
em0: Defer count = 0
em0: Missed Packets = 15435312
em0: Receive No Buffers = 16446113
em0: Receive Length Errors = 0
em0: Receive errors = 1
em0: Crc errors = 2
em0: Alignment errors = 0
em0: Collision/Carrier extension errors = 0
em0: RX overruns = 96826
em0: watchdog timeouts = 0
em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
em0: XON Rcvd = 0
em0: XON Xmtd = 0
em0: XOFF Rcvd = 0
em0: XOFF Xmtd = 0
em0: Good Packets Rcvd = 19002068797
em0: Good Packets Xmtd = 23168462599
em0: TSO Contexts Xmtd = 0
em0: TSO Contexts Failed = 0

[later]
em0: Excessive collisions = 0
em0: Missed Packets = 15459111
em0: Receive No Buffers = 16447082
em0: Receive errors = 1
em0: Crc errors = 2
em0: RX overruns = 96835
em0: Good Packets Rcvd = 19165047284
em0: Good Packets Xmtd = 23386976960

[later]
em0: Excessive collisions = 0
em0: Missed Packets = 15470583
em0: Receive No Buffers = 16447686
em0: Receive errors = 1
em0: Crc errors = 2
em0: RX overruns = 96840
em0: Good Packets Rcvd = 19255466068
em0: Good Packets Xmtd = 23519004546


Thank you for your time.
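
One way to read the three dev.em.0.stats outputs above is to look at the differences between consecutive snapshots rather than the running totals. The following is only a rough sketch: the dictionary keys and the function name `snapshot_deltas` are illustrative names, not anything provided by the em driver, and the "missed share" figure assumes that missed packets are not included in "Good Packets Rcvd". The counter values are copied from the kernel messages.

```python
# Counters copied from the three "sysctl dev.em.0.stats=1" outputs.
# The key names are illustrative labels, not driver identifiers.
snapshots = [
    {"missed": 15435312, "no_buffers": 16446113,
     "rx_overruns": 96826, "good_rcvd": 19002068797},
    {"missed": 15459111, "no_buffers": 16447082,
     "rx_overruns": 96835, "good_rcvd": 19165047284},
    {"missed": 15470583, "no_buffers": 16447686,
     "rx_overruns": 96840, "good_rcvd": 19255466068},
]


def snapshot_deltas(snaps):
    """Return per-interval counter increases between consecutive snapshots."""
    result = []
    for prev, cur in zip(snaps, snaps[1:]):
        delta = {key: cur[key] - prev[key] for key in cur}
        # Assumption: missed packets are not counted as good packets,
        # so the offered load is approximated by missed + good_rcvd.
        offered = delta["missed"] + delta["good_rcvd"]
        delta["missed_per_million"] = 1e6 * delta["missed"] / offered
        result.append(delta)
    return result


if __name__ == "__main__":
    for i, d in enumerate(snapshot_deltas(snapshots), start=1):
        print(f"interval {i}: missed +{d['missed']}, "
              f"no_buffers +{d['no_buffers']}, "
              f"rx_overruns +{d['rx_overruns']}, "
              f"good_rcvd +{d['good_rcvd']}, "
              f"missed per million ~{d['missed_per_million']:.0f}")
```

Since the time between the snapshots was not recorded, this gives only relative shares per interval, not per-second rates.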