Date: Fri, 4 Sep 2009 16:01:10 +0300
From: Artis Caune <artis.caune@gmail.com>
To: alexpalias-bsdnet@yahoo.com
Cc: freebsd-net@freebsd.org
Subject: Re: em driver input errors
Message-ID: <9e20d71e0909040601s100688c2m7d7f73eb187f4809@mail.gmail.com>
In-Reply-To: <11420.28890.qm@web56404.mail.re3.yahoo.com>
References: <11420.28890.qm@web56404.mail.re3.yahoo.com>

2009/8/1 <alexpalias-bsdnet@yahoo.com>:
> Good day
>
> I'm running a FreeBSD 7.2 router and I am seeing a lot of input errors on one of the em interfaces (em0), coupled with (at approximately the same times) much fewer errors on em1 and em2.  Monitoring is done with SNMP from another machine, and the CPU load as reported via SNMP is mostly below 30%, with a couple of spikes up to 35%.
>
> Software description:
>
> - FreeBSD 7.2-RELEASE-p2, amd64
> - bsnmpd with modules: hostres and (from ports) snmp_ucd
> - quagga 0.99.12 (running only zebra and bgpd)
> - netgraph (ng_ether and ng_netflow)
>
> Hardware description:
>
> - Dell machine, dual Xeon 3.20 GHz, 4 GB RAM
> - 2 x built-in gigabit interfaces (em0, em1)
> - 1 x dual-port gigabit interface, PCI-X (em2, em3) [see pciconf near the end]
>
>
> The machine receives the global routing table ("netstat -nr | wc -l" gives 289115 currently).
>
> All of the em interfaces are just configured "up", with various vlan interfaces on them.  Note that I use "kpps" to mean "thousands of packets per second", sorry if that's the wrong shorthand.
>
> - em0 sees a traffic of 10...22 kpps in, and 15...35 kpps out.  In bits, it's 30...120Mbits/s in, and 100...210Mbits/s out.  Vlans configured are vlan100 and vlan200, and most of the traffic is on vlan100 (vlan200 sees 4kpps in / 0.5kpps out maximum, with the average at about one third of this).  em0 is the external interface, and its traffic corresponds to the sum of traffic through em1 and em2
>
> - em1 has 5 vlans, and sees about 22kpps in / 11kpps out (maximum)
>
> - em2 has a single VLAN, and sees about 4...13kpps both in and out (almost equal in/out during most of the day)
>
> - em3 is a backup interface, with 2 VLANS, and is the only one which has seen no errors.
>
> Only the vlans on em0 are analyzed by ng_netflow, and the errors I'm seeing have started appearing days before netgraph was even loaded in the kernel.
>
> Tuning done:
>
> /boot/loader.conf:
> hw.em.rxd=4096
> hw.em.txd=4096
>
> Without the above we were seeing way more errors, now they are reduced, but still come in bursts of over 1000 errors on em0.
>
> /etc/sysctl.conf:
> net.inet.ip.fastforwarding=1
> dev.em.0.rx_processing_limit=300
> dev.em.1.rx_processing_limit=300
> dev.em.2.rx_processing_limit=300
> dev.em.3.rx_processing_limit=300
>
> Still seeing errors, after some searching the mailing lists we also added:
>
> # the four lines below are repeated for em1, em2, em3
> dev.em.0.rx_int_delay=0
> dev.em.0.rx_abs_int_delay=0
> dev.em.0.tx_int_delay=0
> dev.em.0.tx_abs_int_delay=0
>
> Still getting errors, so I also added:
>
> net.inet.ip.intr_queue_maxlen=4096
> net.route.netisr_maxqlen=1024
>
> and
>
> kern.ipc.nmbclusters=655360
>
>
> Also tried with rx_processing_limit set to -1 on all em interfaces, still getting errors.
>
> Looking at the shape of the error and packet graphs, there seems to be a correlation between the number of packets per second on em0 and the height of the error "spikes" on the error graph.  These spikes are spread throughout the day, with spaces (zones with no errors) of various lengths (10 minutes ... 2 hours spaces within the last 24 hours), but sometimes there are errors even in the lowest kpps times of the day.
>
> em0 and em1 error times are correlated, with all errors on the graph for em0 having a smaller corresponding error spike on em1 at the same time, and sometimes an error spike on em2.
>
> The old router was seeing about the same traffic, and had em0, em1, re0 and re1 network cards, and was only seeing errors on the em cards.  It was running 7.2-PRERELEASE/i386
>
>
> Any suggestions would be greatly appreciated.  Please note that this is a live router, and I can't reboot it (unless absolutely necessary).  Tuning that can be applied without rebooting will be tried first.

Is this still a problem?
You didn't mention whether you are using pf or another firewall.

I had a similar problem with two boxes replicating ZFS pools, where I noticed input errors. After some investigation it turned out to be pf overhead, even though I was skipping pf on the interfaces used for zfs send/recv.

With pf enabled (and skip set) I can copy at 50-80 MB/s with 50-80 Kpps and 0-100+ input drops per second. With pf disabled I can copy constantly at 102 or 93 MB/s and 110-131 Kpps, with a few drops (because one CPU is almost fully used).

-- 
Artis Caune

Everything should be made as simple as possible, but not simpler.
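
To compare a run with pf enabled against one with pf disabled, as in the reply above, it can help to turn the cumulative interface counters into per-second rates. The following is only a minimal sketch, not something from the thread: the function names (`rates`, `error_ratio`) and the dictionary keys are hypothetical, the samples are assumed to be copied by hand from counters such as the Ipkts and Ierrs columns of `netstat -i`, and it assumes errored packets are not already included in the received-packet count.

```python
def rates(prev, curr, interval_s, counter_bits=32):
    """Per-second rates between two samples of cumulative counters.

    prev and curr are dicts of counter name -> cumulative value
    (hypothetical layout). counter_bits is the assumed counter width;
    modular subtraction keeps the delta correct across a single wrap.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    result = {}
    for name, new_value in curr.items():
        delta = (new_value - prev[name]) % modulus
        result[name] = delta / interval_s
    return result


def error_ratio(rate):
    """Input errors as a fraction of received plus errored packets.

    Assumes "ierrs" is counted separately from "ipkts".
    """
    total = rate["ipkts"] + rate["ierrs"]
    return rate["ierrs"] / total if total else 0.0


if __name__ == "__main__":
    # Made-up sample values, ten seconds apart; not measurements from the thread.
    before = {"ipkts": 1_000, "ierrs": 10}
    after = {"ipkts": 601_000, "ierrs": 610}
    r = rates(before, after, interval_s=10)
    print(r, error_ratio(r))
```

Taking one pair of samples with pf enabled and another with pf disabled, under similar traffic, gives two error ratios that can be compared directly instead of judging from the shape of a graph.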