Date:      Fri, 4 Sep 2009 16:01:10 +0300
From:      Artis Caune <artis.caune@gmail.com>
To:        alexpalias-bsdnet@yahoo.com
Cc:        freebsd-net@freebsd.org
Subject:   Re: em driver input errors
Message-ID:  <9e20d71e0909040601s100688c2m7d7f73eb187f4809@mail.gmail.com>
In-Reply-To: <11420.28890.qm@web56404.mail.re3.yahoo.com>
References:  <11420.28890.qm@web56404.mail.re3.yahoo.com>

2009/8/1  <alexpalias-bsdnet@yahoo.com>:
> Good day
>
> I'm running a FreeBSD 7.2 router and I am seeing a lot of input errors
> on one of the em interfaces (em0), coupled with (at approximately the
> same times) much fewer errors on em1 and em2.  Monitoring is done with
> SNMP from another machine, and the CPU load as reported via SNMP is
> mostly below 30%, with a couple of spikes up to 35%.
>
> Software description:
>
> - FreeBSD 7.2-RELEASE-p2, amd64
> - bsnmpd with modules: hostres and (from ports) snmp_ucd
> - quagga 0.99.12 (running only zebra and bgpd)
> - netgraph (ng_ether and ng_netflow)
>
> Hardware description:
>
> - Dell machine, dual Xeon 3.20 GHz, 4 GB RAM
> - 2 x built-in gigabit interfaces (em0, em1)
> - 1 x dual-port gigabit interface, PCI-X (em2, em3) [see pciconf near
>   the end]
>
>
> The machine receives the global routing table ("netstat -nr | wc -l"
> gives 289115 currently).
>
> All of the em interfaces are just configured "up", with various vlan
> interfaces on them.  Note that I use "kpps" to mean "thousands of
> packets per second", sorry if that's the wrong shorthand.
>
> - em0 sees a traffic of 10...22 kpps in, and 15...35 kpps out.  In
>   bits, it's 30...120Mbits/s in, and 100...210Mbits/s out.  Vlans
>   configured are vlan100 and vlan200, and most of the traffic is on
>   vlan100 (vlan200 sees 4kpps in / 0.5kpps out maximum, with the
>   average at about one third of this).  em0 is the external interface,
>   and its traffic corresponds to the sum of traffic through em1 and em2
>
> - em1 has 5 vlans, and sees about 22kpps in / 11kpps out (maximum)
>
> - em2 has a single VLAN, and sees about 4...13kpps both in and out
>   (almost equal in/out during most of the day)
>
> - em3 is a backup interface, with 2 VLANS, and is the only one which
>   has seen no errors.
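
As a rough sanity check on the figures above, the average packet size can be estimated from the bit rate and the packet rate. A minimal Python sketch; the function name is mine, and pairing the upper ends of the quoted ranges (e.g. 120 Mbit/s with 22 kpps) is an assumption, since the two peaks may not coincide:

```python
def avg_packet_bytes(mbit_per_s: float, kpps: float) -> float:
    """Average packet size in bytes from Mbit/s and thousands of packets/s."""
    if kpps <= 0:
        raise ValueError("kpps must be positive")
    bits_per_s = mbit_per_s * 1_000_000
    packets_per_s = kpps * 1_000
    return bits_per_s / 8 / packets_per_s

# em0 inbound, upper ends of the quoted ranges (assumed to coincide)
print(round(avg_packet_bytes(120, 22)))  # about 682 bytes
# em0 outbound, upper ends of the quoted ranges (same assumption)
print(round(avg_packet_bytes(210, 35)))  # 750 bytes
```

Both estimates are well below full-size 1500-byte frames, so the traffic is presumably a mix of small and large packets.
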
>
> Only the vlans on em0 are analyzed by ng_netflow, and the errors I'm
> seeing have started appearing days before netgraph was even loaded in
> the kernel.
>
> Tuning done:
>
> /boot/loader.conf:
> hw.em.rxd=4096
> hw.em.txd=4096
>
> Without the above we were seeing way more errors, now they are
> reduced, but still come in bursts of over 1000 errors on em0.
>
> /etc/sysctl.conf:
> net.inet.ip.fastforwarding=1
> dev.em.0.rx_processing_limit=300
> dev.em.1.rx_processing_limit=300
> dev.em.2.rx_processing_limit=300
> dev.em.3.rx_processing_limit=300
>
> Still seeing errors, after some searching the mailing lists we also
> added:
>
> # the four lines below are repeated for em1, em2, em3
> dev.em.0.rx_int_delay=0
> dev.em.0.rx_abs_int_delay=0
> dev.em.0.tx_int_delay=0
> dev.em.0.tx_abs_int_delay=0
>
> Still getting errors, so I also added:
>
> net.inet.ip.intr_queue_maxlen=4096
> net.route.netisr_maxqlen=1024
>
> and
>
> kern.ipc.nmbclusters=655360
>
>
> Also tried with rx_processing_limit set to -1 on all em interfaces,
> still getting errors.
>
> Looking at the shape of the error and packet graphs, there seems to be
> a correlation between the number of packets per second on em0 and the
> height of the error "spikes" on the error graph.  These spikes are
> spread throughout the day, with spaces (zones with no errors) of
> various lengths (10 minutes ... 2 hours spaces within the last 24
> hours), but sometimes there are errors even in the lowest kpps times
> of the day.
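
To put a number on that visual correlation, the cumulative input-error counters polled over SNMP can be turned into per-interval deltas and compared with the packet rate. A minimal Python sketch; the sample values are made up for illustration and the function names are mine, not taken from any monitoring tool:

```python
from math import sqrt

def deltas(counter_samples):
    """Per-interval increases of a cumulative counter (assumes no wrap or reset)."""
    return [b - a for a, b in zip(counter_samples, counter_samples[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / sqrt(vx * vy)

# Hypothetical samples: cumulative input errors at each poll,
# and the kpps observed during each interval between polls.
ierrs_total = [0, 0, 400, 1600, 1600, 1700]
kpps = [10, 15, 22, 12, 14]
err_per_interval = deltas(ierrs_total)  # [0, 400, 1200, 0, 100]
print(pearson(kpps, err_per_interval))
```

A coefficient near 1 would back up the impression from the graphs; a value near 0 would point to something other than raw packet rate.
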
>
> em0 and em1 error times are correlated, with all errors on the graph
> for em0 having a smaller corresponding error spike on em1 at the same
> time, and sometimes an error spike on em2.
>
> The old router was seeing about the same traffic, and had em0, em1,
> re0 and re1 network cards, and was only seeing errors on the em cards.
> It was running 7.2-PRERELEASE/i386
>
>
> Any suggestions would be greatly appreciated.  Please note that this
> is a live router, and I can't reboot it (unless absolutely necessary).
> Tuning that can be applied without rebooting will be tried first.


Is this still an issue?
You didn't mention whether you are using pf or another firewall.
I had a similar problem with two boxes replicating zfs pools, where I
noticed input errors.
After some investigation it turned out to be pf overhead, even though I
was skipping the interfaces used for zfs send/recv.

With pf enabled (and skip set) I can copy at 50-80 MB/s with 50-80 Kpps
and 0-100+ input drops per second.
With pf disabled I can copy at a constant 102 or 93 MB/s with
110-131 Kpps and few drops (because one CPU is almost fully used).
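
For scale, those figures can be turned into a drop fraction. A minimal Python sketch; the function name is mine, and pairing 100 drops/s with 50 Kpps is an assumed worst case rather than a measurement:

```python
def drop_fraction(drops_per_s: float, kpps: float) -> float:
    """Fraction of arriving packets dropped.

    Assumes kpps counts packets that were actually delivered, so the
    arrival rate is delivered packets plus drops.
    """
    delivered = kpps * 1_000
    total = delivered + drops_per_s
    if total <= 0:
        raise ValueError("no traffic")
    return drops_per_s / total

# Assumed worst case with pf enabled: 100 drops/s at 50 Kpps
print(f"{drop_fraction(100, 50):.4%}")
```

So even at the bad end the loss is a fraction of a percent of packets, but it shows up clearly in the interface error counters.
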





-- 
Artis Caune

    Everything should be made as simple as possible, but not simpler.


