Date:      Sat, 25 Dec 2021 22:15:55 +0500
From:      "Eugene M. Zheganin" <eugene@zhegan.in>
To:        stable@freebsd.org
Subject:   FreeBSD and interface errors/drops
Message-ID:  <480b9b6c-33d2-4eb3-c48c-aeee8ded7751@zhegan.in>

Hello,

I have a FreeBSD machine used as an office router for an organisation. It 
was installed years ago, and its configuration is:

- IBM system x3250m2

- 4 gigs RAM

- Intel Xeon E3110

- two WDC WD5000AADS-0 in a zfs mirror.

- bge(4) NetXtreme BCM5722 Gigabit Ethernet PCI Express, initially present
- em(4) 82572EI Gigabit Ethernet Controller (Copper) adapter, added later

This server reaches both the WAN and the LAN over a single bge(4) link to 
a Cisco Catalyst 2960, carrying several VLANs.

After several years of running, starting from 10.x, and with 12.2 
already installed for quite some time, I started seeing a huge number of 
input errors on the interface, which were incrementing dev.bge.0 
counters such as

dev.bge.0.stats.InputDiscards

The input error rate varied from 0 (most of the time) to 6K-80K per 
second. The observed interface input rate hovered around 300 Mbps 
at its peak.

Sample of netstat -I bge0 1 output showing a moment with a burst of 
errors, along with the amount of traffic:

              input              bge0              output
   packets  errs idrops      bytes    packets  errs      bytes colls
     20695   701      0   19244062      18182     0   17059035     0
       929 61003      0     938482        438     0     118494     0
      1383 44667      0    1094321        537     0     196633     0
     11726     1      0    8904828      11560     0    9012153     0
      6116     0      0    3991680       6051     0    4001106     0
      4772     0      0    3210074       4769     0    3224114     0
      9679     0      0    8507153       9622     0    8719630     0
     12355     0      0   10212352      12251     0   10288762     0
      2975     0      0    1457118       2946     0    1466755     0
      4397     0      0    3051610       4377     0    3056513     0
      4782     0      0    3405659       4806     0    3501414     0
      9202     0      0    7891629       9204     0    8080658     0
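
As a rough illustration (not part of the original measurements), the first 
rows of the netstat output above can be summarized per interval as an input 
error share and an input rate in Mbit/s. The field names, the 0.5 threshold, 
and the choice to treat errs as frames that are not already counted in the 
packets column are all assumptions made for this sketch.

```python
# Rows copied from the netstat -I bge0 1 sample above (one row per second).
SAMPLE = """\
20695 701 0 19244062 18182 0 17059035 0
929 61003 0 938482 438 0 118494 0
1383 44667 0 1094321 537 0 196633 0
11726 1 0 8904828 11560 0 9012153 0
"""

# Hypothetical field names for the eight netstat columns.
FIELDS = ("in_pkts", "in_errs", "in_idrops", "in_bytes",
          "out_pkts", "out_errs", "out_bytes", "colls")


def parse_rows(text):
    """Turn purely numeric eight-column lines into dicts; skip anything else."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS) or not all(p.isdigit() for p in parts):
            continue
        rows.append(dict(zip(FIELDS, map(int, parts))))
    return rows


def summarize(row):
    """Return (input error share, input Mbit/s) for a one-second sample."""
    # Assumption: errored frames are not included in the packets column.
    seen = row["in_pkts"] + row["in_errs"]
    err_share = row["in_errs"] / seen if seen else 0.0
    in_mbps = row["in_bytes"] * 8 / 1e6
    return err_share, in_mbps


rows = parse_rows(SAMPLE)
for row in rows:
    share, mbps = summarize(row)
    flag = "BURST" if share > 0.5 else ""  # arbitrary threshold
    print(f"{row['in_pkts']:>8} pkts {row['in_errs']:>6} errs "
          f"{share:6.1%} {mbps:8.1f} Mbit/s {flag}")
```

In this sample the two intervals with tens of thousands of errors are also 
the ones with the lowest input byte counts.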

The Catalyst shows no output errors on the port that FreeBSD is 
connected to when this is happening.


Recovery measures that I attempted and that failed to resolve the 
situation (one step at a time):

- replaced the patch cable to the Catalyst
- changed the onboard port from 1 to 0
- started to suspect the onboard Ethernet controller, so I added the Intel 
Pro/1000 MT external adapter via the riser card; the errors migrated 
to the dev.em.0.mac_stats.missed_packets counter, sometimes also 
incrementing dev.em.0.mac_stats.recv_no_buff:

dev.em.0.mac_stats.recv_no_buff: 9424
dev.em.0.mac_stats.missed_packets: 1853592

- added netisr and iflib tuning:

net.isr.numthreads="2"
net.isr.maxthreads="2"

dev.em.0.iflib.rx_budget="65535"
dev.em.0.iflib.override_nrxds="4096"
dev.em.0.iflib.override_ntxds="4096"
dev.em.0.iflib.disable_msix="0"

- added interrupt moderation:

dev.em.0.rx_int_delay="200"
dev.em.0.tx_int_delay="200"
dev.em.0.rx_abs_int_delay="4000"
dev.em.0.tx_abs_int_delay="4000"

- experimented with kern.eventtimer:

kern.eventtimer.periodic="1"

- compiled the em(4) driver out of the kernel and loaded it as a 
dynamically loadable module

- changed the module from the stock one to the one from the 
net/intel-em-kmod port, with netmap compiled out. At this point the 
errors even stopped for almost a day, and I was quick to report this as 
a regression in the FreeBSD bugtracker (I closed the bug as misdiagnosed 
after realizing this didn't help).

- upgraded the system to FreeBSD 13.0
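
The sysctl counters named in the list above (for example 
dev.em.0.mac_stats.missed_packets) are read here as running totals, so a 
per-second rate has to be derived from two successive readings. A minimal 
sketch of that, assuming the values are plain integers as printed by 
`sysctl -n`; the helper names read_counter and watch are hypothetical, and 
counter wrap-around is not handled.

```python
import subprocess
import time


def read_counter(name):
    """Read one numeric sysctl value via `sysctl -n <name>`."""
    out = subprocess.run(["sysctl", "-n", name],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def watch(names, samples, interval=1.0, reader=read_counter):
    """Yield {name: increase per second} for `samples` consecutive intervals.

    `reader` can be swapped out, e.g. for testing away from the router.
    """
    prev = {n: reader(n) for n in names}
    for _ in range(samples):
        time.sleep(interval)
        cur = {n: reader(n) for n in names}
        yield {n: (cur[n] - prev[n]) / interval for n in names}
        prev = cur
```

On the router this could be driven with something like 
`for rates in watch(["dev.em.0.mac_stats.missed_packets"], samples=60): print(rates)` 
to see whether the counter grows in bursts or steadily.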

I also noticed that there's no correlation between reboots and the error 
flow: sometimes the error counter would start to increase right 
after booting, sometimes several hours would pass first.

After realizing there were no options left, we moved the router role to 
the neighboring server running FreeBSD 12.1-STABLE (I don't suspect the 
version much; that is simply what it was running), and at that point the 
errors stopped (however, the network adapter there is igb(4)). After 
removing the load from the x3250 we ran a full memtest scan, which 
reported no errors over several passes (I didn't suspect the memory to be 
the root cause anyway, since the old server was able to build world 
successfully, which is almost impossible with memory issues).

So the obvious question is: what can be the cause of such errors? 
Lack of system memory (the only thing that comes to mind)?

The memory distribution is like (when idle):

Mem: 38M Active, 751M Inact, 2122M Wired, 988M Free
ARC: 1064M Total, 312M MFU, 426M MRU, 575K Anon, 32M Header, 275M Other
      559M Compressed, 1750M Uncompressed, 3,13:1 Ratio
Swap: 2048M Total, 2048M Free

But under load there is almost no free memory. However, I've checked 
netstat -m, and it reports that no mbuf requests were denied. The CPU is 
barely loaded at all during the peak input rate or at the moments when 
the errors start to pile up.

Thanks.

Eugene.



