Date: Sat, 25 Dec 2021 22:15:55 +0500
From: "Eugene M. Zheganin" <eugene@zhegan.in>
To: stable@freebsd.org
Subject: FreeBSD and interface errors/drops
Message-ID: <480b9b6c-33d2-4eb3-c48c-aeee8ded7751@zhegan.in>
Hello,

I have a FreeBSD machine used as an office router for an organisation. It was installed years ago and its configuration is:

- IBM system x3250m2
- 4 gigs RAM
- Intel Xeon E3110
- two WDC WD5000AADS-0 in a zfs mirror
- bge(4) NetXtreme BCM5722 Gigabit Ethernet PCI Express, initially present
- em(4) 82572EI Gigabit Ethernet Controller (Copper) adapter, added later

This server is connected to both WAN and LAN over a single bge(4) link to a Cisco Catalyst 2960, carrying several vlans.

The machine has been running for several years, starting from 10.x. Some time after 12.2 was installed, I started seeing a huge number of input errors on the interface, incrementing dev.bge.0 counters such as dev.bge.0.stats.InputDiscards. The input error rate varied from 0 (most of the time) to 6K-80K per second. The observed interface input rate was floating around 300 Mbps at its peak.

Sample of netstat -I bge0 1 showing a moment with a burst of errors, along with the amount of traffic:

            input           bge0           output
   packets  errs idrops      bytes    packets  errs      bytes colls
     20695   701      0   19244062      18182     0   17059035     0
       929 61003      0     938482        438     0     118494     0
      1383 44667      0    1094321        537     0     196633     0
     11726     1      0    8904828      11560     0    9012153     0
      6116     0      0    3991680       6051     0    4001106     0
      4772     0      0    3210074       4769     0    3224114     0
      9679     0      0    8507153       9622     0    8719630     0
     12355     0      0   10212352      12251     0   10288762     0
      2975     0      0    1457118       2946     0    1466755     0
      4397     0      0    3051610       4377     0    3056513     0
      4782     0      0    3405659       4806     0    3501414     0
      9202     0      0    7891629       9204     0    8080658     0

The catalyst shows no output errors on the port that FreeBSD is connected to while this is happening.
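To put numbers on the netstat sample above, here is a minimal Python sketch that parses the per-second rows and reports, for each interval, what share of arriving frames were counted as input errors and the input rate in Mbit/s. The function names are made up for illustration, and the sketch assumes that the "packets" column does not already include the frames counted under "errs"; if it does, the share should be computed against "packets" alone.

```python
def parse_netstat_rows(text):
    """Parse the numeric rows of `netstat -I <if> 1` output into dicts."""
    fields = ("in_packets", "in_errs", "in_idrops", "in_bytes",
              "out_packets", "out_errs", "out_bytes", "colls")
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the two header lines and anything that is not eight integers.
        if len(parts) != len(fields) or not all(p.isdigit() for p in parts):
            continue
        rows.append(dict(zip(fields, map(int, parts))))
    return rows


def input_error_share(row):
    """Share of arriving frames counted as input errors in one interval.

    Assumption: errored frames are NOT included in the packets column.
    """
    seen = row["in_packets"] + row["in_errs"]
    return row["in_errs"] / seen if seen else 0.0


# First four rows of the sample quoted in the message.
SAMPLE = """\
            input           bge0           output
   packets  errs idrops      bytes    packets  errs      bytes colls
     20695   701      0   19244062      18182     0   17059035     0
       929 61003      0     938482        438     0     118494     0
      1383 44667      0    1094321        537     0     196633     0
     11726     1      0    8904828      11560     0    9012153     0
"""

if __name__ == "__main__":
    for row in parse_netstat_rows(SAMPLE):
        mbps = row["in_bytes"] * 8 / 1e6  # bytes per second -> Mbit/s
        print(f'{row["in_packets"]:>8} pkts  {row["in_errs"]:>6} errs  '
              f'{input_error_share(row):6.1%} errored  {mbps:7.1f} Mbit/s in')
```

Feeding it the live output (for example by piping netstat -I bge0 1 into a small wrapper that calls parse_netstat_rows per line) would make it easier to see whether error bursts coincide with traffic peaks or with near-idle intervals.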
Recovery measures that I attempted and that failed to resolve the situation (one step at a time):

- changed the patch cable to the catalyst
- changed the onboard port from 1 to 0
- started to suspect the onboard ethernet controller and added an Intel Pro/1000 MT external adapter via the riser card; the errors migrated into the dev.em.0.mac_stats.missed_packets counter, sometimes also incrementing dev.em.0.mac_stats.recv_no_buff:

    dev.em.0.mac_stats.recv_no_buff: 9424
    dev.em.0.mac_stats.missed_packets: 1853592

- added the iflib/netmap tuning:

    net.isr.numthreads="2"
    net.isr.maxthreads="2"
    dev.em.0.iflib.rx_budget="65535"
    dev.em.0.iflib.override_nrxds="4096"
    dev.em.0.iflib.override_ntxds="4096"
    dev.em.0.iflib.disable_msix="0"

- added the interrupt moderation:

    dev.em.0.rx_int_delay="200"
    dev.em.0.tx_int_delay="200"
    dev.em.0.rx_abs_int_delay="4000"
    dev.em.0.tx_abs_int_delay="4000"

- tried to play with the kern.eventtimer:

    kern.eventtimer.periodic="1"

- moved the em(4) driver out of the kernel into a dynamically loaded module
- changed the module from the stock one to the one from the net/intel-em-kmod port, with netmap compiled out (at this point the errors even stopped for almost a day, and I was quick to report this as a regression in the FreeBSD bugtracker; I closed the bug as misdiagnosed after realizing this didn't help)
- upgraded the system to FreeBSD 13.0

I also noticed that there's no correlation between reboots and the error flow: sometimes the error counter would start increasing right after boot, sometimes several hours would pass first.

After realizing there were no options left, we moved the router role to a neighboring server running FreeBSD 12.1-STABLE (I don't suspect the version that much, it just happened to be running it), and at that point the errors stopped (however, the network adapter there is igb(4)).
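Since the em(4) counters quoted above are cumulative, comparing two snapshots taken a known number of seconds apart gives a per-second rate that can be checked after each tuning step. A rough Python sketch follows; the helper names are hypothetical, the first snapshot is the one from this message, and the second snapshot is invented purely to exercise the code.

```python
def parse_sysctl(text):
    """Turn `name: value` lines from sysctl output into a {name: int} dict."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Keep only lines of the form "name: <non-negative integer>".
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def per_second(before, after, seconds):
    """Per-second growth of every counter present in both snapshots."""
    return {name: (after[name] - before[name]) / seconds
            for name in before if name in after}


# Values quoted in the message.
FIRST = """\
dev.em.0.mac_stats.recv_no_buff: 9424
dev.em.0.mac_stats.missed_packets: 1853592
"""

# Made-up later snapshot, NOT measured data; used only for illustration.
SECOND = """\
dev.em.0.mac_stats.recv_no_buff: 9424
dev.em.0.mac_stats.missed_packets: 1913592
"""

if __name__ == "__main__":
    rates = per_second(parse_sysctl(FIRST), parse_sysctl(SECOND), seconds=10)
    for name, rate in rates.items():
        print(f"{name}: {rate:.0f}/s")
```

In practice the two snapshots would come from running sysctl dev.em.0.mac_stats twice with a sleep in between.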
After removing the load from the x3250 we did a full memtest scan, which reported no errors over several passes (I didn't suspect the memory to be the root cause anyway, since the old machine was able to build world successfully, which is almost impossible with memory issues).

So the obvious question is: what can be the cause of such errors? Lack of system memory (the only thing that comes to mind)? The memory distribution when idle looks like this:

Mem: 38M Active, 751M Inact, 2122M Wired, 988M Free
ARC: 1064M Total, 312M MFU, 426M MRU, 575K Anon, 32M Header, 275M Other
     559M Compressed, 1750M Uncompressed, 3,13:1 Ratio
Swap: 2048M Total, 2048M Free

But when loaded, there's almost no free memory. However, I've checked netstat -m, and it reports that no mbuf requests were denied. The CPU isn't loaded at all during the peak input rate, or at the moments when the errors start to pile up.

Thanks.
Eugene.