Date: Sat, 24 Sep 2022 17:30:16 -0700
From: John Fieber <jrf@ursamaris.org>
To: mike tancsa <mike@sentex.net>
Cc: "Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>, Jim King <jim@jimking.net>,
    "stable@freebsd.org" <stable@freebsd.org>, "kbowling@FreeBSD.org" <kbowling@FreeBSD.org>
Subject: Re: igc problems with heavy traffic (update)
Message-ID: <7DA72BB5-F4F1-4AF8-AD1C-CF68908CF723@ursamaris.org>
In-Reply-To: <c1b4194f-ea8e-26ef-d923-d4344ca239c1@sentex.net>
References: <fc256428-3ff1-68ba-cfcc-a00ca427e85b@jimking.net>
    <59b9cec0-d8c2-ce72-b5e9-99d1a1e807f8@sentex.net>
    <e714cd76-0aaa-3ea0-3c31-5e61badffa18@sentex.net>
    <86995d10-af63-d053-972e-dd233029f3bf@jimking.net>
    <3d874f65-8ce2-8f06-f19a-14cd550166e3@sentex.net>
    <a8192d60-2970-edb5-ce1a-c17ea875bf07@jimking.net>
    <fd1e825b-c306-64b1-f9ef-fec0344a9c95@sentex.net>
    <a4ddc96a-3dd5-4fee-8003-05f228d10858@jimking.net>
    <MW4PR11MB5890493674ADD1757BB47075D0659@MW4PR11MB5890.namprd11.prod.outlook.com>
    <a9935ba0-9cb2-5a41-ca73-b6962fef5e4d@sentex.net>
    <879b9239-2b9a-f0ae-4173-4a226c84cd85@sentex.net>
    <c1b4194f-ea8e-26ef-d923-d4344ca239c1@sentex.net>

> On Sep 14, 2022, at 8:03 AM, mike tancsa <mike@sentex.net> wrote:
>
> OK, an update hence the top post. I got a new pair of boxes which use a different Jasper Lake chipset and have i226-V vs the i225 of the previous box.
>
> dev.igc.0.%parent: pci2
> dev.igc.0.%pnpinfo: vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000 class=0x020000
> dev.igc.0.%location: slot=0 function=0 dbsf=pci0:2:0:0 handle=\_SB_.PC00.RP05.PXSX
> dev.igc.0.%driver: igc
> dev.igc.0.%desc: Intel(R) Ethernet Controller I226-V
> dev.igc.%parent:
>
> With a default RELENG_13, out of the box with no tweaks, I am NOT able to cause the transmitting nic to bounce with heavy traffic. I used the same test script (a constant stream of iperf3 alternating in direction) maxing out the NIC's bandwidth and all seems fine running the test for some 18hrs. Maybe something different about the i225 version of this NIC that needs some different driver defaults ?
>
> ---Mike
>

I also see this behavior with 13.1-RELEASE-p2 on:

CPU: Intel(R) Celeron(R) J4125 CPU @ 2.00GHz (1996.80-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x706a8  Family=0x6  Model=0x7a  Stepping=8

NIC (x4):

dev.igc.0.%parent: pci1
dev.igc.0.%pnpinfo: vendor=0x8086 device=0x15f3 subvendor=0x8086 subdevice=0x0000 class=0x020000
dev.igc.0.%location: slot=0 function=0 dbsf=pci0:1:0:0 handle=\_SB_.PCI0.RP03.PXSX
dev.igc.0.%driver: igc
dev.igc.0.%desc: Intel(R) Ethernet Controller I225-V

Twiddling EEE doesn't seem to affect it, disabling flow control helps a bit, but not really a meaningful amount.

Tests were done through a tp-link TL-SG3210XHP-M2 switch, with the other party being 13.1-RELEASE-p2 on a 10gb DAC connection (ixl driver).

For comparison, loading up a variety of things in bhyve (with pci pass through of a nic) these all showed the same problem, with the interface bouncing multiple times inside of a 5-minute iperf3 test, same as the host:

- FreeBSD-13.1-STABLE-amd64-20220923
- OPNsense-22.7
- pfSense-CE 2.7-DEVELOPMENT-latest

These, however, offer unflappable performance:

- FreeBSD-14.0-CURRENT-amd64-20220923
- vyos-1.4 (for reference, what I mostly use on this hardware, via bhyve)

-john

>
> On 8/12/2022 11:04 AM, mike tancsa wrote:
>>
>> On 8/10/2022 3:53 PM, mike tancsa wrote:
>>> On 8/10/2022 1:47 PM, Pieper, Jeffrey E wrote:
>>>>
>>>> You could try disabling EEE (Energy Efficient Ethernet). Something like: sysctl dev.igc.0.eee_control=0.
>>>
>>>
>>> It does not seem to make a difference. If I have the FC as default, I get the link bounce on the 2.5G xover (cat 6 cable) maybe 2-3 min in running iperf3 tests. However, if I disable all flow control
>>>
>>> dev.igc.0.fc=0
>>> dev.igc.1.fc=0
>>> dev.igc.2.fc=0
>>> dev.igc.3.fc=0
>>>
>>> It *seems* to be less frequent but still happens. I ordered a 2.5 G switch so I can try and at least see which side is dropping the link. Should have it Friday to continue testing
>>>
>>
>> OK, I repeated the tests with a 2.5G unmanaged switch in between the two units rather than xover. It looks like it's the server that is sending the majority of the packets that drops the link, not the receiver.
>>
>> One other test I did was to up hw.igc.max_interrupt_rate=13000 from the default of 8000. That seems to make the problem MUCH more acute.
>>
>> Here is the before and after of the link drop.
>>
>> dev.igc.1.wake: 0
>> dev.igc.1.interrupts.rx_desc_min_thresh: 0
>> -dev.igc.1.interrupts.asserts: 65
>> +dev.igc.1.interrupts.asserts: 4879479
>> dev.igc.1.mac_stats.tso_txd: 0
>> -dev.igc.1.mac_stats.tx_frames_1024_1522: 3
>> -dev.igc.1.mac_stats.tx_frames_512_1023: 1
>> -dev.igc.1.mac_stats.tx_frames_256_511: 2
>> -dev.igc.1.mac_stats.tx_frames_128_255: 15
>> -dev.igc.1.mac_stats.tx_frames_65_127: 2
>> +dev.igc.1.mac_stats.tx_frames_1024_1522: 12973065
>> +dev.igc.1.mac_stats.tx_frames_512_1023: 58
>> +dev.igc.1.mac_stats.tx_frames_256_511: 107
>> +dev.igc.1.mac_stats.tx_frames_128_255: 1215725
>> +dev.igc.1.mac_stats.tx_frames_65_127: 192
>> dev.igc.1.mac_stats.tx_frames_64: 1
>> dev.igc.1.mac_stats.mcast_pkts_txd: 0
>> dev.igc.1.mac_stats.bcast_pkts_txd: 1
>> -dev.igc.1.mac_stats.good_pkts_txd: 24
>> -dev.igc.1.mac_stats.total_pkts_txd: 24
>> -dev.igc.1.mac_stats.good_octets_txd: 7674
>> -dev.igc.1.mac_stats.good_octets_recvd: 6492
>> -dev.igc.1.mac_stats.rx_frames_1024_1522: 2
>> -dev.igc.1.mac_stats.rx_frames_512_1023: 1
>> -dev.igc.1.mac_stats.rx_frames_256_511: 2
>> -dev.igc.1.mac_stats.rx_frames_128_255: 15
>> -dev.igc.1.mac_stats.rx_frames_65_127: 2
>> +dev.igc.1.mac_stats.good_pkts_txd: 14189148
>> +dev.igc.1.mac_stats.total_pkts_txd: 14189148
>> +dev.igc.1.mac_stats.good_octets_txd: 19450753554
>> +dev.igc.1.mac_stats.good_octets_recvd: 14933399426
>> +dev.igc.1.mac_stats.rx_frames_1024_1522: 9823228
>> +dev.igc.1.mac_stats.rx_frames_512_1023: 3
>> +dev.igc.1.mac_stats.rx_frames_256_511: 62
>> +dev.igc.1.mac_stats.rx_frames_128_255: 2365665
>> +dev.igc.1.mac_stats.rx_frames_65_127: 213
>> dev.igc.1.mac_stats.rx_frames_64: 1
>> dev.igc.1.mac_stats.mcast_pkts_recvd: 0
>> dev.igc.1.mac_stats.bcast_pkts_recvd: 0
>> -dev.igc.1.mac_stats.good_pkts_recvd: 23
>> -dev.igc.1.mac_stats.total_pkts_recvd: 23
>> +dev.igc.1.mac_stats.good_pkts_recvd: 12189172
>> +dev.igc.1.mac_stats.total_pkts_recvd: 12189172
>> dev.igc.1.mac_stats.xoff_txd: 0
>> dev.igc.1.mac_stats.xoff_recvd: 0
>> dev.igc.1.mac_stats.xon_txd: 0
>> dev.igc.1.mac_stats.single_coll: 0
>> dev.igc.1.mac_stats.excess_coll: 0
>> dev.igc.1.queue_rx_3.rx_irq: 0
>> -dev.igc.1.queue_rx_3.rxd_tail: 21
>> -dev.igc.1.queue_rx_3.rxd_head: 22
>> +dev.igc.1.queue_rx_3.rxd_tail: 498
>> +dev.igc.1.queue_rx_3.rxd_head: 499
>> dev.igc.1.queue_rx_2.rx_irq: 0
>> dev.igc.1.queue_rx_2.rxd_tail: 128
>> dev.igc.1.queue_rx_2.rxd_head: 0
>> dev.igc.1.queue_rx_0.rxd_tail: 0
>> dev.igc.1.queue_rx_0.rxd_head: 1
>> dev.igc.1.queue_tx_3.tx_irq: 0
>> -dev.igc.1.queue_tx_3.txd_tail: 0
>> -dev.igc.1.queue_tx_3.txd_head: 0
>> +dev.igc.1.queue_tx_3.txd_tail: 746
>> +dev.igc.1.queue_tx_3.txd_head: 746
>> dev.igc.1.queue_tx_2.tx_irq: 0
>> -dev.igc.1.queue_tx_2.txd_tail: 0
>> -dev.igc.1.queue_tx_2.txd_head: 0
>> +dev.igc.1.queue_tx_2.txd_tail: 186
>> +dev.igc.1.queue_tx_2.txd_head: 186
>> dev.igc.1.queue_tx_1.tx_irq: 0
>> -dev.igc.1.queue_tx_1.txd_tail: 0
>> -dev.igc.1.queue_tx_1.txd_head: 0
>> +dev.igc.1.queue_tx_1.txd_tail: 520
>> +dev.igc.1.queue_tx_1.txd_head: 520
>> dev.igc.1.queue_tx_0.tx_irq: 0
>> -dev.igc.1.queue_tx_0.txd_tail: 45
>> -dev.igc.1.queue_tx_0.txd_head: 45
>> +dev.igc.1.queue_tx_0.txd_tail: 777
>> +dev.igc.1.queue_tx_0.txd_head: 777
>> dev.igc.1.fc_low_water: 32752
>> dev.igc.1.fc_high_water: 32768
>> dev.igc.1.rx_control: 71335938
>> dev.igc.1.device_control: 404489793
>> dev.igc.1.watchdog_timeouts: 0
>> dev.igc.1.rx_overruns: 0
>> -dev.igc.1.link_irq: 2
>> +dev.igc.1.link_irq: 4
>> dev.igc.1.dropped: 0
>> dev.igc.1.eee_control: 0
>> dev.igc.1.itr: 488
>> dev.igc.1.nvm: -1
>> dev.igc.1.iflib.rxq3.rxq_fl0.buf_size: 2048
>> dev.igc.1.iflib.rxq3.rxq_fl0.credits: 1023
>> -dev.igc.1.iflib.rxq3.rxq_fl0.cidx: 22
>> -dev.igc.1.iflib.rxq3.rxq_fl0.pidx: 21
>> +dev.igc.1.iflib.rxq3.rxq_fl0.cidx: 499
>> +dev.igc.1.iflib.rxq3.rxq_fl0.pidx: 498
>> dev.igc.1.iflib.rxq3.cpu: 3
>> dev.igc.1.iflib.rxq2.rxq_fl0.buf_size: 2048
>> dev.igc.1.iflib.rxq2.rxq_fl0.credits: 128
>> dev.igc.1.iflib.txq3.r_abdications: 0
>> dev.igc.1.iflib.txq3.r_restarts: 0
>> dev.igc.1.iflib.txq3.r_stalls: 0
>> -dev.igc.1.iflib.txq3.r_starts: 0
>> +dev.igc.1.iflib.txq3.r_starts: 6175093
>> dev.igc.1.iflib.txq3.r_drops: 0
>> -dev.igc.1.iflib.txq3.r_enqueues: 0
>> -dev.igc.1.iflib.txq3.ring_state: pidx_head: 0000 pidx_tail: 0000 cidx: 0000 state: IDLE
>> -dev.igc.1.iflib.txq3.txq_cleaned: 0
>> -dev.igc.1.iflib.txq3.txq_processed: 0
>> -dev.igc.1.iflib.txq3.txq_in_use: 0
>> -dev.igc.1.iflib.txq3.txq_cidx_processed: 0
>> -dev.igc.1.iflib.txq3.txq_cidx: 0
>> -dev.igc.1.iflib.txq3.txq_pidx: 0
>> +dev.igc.1.iflib.txq3.r_enqueues: 6175093
>> +dev.igc.1.iflib.txq3.ring_state: pidx_head: 0373 pidx_tail: 0373 cidx: 0373 state: IDLE
>> +dev.igc.1.iflib.txq3.txq_cleaned: 12350144
>> +dev.igc.1.iflib.txq3.txq_processed: 12350184
>> +dev.igc.1.iflib.txq3.txq_in_use: 42
>> +dev.igc.1.iflib.txq3.txq_cidx_processed: 744
>> +dev.igc.1.iflib.txq3.txq_cidx: 704
>> +dev.igc.1.iflib.txq3.txq_pidx: 746
>> dev.igc.1.iflib.txq3.no_tx_dma_setup: 0
>> dev.igc.1.iflib.txq3.txd_encap_efbig: 0
>> dev.igc.1.iflib.txq3.tx_map_failed: 0
>> dev.igc.1.iflib.txq2.r_abdications: 0
>> dev.igc.1.iflib.txq2.r_restarts: 0
>> dev.igc.1.iflib.txq2.r_stalls: 0
>> -dev.igc.1.iflib.txq2.r_starts: 0
>> +dev.igc.1.iflib.txq2.r_starts: 3421789
>> dev.igc.1.iflib.txq2.r_drops: 0
>> -dev.igc.1.iflib.txq2.r_enqueues: 0
>> -dev.igc.1.iflib.txq2.ring_state: pidx_head: 0000 pidx_tail: 0000 cidx: 0000 state: IDLE
>> -dev.igc.1.iflib.txq2.txq_cleaned: 0
>> -dev.igc.1.iflib.txq2.txq_processed: 0
>> -dev.igc.1.iflib.txq2.txq_in_use: 0
>> -dev.igc.1.iflib.txq2.txq_cidx_processed: 0
>> -dev.igc.1.iflib.txq2.txq_cidx: 0
>> -dev.igc.1.iflib.txq2.txq_pidx: 0
>> +dev.igc.1.iflib.txq2.r_enqueues: 3421789
>> +dev.igc.1.iflib.txq2.ring_state: pidx_head: 1629 pidx_tail: 1629 cidx: 1629 state: IDLE
>> +dev.igc.1.iflib.txq2.txq_cleaned: 6843536
>> +dev.igc.1.iflib.txq2.txq_processed: 6843576
>> +dev.igc.1.iflib.txq2.txq_in_use: 42
>> +dev.igc.1.iflib.txq2.txq_cidx_processed: 184
>> +dev.igc.1.iflib.txq2.txq_cidx: 144
>> +dev.igc.1.iflib.txq2.txq_pidx: 186
>> dev.igc.1.iflib.txq2.no_tx_dma_setup: 0
>> dev.igc.1.iflib.txq2.txd_encap_efbig: 0
>> dev.igc.1.iflib.txq2.tx_map_failed: 0
>> dev.igc.1.iflib.txq1.r_abdications: 0
>> dev.igc.1.iflib.txq1.r_restarts: 0
>> dev.igc.1.iflib.txq1.r_stalls: 0
>> -dev.igc.1.iflib.txq1.r_starts: 0
>> +dev.igc.1.iflib.txq1.r_starts: 2734852
>> dev.igc.1.iflib.txq1.r_drops: 0
>> -dev.igc.1.iflib.txq1.r_enqueues: 0
>> -dev.igc.1.iflib.txq1.ring_state: pidx_head: 0000 pidx_tail: 0000 cidx: 0000 state: IDLE
>> -dev.igc.1.iflib.txq1.txq_cleaned: 0
>> -dev.igc.1.iflib.txq1.txq_processed: 0
>> -dev.igc.1.iflib.txq1.txq_in_use: 0
>> -dev.igc.1.iflib.txq1.txq_cidx_processed: 0
>> -dev.igc.1.iflib.txq1.txq_cidx: 0
>> -dev.igc.1.iflib.txq1.txq_pidx: 0
>> +dev.igc.1.iflib.txq1.r_enqueues: 2734852
>> +dev.igc.1.iflib.txq1.ring_state: pidx_head: 0772 pidx_tail: 0772 cidx: 0772 state: IDLE
>> +dev.igc.1.iflib.txq1.txq_cleaned: 5469662
>> +dev.igc.1.iflib.txq1.txq_processed: 5469702
>> +dev.igc.1.iflib.txq1.txq_in_use: 42
>> +dev.igc.1.iflib.txq1.txq_cidx_processed: 518
>> +dev.igc.1.iflib.txq1.txq_cidx: 478
>> +dev.igc.1.iflib.txq1.txq_pidx: 520
>> dev.igc.1.iflib.txq1.no_tx_dma_setup: 0
>> dev.igc.1.iflib.txq1.txd_encap_efbig: 0
>> dev.igc.1.iflib.txq1.tx_map_failed: 0
>> dev.igc.1.iflib.txq0.r_abdications: 0
>> dev.igc.1.iflib.txq0.r_restarts: 0
>> dev.igc.1.iflib.txq0.r_stalls: 0
>> -dev.igc.1.iflib.txq0.r_starts: 24
>> +dev.igc.1.iflib.txq0.r_starts: 1857414
>> dev.igc.1.iflib.txq0.r_drops: 0
>> -dev.igc.1.iflib.txq0.r_enqueues: 24
>> -dev.igc.1.iflib.txq0.ring_state: pidx_head: 0024 pidx_tail: 0024 cidx: 0024 state: IDLE
>> -dev.igc.1.iflib.txq0.txq_cleaned: 3
>> -dev.igc.1.iflib.txq0.txq_processed: 43
>> +dev.igc.1.iflib.txq0.r_enqueues: 1857414
>> +dev.igc.1.iflib.txq0.ring_state: pidx_head: 1926 pidx_tail: 1926 cidx: 1926 state: IDLE
>> +dev.igc.1.iflib.txq0.txq_cleaned: 3714783
>> +dev.igc.1.iflib.txq0.txq_processed: 3714823
>> dev.igc.1.iflib.txq0.txq_in_use: 42
>> -dev.igc.1.iflib.txq0.txq_cidx_processed: 43
>> -dev.igc.1.iflib.txq0.txq_cidx: 3
>> -dev.igc.1.iflib.txq0.txq_pidx: 45
>> +dev.igc.1.iflib.txq0.txq_cidx_processed: 775
>> +dev.igc.1.iflib.txq0.txq_cidx: 735
>> +dev.igc.1.iflib.txq0.txq_pidx: 777
>> dev.igc.1.iflib.txq0.no_tx_dma_setup: 0
>> dev.igc.1.iflib.txq0.txd_encap_efbig: 0
>> dev.igc.1.iflib.txq0.tx_map_failed: 0
>> dev.igc.1.%desc: Intel(R) Ethernet Controller I225-V
>>
>> Interface is RUNNING and ACTIVE
>> igc1: TX Queue 0 ------
>> igc1: hw tdh = 777, hw tdt = 777
>> igc1: TX Queue 1 ------
>> igc1: hw tdh = 520, hw tdt = 520
>> igc1: TX Queue 2 ------
>> igc1: hw tdh = 186, hw tdt = 186
>> igc1: TX Queue 3 ------
>> igc1: hw tdh = 746, hw tdt = 746
>> igc1: RX Queue 0 ------
>> igc1: hw rdh = 1, hw rdt = 0
>> igc1: RX Queue 1 ------
>> igc1: hw rdh = 0, hw rdt = 128
>> igc1: RX Queue 2 ------
>> igc1: hw rdh = 0, hw rdt = 128
>> igc1: RX Queue 3 ------
>> igc1: hw rdh = 499, hw rdt = 498
>>
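
For anyone wanting to reproduce the before/after comparison quoted above, here is a minimal sketch of one way to diff two saved sysctl snapshots so that only the counters that moved (link_irq, for example) are shown. It assumes the snapshots were captured as plain "name: value" text, e.g. with something like `sysctl dev.igc.1 > before.txt`; the function names and the sample values (taken from the quoted diff) are illustrative only and are not part of the driver or of any tool used in this thread.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines (sysctl-style output) into a dict of strings."""
    values = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain colons themselves.
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def changed_counters(before, after):
    """Return {name: (old, new)} for every OID whose value differs."""
    old = parse_sysctl(before)
    new = parse_sysctl(after)
    return {k: (old.get(k), v) for k, v in new.items() if old.get(k) != v}


# Sample values copied from the quoted diff; a real run would read two files.
BEFORE = """dev.igc.1.watchdog_timeouts: 0
dev.igc.1.rx_overruns: 0
dev.igc.1.link_irq: 2
"""
AFTER = """dev.igc.1.watchdog_timeouts: 0
dev.igc.1.rx_overruns: 0
dev.igc.1.link_irq: 4
"""

if __name__ == "__main__":
    for name, (was, now) in sorted(changed_counters(BEFORE, AFTER).items()):
        print(f"{name}: {was} -> {now}")
```

Values are compared as strings on purpose, so non-numeric OIDs such as ring_state are handled the same way as counters.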