From owner-freebsd-bugs@freebsd.org Tue Apr 28 16:34:11 2020
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 246003] em(4) Intel I219-V6 on NUC8i5BEH randomly loses carrier or fails over to 100Mbit
Date: Tue, 28 Apr 2020 16:34:11 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246003

            Bug ID: 246003
           Summary: em(4) Intel I219-V6 on NUC8i5BEH randomly loses carrier
                    or fails over to 100Mbit
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: kumba@gentoo.org

Running an Intel NUC here (NUC8i5BEH) which has an Intel I219 V6 chip in it that the em(4) driver handles.
Very sporadically, the machine will either outright lose network connectivity, or start to experience significantly increased latency responding to network traffic. It feels almost like how a traffic jam might start.

The machine is a Squid proxy server for my home network, and it seems that accessing websites with large amounts of resources to fetch can more easily trigger whatever this bug is, though I have also triggered the bug with something as simple as editing a text file in nano over SSH.

When the bug happens, I sometimes see my switch, a Netgear GS324T (S350 series), change the port over from 1Gbps to 100Mbps (green to orange on the LED). If I ping the device from another machine on the network, the ping is either lost, the host is marked as "down", or the ping returns upwards of a few thousand milliseconds later.

Recovering from the issue usually requires rebooting the machine. Sometimes, if you just wait several minutes, the machine will eventually respond and behave normally. This to me feels like a buffer being flooded too quickly.

I have been using jumbo frames w/ an MTU of 9000. As a test, I have lowered that down to 1500 to see if the issues remain. It feels like this MIGHT be tied to Bug #218894, per the last comment in 2018. If it is, lowering the MTU to 1500, or staying under 6k/pkt, might avoid the issue, as it smells like a buffer in em(4) is not sized correctly on I219 chips to handle 9k/pkt jumbo frames.

I am experiencing this issue with both the base em(4) driver (7.6.1-k) and the latest intel-em-kmod driver from ports (7.7.5).

Some technical info (IP/DNS info removed):

dmesg, showing the device going up/down a few times, including when I tried unplugging its cable from the switch:

---<>---
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.1-RELEASE-p4 CUSTOM-12_1 amd64
FreeBSD clang version 8.0.1 (tags/RELEASE_801/final 366581) (based on LLVM 8.0.1)
VT(efifb): resolution 1024x768
module zfsctrl already present!
module_register: cannot register pci/em from kernel; already loaded from if_em_updated.ko
Module pci/em failed to register: 17
CPU: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz (2304.11-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x806ea  Family=0x6  Model=0x8e  Stepping=10
  Features=0xbfebfbff
  Features2=0x7ffafbbf
  AMD Features=0x2c100800
  AMD Features2=0x121
  Structured Extended Features=0x29c67af
  Structured Extended Features3=0x9c002400
  XSAVE Features=0xf
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 17179869184 (16384 MB)
avail memory = 16487288832 (15723 MB)
CPU microcode: updated from 0xc6 to 0xca
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
random: unblocking device.
ioapic0 irqs 0-119 on motherboard
Launching APs: 3 2 1
Timecounter "TSC-low" frequency 1152052507 Hz quality 1000
random: entropy device external interface
module_register_init: MOD_LOAD (vesa, 0xffffffff80b2e120, 0) error 19
kbd0 at kbdmux0
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
nexus0
efirtc0: on motherboard
efirtc0: registered as a time-of-day clock, resolution 1.000000s
cryptosoft0: on motherboard
aesni0: on motherboard
acpi0: on motherboard
acpi0: Power Button (fixed)
cpu0: on acpi0
hpet0: iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 24000000 Hz quality 950
Event timer "HPET" frequency 24000000 Hz quality 550
Event timer "HPET1" frequency 24000000 Hz quality 440
Event timer "HPET2" frequency 24000000 Hz quality 440
Event timer "HPET3" frequency 24000000 Hz quality 440
Event timer "HPET4" frequency 24000000 Hz quality 440
attimer0: port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0
acpi_ec0: port 0x62,0x66 on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
vgapci0: port 0x3000-0x303f mem 0xbf000000-0xbfffffff,0x80000000-0x8fffffff at device 2.0 on pci0
vgapci0: Boot video device
xhci0: mem 0x404a000000-0x404a00ffff at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pci0: at device 20.2 (no driver attached)
pci0: at device 22.0 (no driver attached)
ahci0: port 0x3090-0x3097,0x3080-0x3083,0x3060-0x307f mem 0xc0120000-0xc0121fff,0xc0123000-0xc01230ff,0xc0122000-0xc01227ff at device 23.0 on pci0
ahci0: AHCI v1.31 with 1 6Gbps ports, Port Multiplier not supported
ahcich2: at channel 2 on ahci0
pcib1: at device 28.0 on pci0
pci1: on pcib1
pcib2: at device 28.4 on pci0
pcib2: [GIANT-LOCKED]
pcib3: at device 29.0 on pci0
pci2: on pcib3
nvme0: mem 0xc0000000-0xc0003fff at device 0.0 on pci2
isab0: at device 31.0 on pci0
isa0: on isab0
pci0: at device 31.5 (no driver attached)
em0: mem 0xc0100000-0xc011ffff at device 31.6 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 1c:69:7a:08:74:7e
acpi_button0: on acpi0
acpi_button1: on acpi0
acpi_tz0: on acpi0
acpi_syscontainer0: on acpi0
acpi_tz1: on acpi0
acpi_tz1: _HOT value is absurd, ignored (-73.1C)
atrtc0: at port 0x70 irq 8 on isa0
atrtc0: Warning: Couldn't map I/O.
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
uart0: at port 0x3f8 irq 4 flags 0x10 on isa0
coretemp0: on cpu0
est0: on cpu0
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 10.000 msec
acpi_tz1: _TMP value is absurd, ignored (-263.1C)
ugen0.1: <0x8086 XHCI root HUB> at usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
nvd0: NVMe namespace
nvd0: 953869MB (1953525168 512 byte sectors)
Trying to mount root from zfs:core/env/fbsd_12.1-20200422 []...
Root mount waiting for: usbus0
Root mount waiting for: usbus0
uhub0: 18 ports with 18 removable, self powered
ugen0.2: at usbus0
ukbd0 on uhub0
ukbd0: on usbus0
kbd1 at ukbd0
GEOM_ELI: Device nvd0p2.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI: Crypto: hardware
lo0: link state changed to UP
em0: link state changed to UP
ums0 on uhub0
ums0: on usbus0
ums0: 5 buttons and [XYZ] coordinates ID=1
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to deny, logging disabled
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP
em0: link state changed to DOWN
em0: link state changed to UP

netstat -i shows a handful of Ierrs:

Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
em0    1500               1c:69:7a:xx:xx:xx  5959360     4     0  5229960     0     0
em0       - 192.168.x.0/2 xxxxxx             5896418     -     -  5201176     -     -
em0       - fe80::%em0/64 fe80::1e69:7aff:f        0     -     -        0     -     -
em0       - fdxx::xxxx:xx xxxxxx               28862     -     -    28369     -     -
lo0   16384               lo0                   1384     0     0     1384     0     0
lo0       - localhost     localhost                0     -     -      215     -     -
lo0       - fe80::%lo0/64 fe80::1%lo0              0     -     -        0     -     -
lo0       - your-net      localhost               42     -     -     1384     -     -
ipfw0     -               ipfw0                    0     0     0        0     0     0

pciconf:

em0@pci0:0:31:6: class=0x020000 card=0x20748086 chip=0x15be8086 rev=0x30 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection (6) I219-V'
    class      = network
    subclass   = ethernet

sysctl -a info for em0:

# sysctl -a | grep "\.em\."
hw.em.max_interrupt_rate: 8000
hw.em.eee_setting: 1
hw.em.rx_process_limit: -1
hw.em.sbp: 1
hw.em.smart_pwr_down: 0
hw.em.rx_abs_int_delay: 66
hw.em.tx_abs_int_delay: 66
hw.em.rx_int_delay: 0
hw.em.tx_int_delay: 66
hw.em.disable_crc_stripping: 0
dev.em.0.wake: 0
dev.em.0.interrupts.rx_overrun: 0
dev.em.0.interrupts.rx_desc_min_thresh: 0
dev.em.0.interrupts.tx_queue_min_thresh: 0
dev.em.0.interrupts.tx_queue_empty: 0
dev.em.0.interrupts.tx_abs_timer: 0
dev.em.0.interrupts.tx_pkt_timer: 0
dev.em.0.interrupts.rx_abs_timer: 0
dev.em.0.interrupts.rx_pkt_timer: 0
dev.em.0.interrupts.asserts: 4163967
dev.em.0.mac_stats.tx_frames_1024_1522: -1
dev.em.0.mac_stats.tx_frames_512_1023: -1
dev.em.0.mac_stats.tx_frames_256_511: -1
dev.em.0.mac_stats.tx_frames_128_255: -1
dev.em.0.mac_stats.tx_frames_65_127: -1
dev.em.0.mac_stats.tx_frames_64: -1
dev.em.0.mac_stats.rx_frames_1024_1522: -1
dev.em.0.mac_stats.rx_frames_512_1023: -1
dev.em.0.mac_stats.rx_frames_256_511: -1
dev.em.0.mac_stats.rx_frames_128_255: -1
dev.em.0.mac_stats.rx_frames_65_127: -1
dev.em.0.mac_stats.rx_frames_64: -1
dev.em.0.mac_stats.tso_ctx_fail: 0
dev.em.0.mac_stats.tso_txd: 0
dev.em.0.mac_stats.mcast_pkts_txd: 24
dev.em.0.mac_stats.bcast_pkts_txd: 124
dev.em.0.mac_stats.good_pkts_txd: 5231594
dev.em.0.mac_stats.total_pkts_txd: 5231594
dev.em.0.mac_stats.good_octets_txd: 6647896611
dev.em.0.mac_stats.good_octets_recvd: 6724467487
dev.em.0.mac_stats.mcast_pkts_recvd: 273
dev.em.0.mac_stats.bcast_pkts_recvd: 32687
dev.em.0.mac_stats.good_pkts_recvd: 5960900
dev.em.0.mac_stats.total_pkts_recvd: 5960904
dev.em.0.mac_stats.xoff_txd: 0
dev.em.0.mac_stats.xoff_recvd: 0
dev.em.0.mac_stats.xon_txd: 0
dev.em.0.mac_stats.xon_recvd: 0
dev.em.0.mac_stats.coll_ext_errs: 0
dev.em.0.mac_stats.alignment_errs: 0
dev.em.0.mac_stats.crc_errs: 0
dev.em.0.mac_stats.recv_errs: 0
dev.em.0.mac_stats.recv_jabber: 0
dev.em.0.mac_stats.recv_oversize: 0
dev.em.0.mac_stats.recv_fragmented: 0
dev.em.0.mac_stats.recv_undersize: 0
dev.em.0.mac_stats.recv_no_buff: 0
dev.em.0.mac_stats.missed_packets: 4
dev.em.0.mac_stats.defer_count: 0
dev.em.0.mac_stats.sequence_errors: 0
dev.em.0.mac_stats.symbol_errors: 0
dev.em.0.mac_stats.collision_count: 0
dev.em.0.mac_stats.late_coll: 0
dev.em.0.mac_stats.multiple_coll: 0
dev.em.0.mac_stats.single_coll: 0
dev.em.0.mac_stats.excess_coll: 0
dev.em.0.queue_rx_0.rx_irq: 0
dev.em.0.queue_rx_0.rxd_tail: 1001
dev.em.0.queue_rx_0.rxd_head: 1003
dev.em.0.queue_tx_0.no_desc_avail: 0
dev.em.0.queue_tx_0.tx_irq: 0
dev.em.0.queue_tx_0.txd_tail: 537
dev.em.0.queue_tx_0.txd_head: 538
dev.em.0.fc_low_water: 20552
dev.em.0.fc_high_water: 23584
dev.em.0.rx_control: 67141634
dev.em.0.device_control: 1573440
dev.em.0.watchdog_timeouts: 0
dev.em.0.rx_overruns: 4
dev.em.0.tx_dma_fail: 0
dev.em.0.dropped: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.mbuf_alloc_fail: 0
dev.em.0.link_irq: 0
dev.em.0.eee_control: 1
dev.em.0.rx_processing_limit: -1
dev.em.0.itr: 488
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_int_delay: 66
dev.em.0.rx_int_delay: 0
dev.em.0.fc: 0
dev.em.0.debug: -1
dev.em.0.nvm: -1
dev.em.0.%parent: pci0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x15be subvendor=0x8086 subdevice=0x2074 class=0x020000
dev.em.0.%location: slot=31 function=6 dbsf=pci0:0:31:6 handle=\_SB_.PCI0.GLAN
dev.em.0.%driver: em
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.7.5
dev.em.%parent:

ping output from another device on the network:

# ping xxxxxx
PING xxxxxx (192.168.x.yyy): 56 data bytes
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
64 bytes from 192.168.x.yyy: icmp_seq=17 ttl=64 time=4227.244 ms
64 bytes from 192.168.x.yyy: icmp_seq=18 ttl=64 time=3154.970 ms
64 bytes from 192.168.x.yyy: icmp_seq=19 ttl=64 time=2143.813 ms
64 bytes from 192.168.x.yyy: icmp_seq=20 ttl=64 time=1071.436 ms
64 bytes from 192.168.x.yyy: icmp_seq=21 ttl=64 time=1.375 ms
64 bytes from 192.168.x.yyy: icmp_seq=79 ttl=64 time=0.578 ms
64 bytes from 192.168.x.yyy: icmp_seq=284 ttl=64 time=0.958 ms
64 bytes from 192.168.x.yyy: icmp_seq=293 ttl=64 time=0.580 ms
64 bytes from 192.168.x.yyy: icmp_seq=334 ttl=64 time=0.496 ms
64 bytes from 192.168.x.yyy: icmp_seq=335 ttl=64 time=0.454 ms
64 bytes from 192.168.x.yyy: icmp_seq=336 ttl=64 time=0.473 ms
64 bytes from 192.168.x.yyy: icmp_seq=337 ttl=64 time=0.457 ms
64 bytes from 192.168.x.yyy: icmp_seq=338 ttl=64 time=0.459 ms
64 bytes from 192.168.x.yyy: icmp_seq=339 ttl=64 time=0.442 ms
64 bytes from 192.168.x.yyy: icmp_seq=340 ttl=64 time=0.447 ms
64 bytes from 192.168.x.yyy: icmp_seq=341 ttl=64 time=0.452 ms
64 bytes from 192.168.x.yyy: icmp_seq=342 ttl=64 time=0.437 ms
64 bytes from 192.168.x.yyy: icmp_seq=343 ttl=64 time=0.462 ms
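
One way to read the ping trace above: the replies for icmp_seq 17 through 21 have round-trip times that fall by roughly one second per sequence number, which is what one would expect if the requests sat in a queue and were released at about the same moment. Below is a minimal sketch of that arithmetic in Python, using only the reply lines quoted above. The regular expression and the helper name `rtt_steps_ms` are illustrative assumptions, not part of ping or of the em(4) driver.

```python
import re

# First five reply lines from the ping output above (icmp_seq 17 through 21).
PING_REPLIES = """\
64 bytes from 192.168.x.yyy: icmp_seq=17 ttl=64 time=4227.244 ms
64 bytes from 192.168.x.yyy: icmp_seq=18 ttl=64 time=3154.970 ms
64 bytes from 192.168.x.yyy: icmp_seq=19 ttl=64 time=2143.813 ms
64 bytes from 192.168.x.yyy: icmp_seq=20 ttl=64 time=1071.436 ms
64 bytes from 192.168.x.yyy: icmp_seq=21 ttl=64 time=1.375 ms
"""

# Captures the sequence number and round-trip time of one reply line.
REPLY_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([0-9.]+) ms")


def rtt_steps_ms(text):
    """Return the RTT drop (in ms) between each pair of consecutive replies.

    Hypothetical helper for illustration only; it assumes the reply lines
    appear in ascending icmp_seq order with no gaps, as in the excerpt.
    """
    rtts = [float(match.group(2)) for match in REPLY_RE.finditer(text)]
    return [earlier - later for earlier, later in zip(rtts, rtts[1:])]


steps = rtt_steps_ms(PING_REPLIES)
for step in steps:
    print(f"RTT dropped by {step:.3f} ms")
```

Each step comes out between about 1000 and 1100 ms, close to ping's default one-second send interval. That is consistent with the stalled-buffer theory but does not prove it.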