From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 279245] igc(4) I226 (and I225) hangups
Date: Thu, 23 May 2024 09:12:16 +0000
List-Archive: https://lists.freebsd.org/archives/freebsd-bugs

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279245

            Bug ID: 279245
           Summary: igc(4) I226 (and I225) hangups
           Product: Base System
           Version: 13.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: freebsd_email@congenio.de

When using an I226 under OpnSense (FreeBSD 13.2-RELEASE kernel; I also
tried FreeBSD 14.0-RELEASE), I experience connection hangups about once
per day under no specific circumstances (the maximum was three within one
hour, and I have also gone three days without any).

The problem manifests as a dead connection (no packets are received, none
are sent), but the low-level counters (dev.igc.0.mac_stats) still
increase.

The condition can be cleared by bringing the interface down and up again
or by briefly disconnecting the cable.

There are reports of this and other related problems all over the
internet for different OSes, see:

Windows:
https://forums.evga.com/PSA-Intel-I226V-25GbE-on-Raptor-Lake-Motherboards-Has-a-Connection-Drop-Issue-No-Fix-m3595279.aspx

OpnSense (FreeBSD):
https://forum.opnsense.org/index.php?topic=40404.msg199288#msg199288

pfSense (FreeBSD):
https://forum.netgate.com/topic/181571/chinese-i226-v-on-23-05-1-problems

My specific variant is an I226-V, rev. 4, built into a Minisforum MS-01:

igc0@pci0:87:0:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086
                  device=0x125c subvendor=0x8086 subdevice=0x0000
    vendor   = 'Intel Corporation'
    device   = 'Ethernet Controller I226-V'
    class    = network
    subclass = ethernet

However, there are reports of the I226-LM in the same machine showing the
same behaviour, see: https://forum.opnsense.org/index.php?topic=40556

igc1@pci0:88:0:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086
                  device=0x125b subvendor=0x8086 subdevice=0x0000
    vendor   = 'Intel Corporation'
    device   = 'Ethernet Controller I226-LM'
    class    = network
    subclass = ethernet

This seems to indicate
that at least the I226 family (which is a successor to the problem-ridden
I225 using the same driver module) is affected by this problem.

I tried every setting I could think of to make this go away, such as
reducing the speed from 2.5 to 1 Gbps and disabling EEE (which is off by
default anyway), to no avail.

Interestingly, the Minisforum MS-01 has gained much interest in the last
few months, and there is a review on YouTube where the creator states in
a comment that he is not seeing this problem
(https://www.youtube.com/watch?v=_wgX1sDab-M). However, he runs OpnSense
under a Proxmox hypervisor, thus using the Linux driver modules (OpnSense
itself uses the virtualized virtio NICs).

This, and the reports of gamers who describe "micro-hangs" manifesting as
short lags in online games, got me thinking.

So I compared the Linux and FreeBSD drivers and found that the Linux
driver has a specific routine to catch, log and clear "TX hang"
conditions; see from line 3150 here:
https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/intel/igc/igc_main.c,
which reads:

    if (test_bit(IGC_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) {
        struct igc_hw *hw = &adapter->hw;

        /* Detect a transmit hang in hardware, this serializes the
         * check with the clearing of time_stamp and movement of i
         */
        clear_bit(IGC_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags);
        if (tx_buffer->next_to_watch &&
            time_after(jiffies, tx_buffer->time_stamp +
            (adapter->tx_timeout_factor * HZ)) &&
            !(rd32(IGC_STATUS) & IGC_STATUS_TXOFF) &&
            (rd32(IGC_TDH(tx_ring->reg_idx)) != readl(tx_ring->tail)) &&
            !tx_ring->oper_gate_closed) {
            /* detected Tx unit hang */
            netdev_err(tx_ring->netdev,
                       "Detected Tx Unit Hang\n"
                       " Tx Queue <%d>\n"
                       " TDH <%x>\n"
                       " TDT <%x>\n"
                       " next_to_use <%x>\n"
                       " next_to_clean <%x>\n"
                       "buffer_info[next_to_clean]\n"
                       " time_stamp <%lx>\n"
                       " next_to_watch <%p>\n"
                       " jiffies <%lx>\n"
                       " desc.status <%x>\n",
                       tx_ring->queue_index,
                       rd32(IGC_TDH(tx_ring->reg_idx)),
                       readl(tx_ring->tail),
                       tx_ring->next_to_use,
                       tx_ring->next_to_clean,
                       tx_buffer->time_stamp,
                       tx_buffer->next_to_watch,
                       jiffies,
                       tx_buffer->next_to_watch->wb.status);
            netif_stop_subqueue(tx_ring->netdev,
                                tx_ring->queue_index);

            /* we are about to reset, no point in enabling stuff */
            return true;
        }
    }

There is also a routine to reset the adapter:

    /**
     * igc_tx_timeout - Respond to a Tx Hang
     * @netdev: network interface device structure
     * @txqueue: queue number that timed out
     **/
    static void igc_tx_timeout(struct net_device *netdev,
                               unsigned int __always_unused txqueue)
    {
        struct igc_adapter *adapter = netdev_priv(netdev);
        struct igc_hw *hw = &adapter->hw;

        /* Do the reset outside of interrupt context */
        adapter->tx_timeout_count++;
        schedule_work(&adapter->reset_task);
        wr32(IGC_EICS,
             (adapter->eims_enable_mask & ~adapter->eims_other));
    }

I did not see anything to this effect in the FreeBSD igc driver module.
Intel themselves do not offer an OEM driver for FreeBSD in their Intel
Network Connections 29.1 package.

So, my theory is that there is a hardware idiosyncrasy in this Intel
adapter family which causes packet flow to stop sometimes. The Linux
driver module handles this by checking whether a queued packet has failed
to complete within a short period while the descriptor head and tail
still differ.

That detection and handling would not be there if there were no problem,
so we can take this as a fact. I suspect that the same handling is
contained in the Windows drivers, too, although I cannot verify this
because I cannot look at the source code. However, this would be in line
with the "micro-hangs" that other users observe under Windows.

Alas, under FreeBSD there is no handling of this condition, which might
explain the total packet loss after it occurs.

If it were fixed in FreeBSD, it would be a great benefit for applications
like pfSense and OpnSense since, as things stand, these adapters are
essentially unusable.
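
As an illustration of the condition the quoted Linux check tests, here is a
minimal, driver-independent sketch in C. It is not FreeBSD igc(4) or iflib
code: the struct, its field names and the function txq_looks_hung() are
hypothetical, and the sketch leaves out the Linux-specific oper_gate_closed
test. It only models the decision "a packet has been pending longer than the
timeout, transmission is not paused, and head and tail still differ".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one TX queue. The field names are illustrative
 * and do not correspond to any real FreeBSD igc(4) or iflib structure. */
struct txq_snapshot {
    uint32_t hw_head;        /* descriptor head reported by the NIC (TDH) */
    uint32_t sw_tail;        /* descriptor tail written by the driver (TDT) */
    uint64_t oldest_pending; /* tick at which the oldest pending packet was queued */
    bool     has_pending;    /* a packet is still waiting for completion */
    bool     tx_paused;      /* flow control has paused transmission (TXOFF) */
};

/* Return true if the queue looks hung at tick 'now', using the same three
 * conditions as the quoted Linux check: the timeout has elapsed, TX is not
 * paused, and the hardware head has not caught up with the tail. */
static bool
txq_looks_hung(const struct txq_snapshot *q, uint64_t now,
    uint64_t timeout_ticks)
{
    assert(q != NULL);

    if (!q->has_pending)
        return false;        /* nothing outstanding, nothing can hang */
    if (q->tx_paused)
        return false;        /* a paused link is not a hang */
    if (q->hw_head == q->sw_tail)
        return false;        /* hardware has consumed all descriptors */
    if (now < q->oldest_pending)
        return false;        /* inconsistent timestamps, do not guess */

    return (now - q->oldest_pending) > timeout_ticks;
}
```

A driver-side fix would presumably call such a check periodically and
schedule a reset of the adapter when it returns true, which is what the Linux
igc_tx_timeout() routine quoted above does.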
A potential fix would still produce "micro-hangs" once in a while;
however, this is far better than losing the connection completely.
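
Until the driver handles the condition itself, the manual workaround
described in this report (bringing the interface down and up again) could be
automated from userland. The following C sketch shows only the decision
logic of such a watchdog and is an assumption throughout: the names
link_watchdog, link_watchdog_step() and format_bounce_command() are
hypothetical, and how the connectivity probe is performed (for example a
ping to the gateway) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical watchdog state: counts consecutive failed probes. */
struct link_watchdog {
    unsigned failures;   /* consecutive failed connectivity probes */
    unsigned threshold;  /* failures required before bouncing the link */
};

/* Feed one probe result into the watchdog. Returns true when the caller
 * should bring the interface down and up again. */
static bool
link_watchdog_step(struct link_watchdog *w, bool probe_ok)
{
    assert(w != NULL && w->threshold > 0);

    if (probe_ok) {
        w->failures = 0;     /* link works, forget earlier failures */
        return false;
    }
    if (++w->failures < w->threshold)
        return false;        /* not enough evidence yet */

    w->failures = 0;         /* start counting afresh after the bounce */
    return true;
}

/* Build the shell command for the manual workaround. Returns the value of
 * snprintf(), i.e. the length the full command needs. */
static int
format_bounce_command(char *buf, size_t len, const char *ifname)
{
    assert(buf != NULL && ifname != NULL);

    return snprintf(buf, len, "ifconfig %s down && ifconfig %s up",
        ifname, ifname);
}
```

Requiring several consecutive failures before acting avoids bouncing the
link on a single lost probe. This is only a stop-gap, since every bounce
still interrupts traffic for a moment.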