From owner-freebsd-bugs@FreeBSD.ORG Sun Apr 5 13:28:24 2015
Return-Path:
Delivered-To: freebsd-bugs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
    (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by hub.freebsd.org (Postfix) with ESMTPS id 487A3267
    for ; Sun, 5 Apr 2015 13:28:24 +0000 (UTC)
Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (Client did not present a certificate)
    by mx1.freebsd.org (Postfix) with ESMTPS id 28CCBC89
    for ; Sun, 5 Apr 2015 13:28:24 +0000 (UTC)
Received: from bugs.freebsd.org ([127.0.1.118])
    by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id t35DSNbk069006
    for ; Sun, 5 Apr 2015 13:28:23 GMT
    (envelope-from bugzilla-noreply@freebsd.org)
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 199174] em tx and rx hang
Date: Sun, 05 Apr 2015 13:28:24 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.1-STABLE
X-Bugzilla-Keywords:
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: david.keller@litchis.fr
X-Bugzilla-Status: New
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform
    op_sys bug_status bug_severity priority component assigned_to reporter
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 05 Apr 2015 13:28:24 -0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199174

            Bug ID: 199174
           Summary: em tx and rx hang
           Product: Base System
           Version: 10.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: david.keller@litchis.fr

Hi,

While sending moderate NFS traffic (< 2 MB/s), the interface suddenly stopped
transmitting and receiving. However, the interface seemed fine:

$ ifconfig
em0: flags=8843 metric 0 mtu 9000
        options=4219b
        ether 00:25:90:34:5d:44
        inet YYYY netmask 0xffffff00 broadcast YYY.255
        inet6 fe80::225:90ff:fe34:5d44%em0 prefixlen 64 scopeid 0x1
        inet6 XXXX prefixlen 64 autoconf
        nd6 options=23
        media: Ethernet autoselect (1000baseT )
        status: active

Pinging the gateway didn't work:

$ ping ZZZZ
PING ZZZZ (ZZZZ): 56 data bytes
ping: sendto: Host is down
ping: sendto: Host is down

But the driver seemed happy with the card, as no particular message was
printed.

# tcpdump -ni em0
-> No rx traffic, only tx.

Printing the em driver's internal variables was more interesting:

$ sysctl dev.em.0.debug=1
Interface is RUNNING and ACTIVE
em0: hw tdh = 325, hw tdt = 166
em0: hw rdh = 688, hw rdt = 687
em0: Tx Queue Status = 1
em0: TX descriptors avail = 150
em0: Tx Descriptors avail failure = 0
em0: RX discarded packets = 0
em0: RX Next to Check = 688
em0: RX Next to Refresh = 687

Sending a ping request incremented hw tdt as expected. Wondering what would
happen once it reached the tdh value, I ping-flooded the gateway.
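As a rough illustration of the ring arithmetic behind these counters, the sketch below computes how far the tail (tdt) is from the head (tdh) on a circular descriptor ring. It is only a sketch under stated assumptions: the ring size of 1024 is inferred from the "TX descriptors avail = 1024" line the driver prints after the reset later in this report, the function and variable names are hypothetical, and the formula for the driver's clean index is a guess at its bookkeeping, not something taken from the em source.

```python
# Assumption: 1024-entry TX ring, inferred from "TX descriptors avail = 1024"
# printed after the reset; the report does not state the ring size directly.
RING_SIZE = 1024


def ring_distance(start: int, end: int, size: int = RING_SIZE) -> int:
    """Number of forward steps from index `start` to index `end` on a ring."""
    return (end - start) % size


# Values from the first `sysctl dev.em.0.debug=1` dump.
tdh = 325        # hardware head: next descriptor the NIC would process
tdt = 166        # hardware tail: next descriptor the driver would fill
sw_avail = 150   # "TX descriptors avail" as counted by the driver

# Descriptors handed to the NIC that it has not yet processed.
queued = ring_distance(tdh, tdt)

# Descriptors the tail can still advance before it catches up with the head.
until_tail_meets_head = ring_distance(tdt, tdh)

# Assumption: the driver counts free slots from the tail up to its own
# clean index, so that index can be recovered from the tail and the count.
implied_next_to_clean = (tdt + sw_avail) % RING_SIZE

print(queued, until_tail_meets_head, implied_next_to_clean)
```

Under these assumptions the tail sits 159 descriptors behind the stuck head, while the driver counts only 150 free slots; the implied clean index of 316 agrees with the "Next TX to Clean = 316" value in the watchdog output further down.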
The driver figured out something was going wrong and reset the card:

# ping -f ZZZZ
em0: Watchdog timeout -- resetting
em0: Queue(0) tdh = 325, hw tdt = 285
em0: TX(0) desc avail = 31,Next TX to Clean = 316
em0: link state changed to DOWN
em0: link state changed to UP

Interface is RUNNING and ACTIVE
em0: hw tdh = 113, hw tdt = 113
em0: hw rdh = 36, hw rdt = 35
em0: Tx Queue Status = 0
em0: TX descriptors avail = 1024
em0: Tx Descriptors avail failure = 0
em0: RX discarded packets = 0
em0: RX Next to Check = 36
em0: RX Next to Refresh = 35

From here, the interface was working as usual.

$ ping ZZZZ
PING ZZZZ (ZZZZ): 56 data bytes
64 bytes from ZZZZ: icmp_seq=0 ttl=255 time=0.241 ms

$ dmesg
FreeBSD 10.1-RELEASE-p6 #0: Tue Feb 24 19:00:21 UTC 2015
[...]
em0: port 0xdc00-0xdc1f mem 0xfe9e0000-0xfe9fffff,0xfe9dc000-0xfe9dffff irq 16 at device 0.0 on pci2
em0: Using MSIX interrupts with 3 vectors
em0: Ethernet address: 00:25:90:34:5d:44
pcib3: irq 16 at device 28.5 on pci0
pci3: on pcib3
em1: port 0xec00-0xec1f mem 0xfeae0000-0xfeafffff,0xfeadc000-0xfeadffff irq 17 at device 0.0 on pci3
em1: Using MSIX interrupts with 3 vectors
em1: Ethernet address: 00:25:90:34:5d:45

$ pciconf -elv
[...]
em0@pci0:2:0:0: class=0x020000 card=0x060a15d9 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82574L Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    PCI-e errors = Correctable Error Detected
                   Unsupported Request Detected
       Corrected = Receiver Error
                   Bad TLP
                   Bad DLLP
                   REPLAY_NUM Rollover
                   Replay Timer Timeout
                   Advisory Non-Fatal Error
em1@pci0:3:0:0: class=0x020000 card=0x060a15d9 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82574L Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    PCI-e errors = Correctable Error Detected
                   Unsupported Request Detected
       Corrected = Receiver Error
                   Bad TLP
                   Bad DLLP
                   Replay Timer Timeout
                   Advisory Non-Fatal Error

The port is connected to a GS108 switch.
Link was up the whole time and no transmit error was detected.

The motherboard is a Supermicro X7SPA-HF with the latest BIOS. On this board,
a BMC shares the em0 port. The BMC was not responding either. Hence my guess
is that the fault lies with the card rather than the driver, since the BMC
suffered too.

This is also happening on an OpenBSD em0 with the same motherboard (but not
connected to the same switch).

-- 
You are receiving this mail because:
You are the assignee for the bug.