Date: Fri, 17 Jul 2020 17:11:49 +1000
From: Aristedes Maniatis <ari@ish.com.au>
To: freebsd-stable <freebsd-stable@freebsd.org>
Subject: Ethernet interface Watchdog timeout
Message-ID: <2931240e-45c2-93e3-4746-48d4f566bd9f@ish.com.au>
Last night I needed to reboot switches connected to a FreeBSD server. There are two igb interfaces, bound via lagg0 as an LACP pair. Each is connected to a different switch and those switches support mlag (LAG distributed across more than one switch unit).

One of the interfaces came back fine when its switch rebooted, but when the second switch was rebooted several hours later the other interface didn't. Both igb0 and igb1 interfaces are on the motherboard itself.

This has happened once before, and rebooting the FreeBSD server resolved it. Obviously I'd like to understand the problem better first. Is there more debugging I could collect while the server is in this state?

Physically removing the ethernet cable and plugging it back in does not bring the interface up. ifconfig down and up also does not help. What is this watchdog timeout that we are seeing in the logs?

Ari


# ifconfig igb0
igb0: flags=8c03<UP,BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether ac:1f:6b:00:ea:b2
	media: Ethernet autoselect
	status: no carrier
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

# uname -a
FreeBSD lash.internal 12.1-RELEASE-p2 FreeBSD 12.1-RELEASE-p2 GENERIC amd64

# grep igb0 /var/log/messages
Jul 8 23:00:43 lash kernel: igb0: Watchdog timeout (TX: 0 desc avail: 42 pidx: 1003) -- resetting
Jul 8 23:00:43 lash kernel: igb0: link state changed to DOWN
Jul 8 23:00:44 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting
Jul 9 00:00:01 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting
Jul 9 05:01:12 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting
Jul 9 05:06:56 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting
Jul 9 14:25:33 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting
Jul 9 14:44:30 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting

igb0@pci0:1:0:0: class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR NS
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 ac1f6bffff00eab2
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 0018[1c0] = LTR 1
    ecap 000d[1d0] = ACS 1

# dmidecode -t baseboard
# dmidecode 3.2
Scanning /dev/mem for entry point.
SMBIOS 3.0 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
	Manufacturer: Supermicro
	Product Name: X10DRW-i
	Version: 1.02
	Serial Number: NM173S002991
	Asset Tag: Default string
	Features:
		Board is a hosting board
		Board is replaceable
	Location In Chassis: Default string
	Chassis Handle: 0x0003
	Type: Motherboard
	Contained Object Handles: 0

Handle 0x0021, DMI type 41, 11 bytes
Onboard Device
	Reference Designation: ASPEED Video AST2400
	Type: Video
	Status: Enabled
	Type Instance: 1
	Bus Address: 0000:05:00.0

Handle 0x0022, DMI type 41, 11 bytes
Onboard Device
	Reference Designation: Intel Ethernet i350 #1
	Type: Ethernet
	Status: Enabled
	Type Instance: 1
	Bus Address: 0000:01:00.0

Handle 0x0023, DMI type 41, 11 bytes
Onboard Device
	Reference Designation: Intel Ethernet i350 #2
	Type: Ethernet
	Status: Enabled
	Type Instance: 2
	Bus Address: 0000:01:00.1
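
For anyone who wants to tabulate the watchdog lines from /var/log/messages, here is a minimal sketch. The regular expression is an assumption inferred only from the log lines quoted in this message, not from the driver source, and the names WATCHDOG_RE and parse_watchdog are hypothetical. The fields are returned under the labels that appear in the log, without interpreting what the driver means by them.

```python
import re

# Pattern inferred from the log lines quoted in this message (an assumption;
# other driver versions may format the line differently).
WATCHDOG_RE = re.compile(
    r"(?P<iface>\w+): Watchdog timeout "
    r"\(TX: (?P<tx>\d+) desc avail: (?P<avail>\d+) pidx: (?P<pidx>\d+)\)"
)


def parse_watchdog(line):
    """Return the numeric fields of a watchdog line, or None if it is not one."""
    m = WATCHDOG_RE.search(line)
    if m is None:
        return None
    return {
        "iface": m.group("iface"),
        "tx": int(m.group("tx")),
        "avail": int(m.group("avail")),
        "pidx": int(m.group("pidx")),
    }


if __name__ == "__main__":
    import sys

    # Usage: python3 watchdog.py < /var/log/messages
    for raw in sys.stdin:
        fields = parse_watchdog(raw)
        if fields is not None:
            print(fields)
```

Run against the log above, this would show that only the first timeout reports different values (TX: 0, 42 descriptors available, pidx 1003); every later one reports TX: 7 with 1024 available and pidx 0.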