From: Aristedes Maniatis
Subject: Ethernet interface Watchdog timeout
To: freebsd-stable
Message-ID: <2931240e-45c2-93e3-4746-48d4f566bd9f@ish.com.au>
Date: Fri, 17 Jul 2020 17:11:49 +1000

Last night I needed to reboot switches connected to a FreeBSD server. There are two igb interfaces, bound via lagg0 as an LACP pair. Each is connected to a different switch and those switches support mlag (LAG distributed across more than one switch unit).

One of the interfaces came back fine when its switch rebooted, but when the second switch was rebooted several hours later the other interface didn't. Both igb0 and igb1 interfaces are on the motherboard itself.

This has happened once before, and rebooting the FreeBSD server resolved it. Obviously I'd like to understand the problem better first. Is there more debugging I could collect while the server is in this state?

Physically removing the ethernet cable and plugging it back in does not bring the interface up. ifconfig down and up also does not help.

What is this watchdog timeout that we are seeing in the logs?

Ari

# ifconfig igb0
igb0: flags=8c03 metric 0 mtu 1500
    options=e507bb
    ether ac:1f:6b:00:ea:b2
    media: Ethernet autoselect
    status: no carrier
    nd6 options=29

# uname -a
FreeBSD lash.internal 12.1-RELEASE-p2 FreeBSD 12.1-RELEASE-p2 GENERIC  amd64

# grep igb0 /var/log/messages
Jul  8 23:00:43 lash kernel: igb0: Watchdog timeout (TX: 0 desc avail: 42 pidx: 1003) -- resetting
Jul  8 23:00:43 lash kernel: igb0: link state changed to DOWN
Jul  8 23:00:44 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting
Jul  9 00:00:01 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting
Jul  9 05:01:12 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting
Jul  9 05:06:56 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting
Jul  9 14:25:33 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting
Jul  9 14:44:30 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting

igb0@pci0:1:0:0:    class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR NS
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 ac1f6bffff00eab2
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 0018[1c0] = LTR 1
    ecap 000d[1d0] = ACS 1

# dmidecode -t baseboard
# dmidecode 3.2
Scanning /dev/mem for entry point.
SMBIOS 3.0 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
    Manufacturer: Supermicro
    Product Name: X10DRW-i
    Version: 1.02
    Serial Number: NM173S002991
    Asset Tag: Default string
    Features:
        Board is a hosting board
        Board is replaceable
    Location In Chassis: Default string
    Chassis Handle: 0x0003
    Type: Motherboard
    Contained Object Handles: 0

Handle 0x0021, DMI type 41, 11 bytes
Onboard Device
    Reference Designation: ASPEED Video AST2400
    Type: Video
    Status: Enabled
    Type Instance: 1
    Bus Address: 0000:05:00.0

Handle 0x0022, DMI type 41, 11 bytes
Onboard Device
    Reference Designation: Intel Ethernet i350 #1
    Type: Ethernet
    Status: Enabled
    Type Instance: 1
    Bus Address: 0000:01:00.0

Handle 0x0023, DMI type 41, 11 bytes
Onboard Device
    Reference Designation: Intel Ethernet i350 #2
    Type: Ethernet
    Status: Enabled
    Type Instance: 2
    Bus Address: 0000:01:00.1
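
For anyone wanting to line up the watchdog resets against switch events, the igb0 lines from the /var/log/messages excerpt above could be pulled apart with something like the following. This is only a minimal sketch: the regular expression is fitted to the exact lines quoted in this message, and the field names "queue", "avail" and "pidx" are labels chosen here for illustration. Reading the number after "TX:" as a transmit queue index is an assumption, not something confirmed from the driver source.

```python
import re

# Pattern fitted to the watchdog lines quoted in this message.
# The group names are illustrative labels, not driver terminology.
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<ifname>\w+): Watchdog timeout "
    r"\(TX: (?P<queue>\d+) desc avail: (?P<avail>\d+) pidx: (?P<pidx>\d+)\)"
)


def parse_watchdog(lines):
    """Return one dict per watchdog-timeout line; other lines are skipped."""
    events = []
    for line in lines:
        m = WATCHDOG_RE.match(line.strip())
        if m:
            events.append({
                "stamp": m["stamp"],
                "ifname": m["ifname"],
                "queue": int(m["queue"]),   # assumed: TX queue index
                "avail": int(m["avail"]),   # "desc avail" value
                "pidx": int(m["pidx"]),
            })
    return events


# First three igb0 lines from the log excerpt in this message.
SAMPLE = [
    "Jul  8 23:00:43 lash kernel: igb0: Watchdog timeout (TX: 0 desc avail: 42 pidx: 1003) -- resetting",
    "Jul  8 23:00:43 lash kernel: igb0: link state changed to DOWN",
    "Jul  8 23:00:44 lash kernel: igb0: Watchdog timeout (TX: 7 desc avail: 1024 pidx: 0) -- resetting",
]

if __name__ == "__main__":
    for ev in parse_watchdog(SAMPLE):
        print(ev["stamp"], ev["ifname"], ev["queue"], ev["avail"], ev["pidx"])
```

Feeding it the full output of "grep igb0 /var/log/messages" instead of SAMPLE would give one record per reset, which makes it easier to see that only the first event differs from the rest in its counters.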