Date: Thu, 20 Feb 2014 10:27:01 -0600
From: Dan Nelson <dnelson@allantgroup.com>
To: Brett Glass <brett@lariat.net>
Cc: questions@freebsd.org
Subject: Re: Instability in re driver?
Message-ID: <20140220162701.GD80443@dan.emsphone.com>
In-Reply-To: <201402200339.UAA09587@mail.lariat.net>
References: <201402200339.UAA09587@mail.lariat.net>

In the last episode (Feb 19), Brett Glass said:
> We've been experiencing occasional crashes on heavily loaded machines that
> have Realtek gigabit Ethernet ports built into their motherboards. This
> evening, at a time of peak usage, one of the machines showed the log
> messages
>
> Feb 19 18:44:14 <kern.crit> server kernel: re0: watchdog timeout
> Feb 19 18:44:14 <kern.notice> server kernel: re0: link state changed to DOWN
> Feb 19 18:44:18 <kern.notice> server kernel: re0: link state changed to UP
>
> even though we did not take the port down or up. A few minutes later, the
> entire machine locked up solid and required a power cycle.
>
> Here are the boot time messages from the same server (as I rebooted it
> following the crash):
>
> Feb 19 19:08:22 <kern.crit> server kernel: re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F PCIe Gigabit Ethernet> port 0xd800-0xd8ff mem 0xfeadf000-0xfeadffff,0xfdefc000-0xfdefffff irq 16 at device 0.0 on pci1
> Feb 19 19:08:22 <kern.crit> server kernel: re0: Using 1 MSI-X message
> Feb 19 19:08:22 <kern.crit> server kernel: re0: Chip rev. 0x28000000
> Feb 19 19:08:22 <kern.crit> server kernel: re0: MAC rev. 0x00000000
> Feb 19 19:08:22 <kern.crit> server kernel: miibus0: <MII bus> on re0
> Feb 19 19:08:22 <kern.crit> server kernel: rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
> Feb 19 19:08:22 <kern.crit> server kernel: rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
>
> This problem is occurring on several servers built with the same
> motherboard/chipset. Are there known bugs, hangs, or memory leaks that
> might have led to this condition? Are there recent fixes which address
> it? (The machines that are crashing are all running the latest security
> patch level of FreeBSD 9.1-RELEASE.)

I have had a similar persistent problem with a re0 interface in one of my
machines, where medium UDP traffic sometimes causes a problem with the clock
timer. Packets stop flowing, calls to nanosleep() never return, etc. I have
watchdogd enabled, so the system ends up resetting on its own. Without
that, it would just sit in a half-frozen state indefinitely.

The machine is an old Dell Studio 540s, and I've seen the problem on FreeBSD
7, 8 and 9, i386 and amd64. I assumed it was just a problem with my
motherboard since nobody else has ever reported a similar problem, and it
only resets once or twice a month, so it doesn't really affect me. Neither
disabling MSI nor using a different timecounter seems to help.

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff,0xfdff0000-0xfdffffff irq 17 at device 0.0 on pci3
re0: MSI count : 1
re0: MSI-X count : 2
re0: attempting to allocate 1 MSI-X vectors (2 supported)
msi: routing MSI-X IRQ 259 to local APIC 0 vector 58
re0: using IRQ 259 for MSI-X
re0: Using 1 MSI-X message
re0: Chip rev. 0x3c000000
re0: MAC rev. 0x00400000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0: OUI 0x00e04c, model 0x0011, rev. 2
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow

-- 
Dan Nelson
dnelson@allantgroup.com
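
For anyone trying to correlate these events across several machines, the
following is a minimal sketch (not something either poster used) that tallies
"watchdog timeout" and "link state changed" messages per interface from
syslog-style lines. The function name summarize, the EVENT pattern, and the
SAMPLE list are hypothetical; the sample lines are copied from the log
messages quoted in this thread, and the pattern assumes the
"kernel: <ifname>: <message>" layout shown there.

```python
import re
from collections import Counter

# Assumed layout, taken from the quoted log lines, e.g.
#   Feb 19 18:44:14 <kern.crit> server kernel: re0: watchdog timeout
EVENT = re.compile(
    r"kernel: (?P<ifname>[a-z]+\d+): "
    r"(?P<event>watchdog timeout|link state changed to (?:DOWN|UP))"
)


def summarize(lines):
    """Count watchdog and link-state events, keyed by (interface, event)."""
    counts = Counter()
    for line in lines:
        match = EVENT.search(line)
        if match:
            counts[(match.group("ifname"), match.group("event"))] += 1
    return counts


# Sample input: the three messages reported before the lockup.
SAMPLE = [
    "Feb 19 18:44:14 <kern.crit> server kernel: re0: watchdog timeout",
    "Feb 19 18:44:14 <kern.notice> server kernel: re0: link state changed to DOWN",
    "Feb 19 18:44:18 <kern.notice> server kernel: re0: link state changed to UP",
]

if __name__ == "__main__":
    for (ifname, event), n in sorted(summarize(SAMPLE).items()):
        print(f"{ifname}: {event}: {n}")
```

To run it against a real log, pass an open file instead of SAMPLE, for
example summarize(open("/var/log/messages")); that path is the usual FreeBSD
default but is an assumption about the local syslog configuration.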