From owner-freebsd-questions@FreeBSD.ORG Thu Feb 20 16:59:51 2014
Return-Path:
Delivered-To: questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
    (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by hub.freebsd.org (Postfix) with ESMTPS id 1AD2CB7A
    for ; Thu, 20 Feb 2014 16:59:51 +0000 (UTC)
Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by mx1.freebsd.org (Postfix) with ESMTPS id BCEBF16DF
    for ; Thu, 20 Feb 2014 16:59:50 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
    by btw.pki2.com (8.14.8/8.14.8) with ESMTP id s1KGxbEN097285;
    Thu, 20 Feb 2014 08:59:37 -0800 (PST)
    (envelope-from freebsd@penx.com)
Subject: Re: Instability in re driver?
From: Dennis Glatting
To: Dan Nelson
In-Reply-To: <20140220162701.GD80443@dan.emsphone.com>
References: <201402200339.UAA09587@mail.lariat.net>
    <20140220162701.GD80443@dan.emsphone.com>
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 20 Feb 2014 08:59:36 -0800
Message-ID: <1392915576.67604.10.camel@btw.pki2.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port
Content-Transfer-Encoding: 7bit
X-SoftwareMunitions-MailScanner-Information: Dennis Glatting
X-SoftwareMunitions-MailScanner-ID: s1KGxbEN097285
X-SoftwareMunitions-MailScanner: Found to be clean
X-MailScanner-From: freebsd@penx.com
Cc: Brett Glass , questions@freebsd.org
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: User questions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 20 Feb 2014 16:59:51 -0000

On Thu, 2014-02-20 at 10:27 -0600, Dan Nelson wrote:
> In the last episode (Feb 19), Brett Glass said:
> > We've been experiencing occasional crashes on heavily loaded machines that
> > have Realtek gigabit Ethernet ports built into their
> > motherboards. This
> > evening, at a time of peak usage, one of the machines showed the log
> > messages
> >
> > Feb 19 18:44:14 server kernel: re0: watchdog timeout
> > Feb 19 18:44:14 server kernel: re0: link state changed to DOWN
> > Feb 19 18:44:18 server kernel: re0: link state changed to UP
> >
> > even though we did not take the port down or up. A few minutes later, the
> > entire machine locked up solid and required a power cycle.
> >
> > Here are the boot time messages from the same server (as I rebooted it
> > following the crash):
> >
> > Feb 19 19:08:22 server kernel: re0: port 0xd800-0xd8ff mem 0xfeadf000-0xfeadffff,0xfdefc000-0xfdefffff irq 16 at device 0.0 on pci1
> > Feb 19 19:08:22 server kernel: re0: Using 1 MSI-X message
> > Feb 19 19:08:22 server kernel: re0: Chip rev. 0x28000000
> > Feb 19 19:08:22 server kernel: re0: MAC rev. 0x00000000
> > Feb 19 19:08:22 server kernel: miibus0: on re0
> > Feb 19 19:08:22 server kernel: rgephy0: PHY 1 on miibus0
> > Feb 19 19:08:22 server kernel: rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
> >
> > This problem is occurring on several servers built with the same
> > motherboard/chipset. Are there known bugs, hangs, or memory leaks that
> > might have led to this condition? Are there recent fixes which address
> > it? (The machines that are crashing are all running the latest security
> > patch level of FreeBSD 9.1-RELEASE.)
>
> I have had a similar persistent problem with a re0 interface in one of my
> machines, where medium UDP traffic sometimes causes a problem with the clock
> timer. Packets stop flowing, calls to nanosleep() never return, etc. I
> have watchdogd enabled, so the system ends up resetting on its own. Without
> that, it would just sit in a half-frozen state indefinitely.
> The machine is
> an old Dell Studio 540s, and I've seen the problem on FreeBSD 7, 8 and 9,
> i386 and amd64. I assumed it was just a problem with my motherboard since
> nobody else has ever reported a similar problem, and it only resets once or
> twice a month, so it doesn't really affect me. Neither disabling MSI nor
> using a different timecounter seems to help.
>

I have the same problem on a new ASUS A78M-A MB. Transfer is really fast
(300+ GB files), then poof. Generally, I stay away from Realtek, replacing
my network interfaces with Intel, but on the A78M-A I don't have much
choice.

> re0: port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff,0xfdff0000-0xfdffffff irq 17 at device 0.0 on pci3
> re0: MSI count : 1
> re0: MSI-X count : 2
> re0: attempting to allocate 1 MSI-X vectors (2 supported)
> msi: routing MSI-X IRQ 259 to local APIC 0 vector 58
> re0: using IRQ 259 for MSI-X
> re0: Using 1 MSI-X message
> re0: Chip rev. 0x3c000000
> re0: MAC rev. 0x00400000
> miibus0: on re0
> rgephy0: PHY 1 on miibus0
> rgephy0: OUI 0x00e04c, model 0x0011, rev. 2
> rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
>
>