Date: Mon, 27 Oct 2014 15:51:24 -0400
From: Mason Loring Bliss <mason@blisses.org>
To: freebsd-net@freebsd.org
Subject: Very bad Realtek problems
Message-ID: <20141027195124.GI17150@blisses.org>
Hi, all. I've been having sporadic and serious problems with the Realtek
gigabit interface built into my motherboard. Periodically, it just freezes
up. I've tried several things to no avail: turning on DEVICE_POLLING,
frobbing bootloader options and sysctl settings, etc.

I had a solid week of trouble-free operation with the following settings:

    hw.re.msi_disable="1"
    hw.re.msix_disable="1"
    dev.re.0.int_rx_mod=0

The last one says it can be set as a loader tunable, but that didn't work;
I had to set it from sysctl.conf.

Then, after a reboot, the interface locked up again as soon as I pushed it
a little with an rsync. However, I've seen interactive sessions lock it up
too, so it's not just when I'm doing big transfers.

It's not clear what's happening. I have been capturing stats periodically
with 'sysctl dev.re.0.stats=1', but that doesn't always show a problem. For
instance, during one of the lock-ups last night, after a reboot, I got this:

    re0 statistics:
    Tx frames : 171306
    Rx frames : 20271
    Tx errors : 0
    Rx errors : 0
    Rx missed frames : 0
    Rx frame alignment errs : 0
    Tx single collisions : 0
    Tx multiple collisions : 0
    Rx unicast frames : 20271
    Rx broadcast frames : 0
    Rx multicast frames : 0
    Tx aborts : 0
    Tx underruns : 0

After running overnight, with sporadic automated transfers:

    re0 statistics:
    Tx frames : 4658945
    Rx frames : 1258514
    Tx errors : 0
    Rx errors : 33
    Rx missed frames : 0
    Rx frame alignment errs : 3591
    Tx single collisions : 0
    Tx multiple collisions : 0
    Rx unicast frames : 1255880
    Rx broadcast frames : 2411
    Rx multicast frames : 223
    Tx aborts : 0
    Tx underruns : 0

I was seeing "Rx multicast frames" creep up each time I saw a freeze last
night, which was confusing, since I'm not sure why there'd be any multicast
traffic at all.

Here's the card from dmesg, with MSI/MSI-X turned off:

    re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe800-0xe8ff mem 0xfbfff000-0xfbffffff,0xfbff8000-0xfbffbfff irq 18 at device 0.0 on pci2
    re0: Chip rev. 0x2c000000
    re0: MAC rev. 0x00200000
    miibus0: <MII bus> on re0
    rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
    rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
    re0: Ethernet address: bc:ae:c5:bd:44:e7

The motherboard it's built into:

    Base Board Information
        Manufacturer: ASUSTeK Computer INC.
        Product Name: M4A88T-M
        Version: Rev X.0x
        Serial Number: MF70B1G04201588
        Asset Tag: To Be Filled By O.E.M.
        Features:
            Board is a hosting board
            Board is replaceable
        Location In Chassis: To Be Filled By O.E.M.
        Chassis Handle: 0x0003
        Type: Motherboard
        Contained Object Handles: 0

In general I've been running "ifconfig re0 down ; ifconfig re0 up" to kick
the interface, but last night a friendly person on IRC mentioned that I
could work around this by running a steady ping and frobbing the media type
whenever the pings fail. So, I've got this running:

    while true
    do
        ping -c 1 -t 1 firewall > /dev/null 2>&1
        if [ $? -ne 0 ]; then
            date
            echo "toggling re0"
            echo
            ifconfig re0 media 1000baseT mediaopt full-duplex,flowcontrol,master
            ifconfig re0 media autoselect mediaopt flowcontrol
            sleep 3
        fi
        sleep 1
    done

This has been logging failures sporadically throughout the day, but it
keeps traffic moving, albeit with the occasional hiccough.

This hardware ran Debian for a couple of years without so much as a short
hiccough, so I'm confident the hardware is fine. That suggests the Linux
driver is doing something to handle this chip that FreeBSD isn't. For a
while I was dual-booting, and I'd see errors under FreeBSD that weren't
there under Debian.
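Comparing two statistics dumps by eye is error-prone, so a small helper that prints only the counters that changed between two snapshots can make patterns such as the creeping "Rx multicast frames" easier to spot. This is a sketch, assuming each dump has already been saved to its own file in the "Name : value" form shown above; re_stats_delta is a hypothetical name, not part of FreeBSD or re(4).

```sh
#!/bin/sh
# re_stats_delta: hypothetical helper (not part of FreeBSD) that compares
# two saved re(4) statistics dumps and prints only the counters that moved.
# Assumption: each file holds "Name : value" lines as in the dumps above.
# A header such as "re0 statistics:" has no numeric value and is skipped.
re_stats_delta() {
    awk -F '[ \t]*:[ \t]*' '
        # Only consider lines whose second field is a plain counter value.
        NF == 2 && $2 ~ /^[0-9]+[ \t]*$/ {
            key = $1
            sub(/^[ \t]+/, "", key)     # drop indentation from the name
            val = $2 + 0
            if (FNR == NR) {            # first file: remember the baseline
                before[key] = val
                next
            }
            d = val - before[key]       # a missing baseline counts as 0
            if (d != 0) {
                sign = (d > 0) ? "+" : ""
                # %.0f rather than %d avoids integer overflow in some awks
                printf "%s: %.0f -> %.0f (%s%.0f)\n", key, before[key], val, sign, d
            }
        }
    ' "$1" "$2"
}

# When run directly as a script with exactly two files, compare them.
if [ $# -eq 2 ]; then
    re_stats_delta "$1" "$2"
fi
```

Saved as re_stats_delta.sh and run as "sh re_stats_delta.sh before.txt after.txt" on the two dumps above, it would print seven lines, including "Rx errors: 0 -> 33 (+33)" and "Rx multicast frames: 0 -> 223 (+223)", and would omit counters that stayed at zero. A negative delta would indicate that the counters were reset (for example by a reboot) between snapshots.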
I've started diving into the source, both Linux and FreeBSD, but I don't
have enough exposure to Ethernet driver code to get a high-level picture of
what they're doing, so I haven't yet spotted any special-case or
hardware-glitch handling that we're missing, although I might find
something eventually.

I'm struggling to find a way to see what's actually happening. I've toggled
MSI and MSI-X handling, turned down interrupt moderation delays, and tried
both I/O-mapped and memory-mapped register access, although I'm not
actually clear on what changes between the two. I've had polling both
enabled and disabled.

One thing to note: last night's horror, while I was trying to move some
backup data, came right after rebooting from Windows (installed on a
partition for gaming...). It made me wonder whether we're failing to fully
initialize some state on the card. I'd had what felt like a solid,
glitchless week before that.

FWIW, I'm running 10.1-RC3 on this box, and I've seen issues from early on,
back when I was still running 10.0-RELEASE.

Thanks in advance for clues. This is a showstopper for further deployment
for me, as I've got these Realtek on-board cards in several boxes, and
while the media frobbing largely works, it's not something I can inflict on
my users.

-- 
Mason Loring Bliss   ((  If I have not seen as far as others, it is because
mason@blisses.org     ))  giants were standing on my shoulders. - Hal Abelson