Date: Wed, 29 Oct 2014 10:46:30 +0900
From: Yonghyeon PYUN <pyunyh@gmail.com>
To: Mason Loring Bliss <mason@blisses.org>
Cc: freebsd-net@freebsd.org
Subject: Re: Very bad Realtek problems
Message-ID: <20141029014630.GA2503@michelle.fasterthan.com>
In-Reply-To: <20141028034445.GR17150@blisses.org>
References: <20141027195124.GI17150@blisses.org> <20141028015020.GB1054@michelle.fasterthan.com> <20141028034445.GR17150@blisses.org>

--ZGiS0Q5IWpPtfppv
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Oct 27, 2014 at 11:44:45PM -0400, Mason Loring Bliss wrote:
> On Tue, Oct 28, 2014 at 10:50:20AM +0900, Yonghyeon PYUN wrote:
>
> > Currently re(4) heavily relies on power-on default settings since no
> > detailed register configuration is available. Some register
> > configurations made in Windows can survive a warm boot.
>
> Alright, a cold boot doesn't help. I froze up on an rsync, and observed these
> stats:
>
> re0 statistics:
> Tx frames : 209681
> Rx frames : 27559
> Tx errors : 0
> Rx errors : 0
> Rx missed frames : 0
> Rx frame alignment errs : 0
> Tx single collisions : 0
> Tx multiple collisions : 0
> Rx unicast frames : 27548
> Rx broadcast frames : 9
> Rx multicast frames : 2
> Tx aborts : 0
> Tx underruns : 0
>
> I rebooted with MSI and MSI-X disabled, and it broke again on an rsync. I
> observed:
>
> re0 statistics:
> Tx frames : 416065
> Rx frames : 47783
> Tx errors : 0
> Rx errors : 0
> Rx missed frames : 0
> Rx frame alignment errs : 0
> Tx single collisions : 0
> Tx multiple collisions : 0
> Rx unicast frames : 47757
> Rx broadcast frames : 24
> Rx multicast frames : 2
> Tx aborts : 0
> Tx underruns : 0
>
> The multicast frames seem to coincide with interface lock-ups.
>
> This time to correct I said ifconfig re0 down; ifconfig re0 up. It came back
> but the rsync died.

I guess you don't see 'watchdog timeout' errors, so the driver's watchdog
handler didn't help.

Given that you can reliably reproduce the issue, let's check the simple
things first. Disable all H/W offloading features (TX/RX checksum
offloading, TSO, VLAN H/W tag insertion/stripping) and see whether that
makes any difference.

If that makes no difference, identify which part of the MAC is stuck.
After rsync breaks, and before you bring the interface down and up again,
run tcpdump on your box and see whether you can still see RX packets. If
you can, the RX MAC still works.
After that, run ping(8) against another host and see whether the ICMP echo
request packets sent from your host show up in the tcpdump output. If they
do, the TX MAC works.

>
> I'll be happy to provide further debugging information once I know how to
> collect it.
>

If you think the issue happens intermittently regardless of network load,
try the attached patch. I'm not sure whether the patch will make any
difference for you, since many PCIe NICs don't implement the CLKREQ
feature. It's just a wild guess.

Thanks.

--ZGiS0Q5IWpPtfppv
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="re.aspm.diff"

Index: sys/dev/re/if_re.c
===================================================================
--- sys/dev/re/if_re.c	(revision 273756)
+++ sys/dev/re/if_re.c	(working copy)
@@ -1365,6 +1365,7 @@ re_attach(device_t dev)
 		    PCIER_LINK_CTL, 2);
 		if ((ctl & PCIEM_LINK_CTL_ASPMC) != 0) {
 			ctl &= ~PCIEM_LINK_CTL_ASPMC;
+			ctl &= ~PCIEM_LINK_CTL_ECPM;
 			pci_write_config(dev, sc->rl_expcap +
 			    PCIER_LINK_CTL, ctl, 2);
 			device_printf(dev, "ASPM disabled\n");

--ZGiS0Q5IWpPtfppv--
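
For readers without the driver source at hand, the effect of the one-line
change in re.aspm.diff on the PCIe Link Control value can be sketched in
isolation. This is only an illustration, not code from if_re.c: the helper
name link_ctl_disable_pm() is hypothetical, and the two mask values are
copied in on the assumption that they match sys/dev/pci/pcireg.h (ASPM
control in bits 1:0, Enable Clock Power Management in bit 8), so check
them against your own tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Link Control register masks. Assumed to match FreeBSD's pcireg.h;
 * they are redefined here only so the sketch builds on its own.
 */
#define PCIEM_LINK_CTL_ASPMC	0x0003	/* ASPM control: L0s and L1 enables */
#define PCIEM_LINK_CTL_ECPM	0x0100	/* Enable Clock Power Management */

/*
 * Hypothetical helper, not part of re(4): given the current Link Control
 * value, return the value the patched re_attach() would write back.
 * As in the patch, ECPM is cleared only on the path where ASPM was
 * found enabled; otherwise the register value is left alone.
 */
static uint16_t
link_ctl_disable_pm(uint16_t ctl)
{
	if ((ctl & PCIEM_LINK_CTL_ASPMC) != 0) {
		ctl &= ~PCIEM_LINK_CTL_ASPMC;	/* existing behaviour */
		ctl &= ~PCIEM_LINK_CTL_ECPM;	/* line added by the patch */
	}
	/* Either way, no ASPM state is enabled in the returned value. */
	assert((ctl & PCIEM_LINK_CTL_ASPMC) == 0);
	return (ctl);
}
```

With these masks, an input of 0x0143 (L0s and L1 enabled, common clock,
ECPM set) becomes 0x0040, while 0x0140 (ASPM already off, ECPM set) is
returned unchanged. So on a system where ASPM is already disabled, the
patch as posted does not touch the ECPM bit.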