Date:      Wed, 29 Oct 2014 10:46:30 +0900
From:      Yonghyeon PYUN <pyunyh@gmail.com>
To:        Mason Loring Bliss <mason@blisses.org>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Very bad Realtek problems
Message-ID:  <20141029014630.GA2503@michelle.fasterthan.com>
In-Reply-To: <20141028034445.GR17150@blisses.org>
References:  <20141027195124.GI17150@blisses.org> <20141028015020.GB1054@michelle.fasterthan.com> <20141028034445.GR17150@blisses.org>


--ZGiS0Q5IWpPtfppv
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Oct 27, 2014 at 11:44:45PM -0400, Mason Loring Bliss wrote:
> On Tue, Oct 28, 2014 at 10:50:20AM +0900, Yonghyeon PYUN wrote:
> 
> > Currently re(4) relies heavily on power-on default settings since no
> > detailed register configuration information is available. Some register
> > configurations made in Windows can survive a warm boot.
> 
> Alright, a cold boot doesn't help. I froze up on an rsync, and observed these
> stats:
> 
> re0 statistics:
> Tx frames : 209681
> Rx frames : 27559
> Tx errors : 0
> Rx errors : 0
> Rx missed frames : 0
> Rx frame alignment errs : 0
> Tx single collisions : 0
> Tx multiple collisions : 0
> Rx unicast frames : 27548
> Rx broadcast frames : 9
> Rx multicast frames : 2
> Tx aborts : 0
> Tx underruns : 0
> 
> I rebooted with MSI and MSI-X disabled, and it broke again on an rsync. I
> observed:
> 
> re0 statistics:
> Tx frames : 416065
> Rx frames : 47783
> Tx errors : 0
> Rx errors : 0
> Rx missed frames : 0
> Rx frame alignment errs : 0
> Tx single collisions : 0
> Tx multiple collisions : 0
> Rx unicast frames : 47757
> Rx broadcast frames : 24
> Rx multicast frames : 2
> Tx aborts : 0
> Tx underruns : 0
> 
> The multicast frames seem to coincide with interface lock-ups.
> 
> To recover this time I ran ifconfig re0 down; ifconfig re0 up. The interface
> came back, but the rsync died.

I guess you don't see 'watchdog timeout' errors, so the driver's
watchdog handler didn't help.  Given that you can reliably
reproduce the issue, let's check the simple things first.  Disable
all H/W offloading features (TX/RX checksum offloading, TSO, VLAN
H/W tag insertion/stripping) and see whether that makes any
difference.

If that makes no difference, identify which part of the MAC is
stuck.  After the rsync breaks, and before you bring the interface
down and up again, run tcpdump on your box and see whether you can
still see RX packets.  If you can, the RX MAC still works.  After
that, run ping(8) to another host and see whether you can see the
ICMP echo request packets sent from your host.  If you can, the TX
MAC works.
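
The checks above can be sketched as a small sh script.  This is only
a sketch under assumptions: the interface name re0 comes from the
statistics quoted above, PEER is a placeholder for any other host on
your LAN, and the run helper with its DRY_RUN switch is a hypothetical
convenience that prints commands instead of executing them.  The
ifconfig(8) switches are the usual FreeBSD capability flags; which of
them apply depends on the controller revision.

```sh
#!/bin/sh
# Sketch of the checks described above (FreeBSD, run as root).
# IFACE, PEER and DRY_RUN are assumptions made for illustration.
IFACE="${IFACE:-re0}"
PEER="${PEER:-192.0.2.1}"   # placeholder; use a real host on your LAN
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: turn off H/W offloading, then retry the rsync.
disable_offload() {
    run ifconfig "$IFACE" -rxcsum -txcsum -tso -vlanhwtag
}

# Step 2: after the hang and BEFORE ifconfig down/up, look for RX
# packets.  If nothing arrives, tcpdump keeps waiting; stop it with ^C.
check_rx() {
    run tcpdump -n -i "$IFACE" -c 20
}

# Step 3: capture ICMP while pinging another host and look for the
# outgoing echo requests.
check_tx() {
    run tcpdump -n -i "$IFACE" -c 5 icmp &
    sleep 1                 # give tcpdump a moment to start
    run ping -c 5 "$PEER"
    wait                    # tcpdump exits after 5 ICMP packets
}
```

Source the file and call the functions one at a time.  With the
default DRY_RUN=1 they only print what they would run; set DRY_RUN=0
to execute the commands for real.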

> 
> I'll be happy to provide further debugging information once I know how to
> collect it.
> 

If you think the issue happens intermittently regardless of network
load, try the attached patch.  I'm not sure whether the patch makes
any difference for you since many PCIe NICs don't implement the
CLKREQ feature.  It's just a wild guess.
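
To illustrate what the attached patch does to the Link Control
register value, here is a minimal sketch in shell arithmetic.  The
mask values are assumptions based on the PCI Express Link Control
layout (ASPM Control in bits 1:0, Enable Clock Power Management in
bit 8), which is what PCIEM_LINK_CTL_ASPMC and PCIEM_LINK_CTL_ECPM
are expected to correspond to.  The function name is hypothetical.

```sh
#!/bin/sh
# Assumed mask values (PCIe Link Control register layout).
ASPMC=0x0003   # ASPM Control, bits 1:0
ECPM=0x0100    # Enable Clock Power Management, bit 8

# Mirror the patched branch in re_attach(): only when ASPM is enabled,
# clear both the ASPM bits and the clock power management bit.
patched_link_ctl() {
    ctl=$(( $1 ))
    if [ $(( $ctl & $ASPMC )) -ne 0 ]; then
        ctl=$(( $ctl & ~$ASPMC & ~$ECPM ))
    fi
    printf '0x%04x\n' "$ctl"
}
```

For example, patched_link_ctl 0x0143 prints 0x0040, while a value
with ASPM already off, such as 0x0140, is left unchanged because the
patch only touches the register inside the ASPM-enabled branch.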

Thanks.

--ZGiS0Q5IWpPtfppv
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="re.aspm.diff"

Index: sys/dev/re/if_re.c
===================================================================
--- sys/dev/re/if_re.c	(revision 273756)
+++ sys/dev/re/if_re.c	(working copy)
@@ -1365,6 +1365,7 @@ re_attach(device_t dev)
 			    PCIER_LINK_CTL, 2);
 			if ((ctl & PCIEM_LINK_CTL_ASPMC) != 0) {
 				ctl &= ~PCIEM_LINK_CTL_ASPMC;
+				ctl &= ~PCIEM_LINK_CTL_ECPM;
 				pci_write_config(dev, sc->rl_expcap +
 				    PCIER_LINK_CTL, ctl, 2);
 				device_printf(dev, "ASPM disabled\n");

--ZGiS0Q5IWpPtfppv--


