From: Yonghyeon PYUN
Date: Wed, 29 Oct 2014 10:46:30 +0900
To: Mason Loring Bliss
Subject: Re: Very bad Realtek problems
Cc: freebsd-net@freebsd.org

On Mon, Oct 27, 2014 at 11:44:45PM -0400, Mason Loring Bliss wrote:
> On Tue, Oct 28, 2014 at 10:50:20AM +0900, Yonghyeon PYUN wrote:
>
> > Currently re(4) heavily relies on power-on default settings since no
> > detailed register configuration information is available. Some register
> > configurations made in Windows can survive a warm boot.
>
> Alright, a cold boot doesn't help. I froze up on an rsync, and observed these
> stats:
>
> re0 statistics:
> Tx frames                : 209681
> Rx frames                : 27559
> Tx errors                : 0
> Rx errors                : 0
> Rx missed frames         : 0
> Rx frame alignment errs  : 0
> Tx single collisions     : 0
> Tx multiple collisions   : 0
> Rx unicast frames        : 27548
> Rx broadcast frames      : 9
> Rx multicast frames      : 2
> Tx aborts                : 0
> Tx underruns             : 0
>
> I rebooted with MSI and MSI-X disabled, and it broke again on an rsync. I
> observed:
>
> re0 statistics:
> Tx frames                : 416065
> Rx frames                : 47783
> Tx errors                : 0
> Rx errors                : 0
> Rx missed frames         : 0
> Rx frame alignment errs  : 0
> Tx single collisions     : 0
> Tx multiple collisions   : 0
> Rx unicast frames        : 47757
> Rx broadcast frames      : 24
> Rx multicast frames      : 2
> Tx aborts                : 0
> Tx underruns             : 0
>
> The multicast frames seem to coincide with interface lock-ups.
>
> This time to correct I said ifconfig re0 down; ifconfig re0 up. It came back
> but the rsync died.

I guess you don't see 'watchdog timeout' errors, so the driver's watchdog
handler didn't help.

Given that you can reliably reproduce the issue, let's check the simple things
first. Disable all H/W offloading features (TX/RX checksum offloading, TSO,
VLAN H/W tag insertion/stripping) and see whether that makes any difference.

If it makes no difference, identify which part of the MAC is stuck. Before
bringing the interface down and up again after the rsync breaks, run tcpdump
on your box and see whether you can still see RX packets. If you can, the RX
MAC still works. After that, ping(8) another host and see whether the ICMP
echo request packets sent from your host show up. If they do, the TX MAC
works.

>
> I'll be happy to provide further debugging information once I know how to
> collect it.
>

If you think the issue happens intermittently regardless of network load, try
the attached patch. I'm not sure whether it makes any difference for you, since
many PCIe NICs don't implement the CLKREQ feature. It's just a wild guess.

Thanks.

[Attachment: re.aspm.diff]

Index: sys/dev/re/if_re.c
===================================================================
--- sys/dev/re/if_re.c	(revision 273756)
+++ sys/dev/re/if_re.c	(working copy)
@@ -1365,6 +1365,7 @@ re_attach(device_t dev)
 			    PCIER_LINK_CTL, 2);
 			if ((ctl & PCIEM_LINK_CTL_ASPMC) != 0) {
 				ctl &= ~PCIEM_LINK_CTL_ASPMC;
+				ctl &= ~PCIEM_LINK_CTL_ECPM;
 				pci_write_config(dev, sc->rl_expcap +
 				    PCIER_LINK_CTL, ctl, 2);
 				device_printf(dev, "ASPM disabled\n");
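The attached diff adds a single line. When re_attach() finds ASPM enabled in the PCIe Link Control register, it now also clears the Enable Clock Power Management (CLKREQ#) bit before writing the register back. The stand-alone C sketch below reproduces only that bit manipulation so it can be checked outside the kernel. It is an illustration, not driver code: the helper re_link_ctl_quirk() and the LINK_CTL_* macros are hypothetical names. The macro values assume the PCIe Link Control layout (ASPM Control in bits 1:0, Enable Clock Power Management in bit 8), which FreeBSD's PCIEM_LINK_CTL_ASPMC and PCIEM_LINK_CTL_ECPM are assumed to encode.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone copies of the Link Control fields touched by
 * the patch. Values assume the PCIe Link Control layout and are assumed
 * to match FreeBSD's PCIEM_LINK_CTL_ASPMC / PCIEM_LINK_CTL_ECPM.
 */
#define	LINK_CTL_ASPMC	0x0003	/* ASPM L0s/L1 entry enable, bits 1:0 */
#define	LINK_CTL_ECPM	0x0100	/* Enable Clock Power Management (CLKREQ#), bit 8 */

/*
 * Hypothetical helper mirroring the patched branch in re_attach().
 * If any ASPM control bit is set, clear the ASPM bits and CLKREQ#
 * and report that the register must be written back. If ASPM is
 * already off, the value is returned unchanged and no write is
 * requested, because the patch only clears ECPM inside that branch.
 */
uint16_t
re_link_ctl_quirk(uint16_t ctl, int *need_write)
{
	*need_write = 0;
	if ((ctl & LINK_CTL_ASPMC) != 0) {
		ctl &= (uint16_t)~LINK_CTL_ASPMC;
		ctl &= (uint16_t)~LINK_CTL_ECPM;
		*need_write = 1;
	}
	return (ctl);
}
```

For example, a register value of 0x0143 (ASPM L0s and L1, Common Clock Configuration, and CLKREQ# enabled) comes back as 0x0040 with a write requested. A value of 0x0140 (ASPM already off) is returned unchanged. That second case follows the patch as posted: ECPM is only cleared inside the ASPM branch, so a device that comes up with ASPM off but CLKREQ# on keeps CLKREQ# enabled. In the driver itself the value is read and written with pci_read_config()/pci_write_config() at sc->rl_expcap + PCIER_LINK_CTL, as in the diff.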