Date: Sat, 8 Oct 2016 12:59:33 +0200
From: "O. Hartmann"
To: FreeBSD CURRENT
Subject: CURRENT/11-STABLE: Realtek NICs crash FreeBSD
Message-ID: <20161008125933.0a833469.ohartman@zedat.fu-berlin.de>
Organization: FU Berlin

I will start with a short problem description for the impatient; afterwards
I'll describe the situation in more detail.

Running 11-CURRENT, 11-STABLE and now 12-CURRENT on hosts equipped with
Realtek NIC chipsets brings the system down (crash) in a reproducible manner.
Plugging and unplugging the network cable is one method; having a more
sophisticated switch with green power management does the same, but in an
unpredictable way.

With the hosts attached directly to a "smart" switch, the crashes can be
reproduced by plugging and unplugging the cord or by having some traffic.
Then, it seems to me from an observer's point of view, the switch does some
arbitrary stuff like link up / link down or power saving or something I can't
check, and the systems go down anyway.

With a dumb switch as an intermediate device, like the Netgear GS105, a 5-port
GBit switch, the connection is stable as long as the cabling is untouched.
The problems also occur on my private Netgear GS110TPv2 8-port GBit "smart
managed" switch, even in "unsmart" mode (meaning: Eco mode off, no
sophisticated stuff enabled, no power saving/short cabling enabled, no SNMP
traps and so on, just factory settings). Switching on some Eco mode facilities
(power saving, short cable etc.) brings the hosts in question down really
rapidly, even with the cabling untouched.

The NICs in question are:

1) Server/Host

from dmesg:

rgephy0: PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master,
1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow,
1000baseT-FDX-flow-master, auto, auto-flow

from pciconf -lvceb:

re0@pci0:5:0:0: class=0x020000 card=0x81681849 chip=0x816810ec rev=0x06 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    bar   [10] = type I/O Port, range 32, base 0xd000, size 256, enabled
    bar   [18] = type Prefetchable Memory, range 64, base 0xf2104000, size 4096, enabled
    bar   [20] = type Prefetchable Memory, range 64, base 0xf2100000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) link x1(x1)
                 speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x800]
    cap 03[d0] = VPD
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 01000000684ce000

2) The second system, a Lenovo Laptop E540, has

re0: port 0x3000-0x30ff mem 0xf0d04000-0xf0d04fff,0xf0d00000-0xf0d03fff at device 0.0 on pci2
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: Chip rev. 0x50800000
re0: MAC rev. 0x00100000
miibus0: on re0
rgephy0: PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master,
1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 28:d2:44:79:87:32
re0: netmap queues/slots: TX 1/256, RX 1/256

and from pciconf:

re0@pci0:3:0:0: class=0x020000 card=0x502817aa chip=0x816810ec rev=0x10 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) RO link x1(x1)
                 speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x800]
    cap 03[d0] = VPD
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 01000000684ce000
    ecap 0018[170] = LTR 1
    ecap 001e[178] = unknown 1

The longer story:

As described above, the problem seems to occur only with the Realtek chips I
have. The host/server box is also equipped with an Intel NIC, and the problem
doesn't occur there. This specific host also has the same Realtek NIC as the
crashing host (it's a crappy ASROCK board, sorry).

At the campus lab, I realised that on the laptop plugging and unplugging the
wired LAN brought down the system very quickly. That was with 11-CURRENT a
couple of months ago, it was the case in the meantime with 11-STABLE, and it
is now the case with 12-CURRENT (recent update, all boxes have FreeBSD
12.0-CURRENT #3 r306839: Sat Oct 8 11:16:48 CEST 2016).
Since this laptop also has an Intel WiFi i7260 device which had severe
problems in the past (iwm driver), I did not pay much attention to the wired
network problem. CURRENT always has some issues, so I tried not to plug and
unplug while the system is running.

Now, at the lab/office and at "home", the network infrastructure has changed:
CISCO or HP switches as the backbone infrastructure at the lab (I do not know
the office's infrastructure), and the Netgear GS110TPv2 smart managed switch
seems to cause more trouble than anticipated. With both hosts attached to the
GS110TPv2 and some Eco mode available, the systems go down predictably.
Logging in on the interface of the switch is also a deadly mission. Leaving
the switch in factory settings untouched, plugging/unplugging is also deadly
to FreeBSD. Or I simply wait some time (I do not know what the switch is
doing then) and the systems crash.

At the moment, the systems with Realtek NICs (3) are unusable with this smart
switch and, as a result of my observation today after the GS110TPv2 got
installed, there are problems with other switches as well. I do not think it's
a problem with the switch, but some switches seem to perform actions that
bring down FreeBSD in a predictable manner (Eco mode/power saving), as does
even unplugging the cabling.

I am now starting to change the NICs to separate Intel-based ones to get rid
of this Realtek crap. So the only debugging-capable device left will be the
laptop, and I'd appreciate some tips for giving you some more information.
Right now, I do not have crash dumps or screenshots; the laptop shows some
information, but it vanishes fast. I'll configure a debugging kernel as soon
as possible.
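
To make the two pciconf entries above easier to compare, a small helper along
the following lines could be used. It is only a sketch: the function name
decode_pciconf_line and the regular expression are made up for illustration,
and it assumes the summary-line layout shown in this mail, with the device ID
in the upper and the vendor ID in the lower 16 bits of chip= (and likewise
subdevice/subvendor in card=).

```python
import re

# Hypothetical pattern for the summary line as it appears in this mail,
# e.g. "re0@pci0:5:0:0: class=0x020000 card=0x81681849 chip=0x816810ec rev=0x06 hdr=0x00"
_PCICONF_RE = re.compile(
    r"^(?P<name>[^@\s]+)@pci\S+\s+class=0x[0-9a-f]+\s+"
    r"card=0x(?P<card>[0-9a-f]{8})\s+chip=0x(?P<chip>[0-9a-f]{8})\s+"
    r"rev=0x(?P<rev>[0-9a-f]+)"
)


def decode_pciconf_line(line):
    """Split chip=/card= into 16-bit IDs; return None if the line does not match."""
    m = _PCICONF_RE.match(line.strip())
    if m is None:
        return None
    chip = int(m.group("chip"), 16)
    card = int(m.group("card"), 16)
    return {
        "name": m.group("name"),
        "vendor": chip & 0xFFFF,      # lower 16 bits of chip=
        "device": chip >> 16,         # upper 16 bits of chip=
        "subvendor": card & 0xFFFF,   # lower 16 bits of card=
        "subdevice": card >> 16,      # upper 16 bits of card=
        "rev": int(m.group("rev"), 16),
    }


if __name__ == "__main__":
    # The two summary lines quoted in this mail.
    lines = [
        "re0@pci0:5:0:0: class=0x020000 card=0x81681849 chip=0x816810ec rev=0x06 hdr=0x00",
        "re0@pci0:3:0:0: class=0x020000 card=0x502817aa chip=0x816810ec rev=0x10 hdr=0x00",
    ]
    for line in lines:
        info = decode_pciconf_line(line)
        print("{name}: vendor=0x{vendor:04x} device=0x{device:04x} "
              "subvendor=0x{subvendor:04x} rev=0x{rev:02x}".format(**info))
```

Both entries decode to the same vendor/device pair (0x10ec/0x8168) and differ
in the revision (0x06 on the server, 0x10 on the laptop) and in the subvendor.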

Kind regards,

oh