Date:      Sat, 8 Oct 2016 12:59:33 +0200
From:      "O. Hartmann" <ohartman@zedat.fu-berlin.de>
To:        FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   CURRENT/11-STABLE: Realtek NICs crash FreeBSD
Message-ID:  <20161008125933.0a833469.ohartman@zedat.fu-berlin.de>



I will start with a short problem description for the impatient,
afterwards I'll describe the situation in more detail.

Running 11-CURRENT, 11-STABLE and now 12-CURRENT on hosts equipped with
Realtek NIC chipsets brings the system down (crash) in a reproducible
manner. Plugging and unplugging the network cable is one method; having
a more sophisticated switch with green power management does the same,
but in an unpredictable way.

With the hosts attached directly to a "smart" switch, the crashes can
be reproduced by plugging and unplugging the cord or by having some
traffic - then, it seems to me from an observer's point of view, the
switch does some arbitrary stuff like link up / link down or power
saving or something I can't check, and the systems go down anyway.
With a dumb switch as an intermediate device, like the Netgear GS105, a
5-port GBit switch, the connection is stable as long as the cabling is
untouched.

The problems also occur on my private Netgear GS110TPv2 8-port GBit
"smart managed" switch, also in "unsmart" mode (meaning: Eco mode off,
no sophisticated stuff enabled, no power saving/short cabling enabled,
no SNMP traps and so on, just factory settings).

Switching on some Eco mode facilities (power saving, short cable etc.)
brings the hosts in question down really rapidly, even with the cabling
untouched.

The NICs in question are:

1) Server/Host
from dmesg:
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master,
1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow,
1000baseT-FDX-flow-master, auto, auto-flow

from pciconf -lvceb

re0@pci0:5:0:0: class=0x020000 card=0x81681849 chip=0x816810ec rev=0x06 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    bar   [10] = type I/O Port, range 32, base 0xd000, size 256, enabled
    bar   [18] = type Prefetchable Memory, range 64, base 0xf2104000, size 4096, enabled
    bar   [20] = type Prefetchable Memory, range 64, base 0xf2100000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D1 D2 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128)
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x800]
    cap 03[d0] = VPD
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 01000000684ce000

2)
The second system, a Lenovo E540 laptop, has

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0x3000-0x30ff mem 0xf0d04000-0xf0d04fff,0xf0d00000-0xf0d03fff at device 0.0 on pci2
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: Chip rev. 0x50800000
re0: MAC rev. 0x00100000
miibus0: <MII bus> on re0
rgephy0: <RTL8251 1000BASE-T media interface> PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master,
1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 28:d2:44:79:87:32
re0: netmap queues/slots: TX 1/256, RX 1/256

and from pciconf:

re0@pci0:3:0:0: class=0x020000 card=0x502817aa chip=0x816810ec rev=0x10 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D1 D2 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint MSI 1 max data 128(128) RO
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[b0] = MSI-X supports 4 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x800]
    cap 03[d0] = VPD
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0002[140] = VC 1 max VC0
    ecap 0003[160] = Serial 1 01000000684ce000
    ecap 0018[170] = LTR 1
    ecap 001e[178] = unknown 1



The longer story:

As described above, the problem seems to affect only the Realtek chips
I have. The host/server box is also equipped with an Intel NIC, and the
problem doesn't occur with it. This specific host also has the same
Realtek NIC as the crashing host (it's a crappy ASROCK board, sorry).

At the campus lab, I realised that on the laptop plugging and
unplugging the wired LAN brought down the system very quickly - that
was with 11-CURRENT a couple of months ago, it was the case with
11-STABLE in between, and it is now the case with 12-CURRENT (recent
update, all boxes have FreeBSD 12.0-CURRENT #3 r306839: Sat Oct  8
11:16:48 CEST 2016). Since this laptop also has an Intel WiFi i7260
device which had severe problems in the past (iwm driver), I did not
pay much attention to the wired network problem - CURRENT always has
some issues, so I tried not to plug and unplug while the system is
running.

Now the network infrastructure at the lab/office and at "home" has
changed: CISCO or HP switches form the backbone infrastructure at the
lab (I do not know the office's infrastructure), and the Netgear
GS110TPv2 smart managed switch seems to cause more trouble than
anticipated. With both hosts attached to the GS110TPv2 and some Eco
mode enabled, the systems go down predictably. Logging in to the
interface of the switch is also a deadly mission. With the switch left
untouched at factory settings, plugging/unplugging is also deadly for
FreeBSD. Simply waiting some time is enough as well: the systems crash,
while I do not know what the switch is doing then.

At the moment, the systems with Realtek NICs (3) are unusable with this
smart switch and, as I observed today after the GS110TPv2 was
installed, they have problems with other switches as well. I do not
think it's a problem with the switch, but some switches seem to perform
actions that bring FreeBSD down in a predictable manner (Eco mode/power
saving), as does even unplugging the cable.

I am now starting to replace the NICs with separate Intel-based ones to
get rid of this Realtek crap. So the only device left that is capable
of debugging will be the laptop, and I'd appreciate some tips on how to
give you more information. Right now, I do not have crash dumps or
screenshots - the laptop shows some information, but it vanishes fast.
I'll configure a debugging kernel as soon as possible.

Kind regards,
oh



