Date: Tue, 25 Oct 2016 07:03:38 +0200
From: "Hartmann, O." <ohartman@zedat.fu-berlin.de>
To: YongHyeon PYUN <pyunyh@gmail.com>
Cc: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: CURRENT: re(4) crashing system
Message-ID: <20161025070338.76ad6711@hermann>
In-Reply-To: <20161025020538.GA1238@michelle.fasterthan.co.kr>
References: <20161023132538.6bf55fb2@hermann>
 <20161024051359.GA1185@michelle.fasterthan.co.kr>
 <20161024140337.47af924e@freyja.zeit4.iv.bundesimmobilien.de>
 <20161025020538.GA1238@michelle.fasterthan.co.kr>

On Tue, 25 Oct 2016 11:05:38 +0900
YongHyeon PYUN <pyunyh@gmail.com> wrote:

> On Mon, Oct 24, 2016 at 02:03:37PM +0200, O. Hartmann wrote:
> > On Mon, 24 Oct 2016 14:14:00 +0900
> > YongHyeon PYUN <pyunyh@gmail.com> wrote:
> >
> > > On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > > > I tried to report earlier here that CURRENT does have some
> > > > serious problems right now and one of those problems seems to
> > > > be triggered by the recent re(4) driver. The problem is also
> > > > present in recent 11-STABLE!
> > > >
> > > > Below, you'll find pciconf output regarding the device on a
> > > > Lenovo E540 laptop I can test on and trigger the problem.
> > > >
> > > > The phenomenon is that this NIC does not negotiate 1000baseTX;
> > > > it always falls back to 100baseTX although the device
> > > > claims to be a 1 GBit capable device.
> > > >
> > > > When I try to put the device manually into 1000baseTX mode via
> > > >
> > > > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4)
> > > > driver)
> > > >
> > > > it is possible to crash the system. The system also crashes when
> > > > plugging/unplugging the LAN cord - I guess the renegotiation is
> > > > triggering this crash immediately.
> > > >
> > > > I tried with several switches and routers capable of 1 GBit and
> > > > it seems to be independent of the network hardware in use.
> > > >
> > > > I tried to capture a backtrace when the kernel crashes, but I
> > > > do not know how to save the kernel debugger output.
> > > > Although I configured debugging according to the handbook,
> > > > there is no coredump at all.
> > > >
> > > > Advice is appreciated - if anybody is interested in solving
> > > > this.
> > >
> > > There were several instability reports on re(4). I vaguely guess
> > > it would be related to some missing initializations for certain
> > > controllers.
> > > Unfortunately, there is no publicly available
> > > datasheet for those controllers and it's not likely we will get
> > > access to it in the near future. It seems the vendor's FreeBSD
> > > driver accesses lots of magic registers as well as loading DSP
> > > fixups. I have no idea what it wants to do and re(4) used to
> > > heavily rely on power-on default register values. The engineering
> > > samples I have do not show instabilities so it wouldn't be easy
> > > to identify the issue.
> > >
> > > Probably the first step to address the issue would be identifying
> > > those chips and narrowing down the scope of guessing. Would you
> > > show me the dmesg output (re(4) and rgephy(4) only)? pciconf(8)
> > > output is useless here since RealTek uses the same PCI id for
> > > PCIe variants.
> > >
> > > BTW, I was told that the vendor's FreeBSD driver seems to work
> > > fine for normal usage patterns. The vendor's driver triggered an
> > > instant panic and lacked H/W offloading features in the past. It
> > > might have changed though.
> >
> > The problems with re(4) drivers arose again when I bought some
> > "green" equipment, mainly switches, which reduce power emission on
> > short cables or non-connected ports. This brought down some servers
> > with re(4) chipsets immediately and I had no clue what happened. I
> > do not know whether this is a
>
> I'm not sure but it's likely the issue is related to EEE/Green
> Ethernet handling. EEE is a feature negotiated with the link partner.
> If you directly connect your laptop to a non-EEE capable link partner
> like another re(4) box without switches you may be able to tell
> whether the issue is EEE/Green Ethernet related or not.

Me neither, since when I discovered a problem the first time with
CURRENT (that was the Friday before last week's Friday), there was an
unlucky coincidence: I got the new switch, FreeBSD introduced a serious
bug and I changed the NICs.
The laptop, the last in the row of re(4)-equipped systems on which I use
the Realtek NIC, does well now with Green IT technology, but crashes on
plugging/unplugging - not on each event, but at least one time in ten.
I guess the Green IT issue was more an unlucky guess of mine and went
hand in hand with the problem I face with CURRENT right now on some
older, non-UEFI machines.

>
> > single fate so to speak, or this problem will arise for others,
> > too. On server hardware we replaced all Realtek NICs with ones
> > from Intel, and luckily some server mainboards already have Intel
> > PHYs or NICs. The Broadcom devices we have on some older Fujitsu
> > hardware are also stable like a charm, even with the new power
> > saving switches.
>
> bge(4) also lacks EEE support (the publicly available datasheet is
> too sanitized). bge(4) firmware probably does not announce EEE
> capability by default in link establishment while recent re(4)
> devices seem to unconditionally announce EEE. Generally EEE
> handling requires a kind of handshake for link state change from
> MAC/PHY.
>
> > While we can swap the NIC on server or workstation platforms, it is
> > almost impossible on laptops and the number of laptops with Realtek
> > chips seems to grow. It is a pity that the vendor of the chipsets
> > refuses to support OSes other than Windows - or in some rare cases
> > only Linux. After you wrote the answer, I checked on the net for
> > suitable drivers and the situation seems bad for almost all OSes
> > apart from commercial ones like Windooze and Apple OS X.
> >
> > As soon as I get my hands on the laptop again, I'll send the
> > requested information. I know that I played around with re(4) and
> > rgephy(4) in the kernel; rgephy(4) showed up in the dmesg, but I
> > didn't see any effect - except that it offered some additional
> > "media xxx-options-xxx" mostly appended with "flow" - but trying
> > them also brought down the system, just like plugging or
> > unplugging.
>
> rgephy(4) will show the recognized PHY H/W model. Another piece of
> information I'd like to know is the OUI information of the PHY. The
> OUI information can be obtained with `devinfo -rv | grep rgephy`.
>
> The "flow" output of media indicates it negotiated ethernet
> flow-control with the link partner. rgephy(4) used to announce
> autonegotiation even when a manual setting is requested with
> ifconfig. It was to work around HW issues seen in the past.
> You can disable the use of autonegotiation in manual media
> selection with the flag0 option. See rgephy(4) for more information.
> Not sure whether that option helps though.
>
> > The last kernel I compiled was then without rgephy(4) - the NIC
> > worked as expected, but plugging/unplugging or having some
> > power-down activities on a Netgear SoHo green-power switch brings
> > the system down as usual.
>
> If you use re(4) without rgephy(4) it will use ukphy(4) which is
> completely dumb on link state detection of the re(4) controller. Link
> state detection requires non-PHY register access on re(4) so using
> ukphy(4) is not recommended.

As requested, the information about re0 and rgephy0 on the laptop
(Lenovo E540):

[...]
rgephy0: <RTL8251 1000BASE-T media interface> PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0x3000-0x30ff mem 0xf0d04000-0xf0d04fff,0xf0d00000-0xf0d03fff at device 0.0 on pci2
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: Chip rev. 0x50800000
re0: MAC rev. 0x00100000
miibus0: <MII bus> on re0
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 28:d2:44:79:87:32
re0: netmap queues/slots: TX 1/256, RX 1/256
re0: link state changed to DOWN
re0: link state changed to UP
[...]

I use "options netmap" in the kernel config, but the problem is also
present without this option - just for the record.

Kind regards,

oh
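
For anyone else asked to report the same details, the identifying
fields in the dmesg excerpt above (PHY model, chip revision, MAC
revision) could be pulled out with a small script. This is only a
rough sketch and not part of re(4) or any FreeBSD tool: the function
name summarize and the SAMPLE text are made up for illustration, and
the patterns assume the dmesg wording shown in this message.

```python
import re

# Sample lines copied from the dmesg excerpt in this message.
SAMPLE = """\
rgephy0: <RTL8251 1000BASE-T media interface> PHY 1 on miibus0
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0x3000-0x30ff mem 0xf0d04000-0xf0d04fff,0xf0d00000-0xf0d03fff at device 0.0 on pci2
re0: Chip rev. 0x50800000
re0: MAC rev. 0x00100000
"""


def summarize(dmesg_text):
    """Collect the fields that identify the controller and its PHY.

    Hypothetical helper: the patterns below assume the exact message
    format quoted in this thread and may not match other releases.
    """
    patterns = {
        "phy_model": r"^rgephy\d+: <([^>]+)>",
        "chip_rev": r"^re\d+: Chip rev\. (0x[0-9a-fA-F]+)",
        "mac_rev": r"^re\d+: MAC rev\. (0x[0-9a-fA-F]+)",
    }
    found = {}
    for key, pattern in patterns.items():
        # MULTILINE lets ^ match at the start of every dmesg line.
        match = re.search(pattern, dmesg_text, re.MULTILINE)
        found[key] = match.group(1) if match else None
    return found


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

Fields that are not present in the input come back as None, so the
same function can be fed partial output without failing.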