Date: Thu, 23 Apr 2015 10:21:09 +0200
From: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To: Jack Vogel <jfvogel@gmail.com>
Cc: FreeBSD Stable <freebsd-stable@freebsd.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: igb(4) watchdog timeout, lagg(4) fails
Message-ID: <5538AB75.4070401@omnilan.de>
In-Reply-To: <54E733FA.1020208@omnilan.de>
References: <54ACC6A2.1050400@omnilan.de> <54AE565D.50208@omnilan.de>
 <54AE5A6B.7040601@omnilan.de> <54AFA784.6020102@omnilan.de>
 <CAFOYbcn0F1QXajUZ2XOncSg8z9xjuCQtzC=Siteyrq%2BDkvAw-A@mail.gmail.com>
 <54B10432.8050909@omnilan.de> <54DB8975.2030001@omnilan.de>
 <CAFOYbcnNYFhesur3gK71J3Mr1W_KWh6%2BhFjjOZAZ-buAc16k9w@mail.gmail.com>
 <54DBB1F5.1090601@omnilan.de> <54E733FA.1020208@omnilan.de>

Regarding Harald Schmalzbauer's message of 20.02.2015 14:17 (localtime):
(https://lists.freebsd.org/pipermail/freebsd-stable/2015-February/081810.html)
> Regarding Harald Schmalzbauer's message of 11.02.2015 20:48
> (localtime):
>> Regarding Jack Vogel's message of 11.02.2015 18:31 (localtime):
>>> tdh and tdt mean the head and tail indices of the ring, and these
>>> values are
>>> obviously severely borked :)

Hello Jack,

could you find some time to have a look at this problem?
The reported values don't bother me; what does is the watchdog timeout,
which happens on NICs that are PCIe-connected via the PCH.
Please see my previous findings. I think the most significant hint for
my problem is the link_irq, which becomes garbage at the first watchdog
timeout occurrence, as previously described:

>> …
>> For the record: Rebooting the machine (ESXi guest-only!) brought the
>> stalled igb1 back to operation.
>> The guest has 2 igb (kawela) ports, one from a NIC (Intel ET Dual Port
>> 82576)@CPU-PCIe and the second port from an identical NIC, but connected
>> via PCH-PCIe.
>> The watchdog timeout problem only occurs with the port from the
>> PCH-PCIe-connected NIC (falsified)!
>> After the reboot the suspicious "dev.igb.1.link_irq=848" turned into:
>> dev.igb.0.link_irq: 3
>> dev.igb.1.link_irq: 4
> Jack,
>
> I'd like to let you know that "dev.igb.1.link_irq" again shows garbage
> after the watchdog timeout problem occurred again:
> dev.igb.1.link_irq: 1458
>
> I can imagine that resetting goes wrong and ends in loss of link_irq.
> I now have to reboot the guest to get igb1 back to a working state, then
> the link_irq will show "4" again, but I can't tell you what was first,
> the timeout-reset or the "link_irq" jam. I guess the latter can't be the
> case, but I have no idea about the code

Thanks for any help, currently my lagg setup is permanently degraded :-(
Would be nice to have it back in a working state :-)

-Harry