Date:      Thu, 23 Apr 2015 10:21:09 +0200
From:      Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To:        Jack Vogel <jfvogel@gmail.com>
Cc:        FreeBSD Stable <freebsd-stable@freebsd.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: igb(4) watchdog timeout, lagg(4) fails
Message-ID:  <5538AB75.4070401@omnilan.de>
In-Reply-To: <54E733FA.1020208@omnilan.de>
References:  <54ACC6A2.1050400@omnilan.de>	<54AE565D.50208@omnilan.de>	<54AE5A6B.7040601@omnilan.de>	<54AFA784.6020102@omnilan.de>	<CAFOYbcn0F1QXajUZ2XOncSg8z9xjuCQtzC=Siteyrq%2BDkvAw-A@mail.gmail.com>	<54B10432.8050909@omnilan.de>	<54DB8975.2030001@omnilan.de> <CAFOYbcnNYFhesur3gK71J3Mr1W_KWh6%2BhFjjOZAZ-buAc16k9w@mail.gmail.com> <54DBB1F5.1090601@omnilan.de> <54E733FA.1020208@omnilan.de>

Regarding Harald Schmalzbauer's message of 20.02.2015 14:17 (localtime):
(https://lists.freebsd.org/pipermail/freebsd-stable/2015-February/081810.html)
> Regarding Harald Schmalzbauer's message of 11.02.2015 20:48
> (localtime):
>> Regarding Jack Vogel's message of 11.02.2015 18:31 (localtime):
>>> tdh and tdt mean the head and tail indices of the ring, and these
>>> values are
>>> obviously severely borked :)
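
To illustrate what head (tdh) and tail (tdt) indices mean for a descriptor ring in general, here is a minimal Python sketch. It is not taken from the igb(4) driver: `RING_SIZE`, `pending_descriptors` and `indices_plausible` are hypothetical names, and the ring size of 1024 is only an assumed value. The idea is that software advances the tail, hardware advances the head, and both must stay inside the ring.

```python
RING_SIZE = 1024  # assumed number of descriptors; not taken from the driver


def pending_descriptors(head: int, tail: int, ring_size: int = RING_SIZE) -> int:
    """Descriptors queued by software (tail) but not yet consumed by hardware (head)."""
    # The modulo handles the case where the tail has wrapped around the ring.
    return (tail - head) % ring_size


def indices_plausible(head: int, tail: int, ring_size: int = RING_SIZE) -> bool:
    """Both indices must lie inside the ring; anything else cannot be valid."""
    return 0 <= head < ring_size and 0 <= tail < ring_size
```

With such a check, head or tail values outside the ring would be flagged as implausible rather than interpreted as a queue depth.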

Hello Jack,

could you find some time to have a look at this problem? The reported
values don't bother me, but the watchdog timeout does; it happens on NICs
that are PCIe-connected via the PCH. Please see my previous findings. I
think the most significant hint for my problem is the link_irq,
which becomes garbage at the first watchdog timeout occurrence, as
previously described:

>> …
>> For the record: Rebooting the machine (ESXi guest-only!) brought the
>> stalled igb1 back to operation.
>> The guest has 2 igb (kawela) ports, one from a NIC (Intel ET Dual Port
>> 82576) @ CPU-PCIe and the second port from an identical NIC, but connected
>> via PCH-PCIe.
>> The watchdog timeout problem only occurs with the port from the
>> PCH-PCIe-connected NIC (falsified)!
>> After the reboot the suspicious "dev.igb.1.link_irq=848" turned into:
>> dev.igb.0.link_irq: 3
>> dev.igb.1.link_irq: 4
> Jack,
>
> I'd like to let you know that "dev.igb.1.link_irq" again shows garbage
> after the watchdog timeout problem occurred again:
> dev.igb.1.link_irq: 1458
>
> I can imagine that resetting goes wrong and ends in loss of link_irq.
> I now have to reboot the guest to get igb1 back to a working state; then
> the link_irq will show "4" again, but I can't tell you which came first,
> the timeout reset or the "link_irq" jam. I guess the latter can't be the
> case, but I have no idea about the code
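
To narrow down which came first, the link_irq values could be sampled periodically and compared. Below is a minimal Python sketch that only parses text in the "dev.igb.N.link_irq: VALUE" form shown above. The helper names `parse_link_irq` and `changed_units` are hypothetical, and the sketch makes no assumption about what the driver stores in that value.

```python
import re
from typing import Dict, List

# Matches lines such as "dev.igb.1.link_irq: 1458"
LINK_IRQ_RE = re.compile(r"^dev\.igb\.(\d+)\.link_irq:\s*(\d+)$")


def parse_link_irq(sysctl_output: str) -> Dict[int, int]:
    """Map igb unit number to its reported link_irq value."""
    values: Dict[int, int] = {}
    for line in sysctl_output.splitlines():
        match = LINK_IRQ_RE.match(line.strip())
        if match:
            values[int(match.group(1))] = int(match.group(2))
    return values


def changed_units(before: Dict[int, int], after: Dict[int, int]) -> List[int]:
    """Units present in both samples whose link_irq value differs."""
    return sorted(u for u in after if u in before and after[u] != before[u])
```

Feeding it the output of `sysctl dev.igb` captured at intervals, together with timestamps and the watchdog messages from the system log, might show whether the value changes before or after the first timeout.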


Thanks for any help, currently my lagg setup is permanently degraded :-(
Would be nice to have it back in a working state :-)

-Harry




