Date:      Sun, 21 Jul 2019 16:32:04 -0400
From:      Patrick Kelsey <pkelsey@freebsd.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-net@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: vmx0: watchdog timeout on queue 2, no interrupts on BSP
Message-ID:  <F89598D2-FEBD-4857-9734-350A077DF4C0@freebsd.org>
In-Reply-To: <dfb182e0-7512-cd48-142b-b98dfa4d3525@FreeBSD.org>
References:  <9c509f7b-8294-d2fe-ea3e-f10fd51f5736@FreeBSD.org> <CAD44qMUA_-vT7-374WGZH1bUFCA-sVo_UHi1uQjKkgpk9358bA@mail.gmail.com> <dfb182e0-7512-cd48-142b-b98dfa4d3525@FreeBSD.org>



> On Jul 21, 2019, at 4:17 PM, Andriy Gapon <avg@freebsd.org> wrote:
>
>> On 20/07/2019 20:08, Patrick Kelsey wrote:
>>
>>
>> On Fri, Jul 19, 2019 at 10:07 AM Andriy Gapon <avg@freebsd.org> wrote:
>>
>>
>>    Recently we experienced a strange problem.
>>    We noticed a lot of these messages in the logs:
>>    vmx0: watchdog timeout on queue 2
>>    (always queue 2)
>>    Also, we noticed that connections to some end points did not work at all
>>    while others worked without problems.  I assume that that was because
>>    specific flows got assigned to that queue 2.
>>
>>    Further investigation has shown that none of interrupts assigned to the
>>    BSP has ever fired (since boot, of course).  That included vmx0:rx2 and
>>    vmx0:tx2.  But also interrupts for other drivers as well.
>>=20
>>    Trying to get more information I rebooted the system and the problem
>>    disappeared.
>>
>>    Has anyone seen anything like that?
>>    Any thoughts on possible causes?
>>    Any suggestions what to check if/when the problem reoccurs?
>>
>>    Thanks!
>>
>>
>> If you are running head at or after r347221 or stable/12 at or after
>> r349112, then this could be due to
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239118 (see Comment 4
>> - short story is that an iflib change has broken the vmx driver).
>
> I am not sure if that bug could lead to all interrupts on the core
> getting disabled (for all drivers), and right at the boot time.

I am not sure either, but it’s the kind of bug that breaks the design of
the vmx driver in such a way that its state can get corrupted to the point
where the kernel can panic.  I haven’t fully analyzed the potential scope
of memory corruption / hardware state corruption that can occur (because
the fix for the issue is already apparent), so I am freely considering it
to include elements beyond the device and driver itself.

If you are saying that zero vmx queue interrupts have occurred anywhere in
the system, then I would rule out any connection to this, as a prerequisite
for the corruption to occur is having at least one such interrupt.

-Patrick


