Date: Sun, 21 Jul 2019 16:32:04 -0400
From: Patrick Kelsey <pkelsey@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Cc: freebsd-net@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: vmx0: watchdog timeout on queue 2, no interrupts on BSP
Message-ID: <F89598D2-FEBD-4857-9734-350A077DF4C0@freebsd.org>
In-Reply-To: <dfb182e0-7512-cd48-142b-b98dfa4d3525@FreeBSD.org>
References: <9c509f7b-8294-d2fe-ea3e-f10fd51f5736@FreeBSD.org> <CAD44qMUA_-vT7-374WGZH1bUFCA-sVo_UHi1uQjKkgpk9358bA@mail.gmail.com> <dfb182e0-7512-cd48-142b-b98dfa4d3525@FreeBSD.org>
> On Jul 21, 2019, at 4:17 PM, Andriy Gapon <avg@freebsd.org> wrote:
>
>> On 20/07/2019 20:08, Patrick Kelsey wrote:
>>
>>
>> On Fri, Jul 19, 2019 at 10:07 AM Andriy Gapon <avg@freebsd.org
>> <mailto:avg@freebsd.org>> wrote:
>>
>>
>> Recently we experienced a strange problem.
>> We noticed a lot of these messages in the logs:
>> vmx0: watchdog timeout on queue 2
>> (always queue 2)
>> Also, we noticed that connections to some end points did not work at all
>> while others worked without problems. I assume that was because
>> specific flows got assigned to that queue 2.
>>
>> Further investigation has shown that none of the interrupts assigned to the
>> BSP had ever fired (since boot, of course). That included vmx0:rx2 and
>> vmx0:tx2, but also interrupts for other drivers.
>>
>> Trying to get more information, I rebooted the system and the problem
>> disappeared.
>>
>> Has anyone seen anything like that?
>> Any thoughts on possible causes?
>> Any suggestions on what to check if/when the problem recurs?
>>
>> Thanks!
>>
>>
>> If you are running head at or after r347221 or stable/12 at or after
>> r349112, then this could be due to
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239118 (see Comment 4
>> - short story is that an iflib change has broken the vmx driver).
>
> I am not sure if that bug could lead to all interrupts on the core
> getting disabled (for all drivers), and right at boot time.

I am not sure either, but it’s the kind of bug that breaks the design of the vmx driver in such a way that its state can get corrupted to the point where the kernel can panic. I haven’t fully analyzed the potential scope of the memory corruption / hardware state corruption that can occur (because the fix for the issue is already apparent), so I am freely allowing that it could include elements beyond the device and driver themselves.
If you are saying that zero vmx queue interrupts have occurred anywhere in the system, then I would rule out any connection to this, as a prerequisite for the corruption to occur is having at least one such interrupt.

-Patrick