Date: Tue, 3 Aug 2021 17:27:51 +0200
From: Franco Fichtner <franco@lastsummer.de>
To: Kevin Bowling <kevin.bowling@kev009.com>
Cc: FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: igb(4) and VLAN issue?
Message-ID: <ED4BA1DF-DE8C-4006-9761-5A05A555543C@lastsummer.de>
In-Reply-To: <CAK7dMtCJhKVo8agr_VGbtGHZeKK8_8ip%2B6bY_yaW45wo42caZQ@mail.gmail.com>
References: <CAK7dMtCJhKVo8agr_VGbtGHZeKK8_8ip%2B6bY_yaW45wo42caZQ@mail.gmail.com>
Hi Kevin,

[RESENT TO MAILING LIST AS SUBSCRIBER]

> On 2. Aug 2021, at 7:51 PM, Kevin Bowling <kevin.bowling@kev009.com> wrote:
>
> I caught wind that an igb(4) commit I've done to main and that has
> been in stable/12 for a few months seems to be causing a regression on
> opnsense. The commit in question is
> https://cgit.freebsd.org/src/commit/?id=eea55de7b10808b86277d7fdbed2d05d3c6db1b2
>
> The report is at:
> https://forum.opnsense.org/index.php?topic=23867.0

Looks like I spoke too soon earlier. This is a weird one for sure. :)

So first of all this causes an ifconfig hang for VLAN/LAGG combo
creation, but later reports were coming in about ahci errors and cam
timeouts.

Some reported that the instabilities start with using netmap, but later
others confirmed the same for high load scenarios without netmap in use.

This does not appear to happen when MSIX is disabled, e.g.:

# sysctl -a | grep dev.igb | grep msix
dev.igb.5.iflib.disable_msix: 1
dev.igb.4.iflib.disable_msix: 1
dev.igb.3.iflib.disable_msix: 1
dev.igb.2.iflib.disable_msix: 1
dev.igb.1.iflib.disable_msix: 1
dev.igb.0.iflib.disable_msix: 1

What's also being linked to this is some form of softraid misbehaving
and a general tendency towards cheaper hardware with particular igb
chipsets.

> I haven't heard of this issue elsewhere and cannot replicate it on my
> I210s running main. I've gone over the code changes line by line
> several times and verified all the logic and register writes and it
> all looks correct to my understanding. The only hypothesis I have at
> the moment is it may be some subtle timing issue since VLAN changes
> unnecessarily restart the interface on e1000 until I push in a work in
> progress to stop doing that.

I also have no way of reproducing this locally, but the community is
probably willing to try any kernel change that would address the
problem without having to back out the commit in question.

> I'd like to see the output of all the processes or at least the
> process configuring the VLANs to see where it is stuck. Franco, do
> you have the ability to 'control+t' there or otherwise set up a break
> into a debugger? Stacktraces would be a great start but a core and a
> kernel may be necessary if it isn't obvious.

Let me see if I can deliver on this easily.

Cheers,
Franco