Date: Thu, 23 Sep 2021 15:46:37 -0700
From: Kevin Bowling <kevin.bowling@kev009.com>
To: Franco Fichtner <franco@lastsummer.de>
Cc: FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: igb(4) and VLAN issue?
Message-ID: <CAK7dMtDNZ-dFL=NsZHUisG_wLT8RSLJomds6kC1PkVUkPFZr%2BQ@mail.gmail.com>
In-Reply-To: <CAK7dMtBdRFuQTv5LRMTiDy9VVef1pVKgdL40_FrVhXjpkptqUw@mail.gmail.com>
References: <CAK7dMtCJhKVo8agr_VGbtGHZeKK8_8ip%2B6bY_yaW45wo42caZQ@mail.gmail.com>
 <ED4BA1DF-DE8C-4006-9761-5A05A555543C@lastsummer.de>
 <CAK7dMtBdRFuQTv5LRMTiDy9VVef1pVKgdL40_FrVhXjpkptqUw@mail.gmail.com>

Franco,

I think I found it: https://reviews.freebsd.org/D32087

Regards,
Kevin

On Tue, Aug 3, 2021 at 8:50 AM Kevin Bowling <kevin.bowling@kev009.com> wrote:
>
> On Tue, Aug 3, 2021 at 8:27 AM Franco Fichtner <franco@lastsummer.de> wrote:
> >
> > Hi Kevin,
> >
> > [RESENT TO MAILING LIST AS SUBSCRIBER]
> >
> > > On 2. Aug 2021, at 7:51 PM, Kevin Bowling <kevin.bowling@kev009.com> wrote:
> > >
> > > I caught wind that an igb(4) commit I've done to main and that has
> > > been in stable/12 for a few months seems to be causing a regression on
> > > opnsense.  The commit in question is
> > > https://cgit.freebsd.org/src/commit/?id=eea55de7b10808b86277d7fdbed2d05d3c6db1b2
> > >
> > > The report is at:
> > > https://forum.opnsense.org/index.php?topic=23867.0
> >
> > Looks like I spoke too soon earlier.  This is a weird one for sure.  :)
> >
> > So first of all this causes an ifconfig hang for VLAN/LAGG combo creation,
> > but later reports were coming in about ahci errors and cam timeouts.
> > Some reported the instabilities start with using netmap, but later others
> > confirmed the same for high load scenarios without netmap in use.
> >
> > This does not appear to happen when MSIX is disabled, e.g.:
> >
> > # sysctl -a | grep dev.igb | grep msix
> > dev.igb.5.iflib.disable_msix: 1
> > dev.igb.4.iflib.disable_msix: 1
> > dev.igb.3.iflib.disable_msix: 1
> > dev.igb.2.iflib.disable_msix: 1
> > dev.igb.1.iflib.disable_msix: 1
> > dev.igb.0.iflib.disable_msix: 1
> >
> > What's also being linked to this is some form of softraid misbehaving
> > and the general tendency for cheaper hardware with particular igb
> > chipsets.
>
> Hmm, there is so much that /could/ be going on it's not easy to
> pinpoint anything yet.  If nothing jumps out after getting more data
> it may be worth mitigating in your build that way and retrying once
> you have updated to FreeBSD 13.
>
> > > I haven't heard of this issue elsewhere and cannot replicate it on my
> > > I210s running main.  I've gone over the code changes line by line
> > > several times and verified all the logic and register writes and it
> > > all looks correct to my understanding.  The only hypothesis I have at
> > > the moment is it may be some subtle timing issue since VLAN changes
> > > unnecessarily restart the interface on e1000 until I push in a work in
> > > progress to stop doing that.
> >
> > I also have no way of reproducing this locally, but the community is
> > probably willing to give any kernel change a try that would address
> > the problem without having to back out the commit in question.
>
> I need some more info before making any changes.  A full dmesg of the
> older working version and a (partial?) dmesg of the broken would be
> another useful data point to start out with, let's see if there is
> something going on during MSI-X vector allocation etc.
>
> > > I'd like to see the output of all the processes or at least the
> > > process configuring the VLANs to see where it is stuck.  Franco, do
> > > you have the ability to 'control+t' there or otherwise set up a break
> > > into a debugger?  Stacktraces would be a great start but a core and a
> > > kernel may be necessary if it isn't obvious.
> >
> > Let me see if I can deliver on this easily.
> >
> >
> > Cheers,
> > Franco
> >