Date: Thu, 29 Mar 2018 15:09:13 +0200
From: "Kristof Provost" <kristof@sigsegv.be>
To: "Reshad Patuck" <reshadpatuck1@gmail.com>
Cc: freebsd-net@freebsd.org, "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>, "Reshad Patuck" <reshad@patuck.net>
Subject: Re: [vnet] [epair] epair interface stops working after some time
Message-ID: <10647168-66DF-48CD-9121-9CC2B00848D4@sigsegv.be>
In-Reply-To: <97945712-B53E-4CF6-B20E-6001CF40CDFC@gmail.com>
References: <CADaJeD2LZy=RU0vtqD7%2BdkZkUs0GKW%2B7duGDQkZ19GR-_cS=MQ@mail.gmail.com>
 <71B1A1BD-6FCF-47BB-9523-CCAAC03799A5@sigsegv.be>
 <1563563.7DUcjoHYMp@reshadlaptop.patuck.net>
 <C162AFB2-FF80-4640-BDC8-23B30CC22873@sigsegv.be>
 <1D6101CD-BCB4-4206-838B-1A75152ACCC4@sigsegv.be>
 <AB52ED81-F97F-471B-A1BA-F3221152A586@patuck.net>
 <F382A5B4-6941-43C0-9686-4B108034EBF1@patuck.net>
 <FDCE9FAA-1289-4E15-9239-1B6FD98B589C@sigsegv.be>
 <38C78C2B-87D2-4225-8F4B-A5EA48BA5D17@patuck.net>
 <5803CAA2-DC4A-4E49-B715-6DE472088DDD@sigsegv.be>
 <9CAB4522-0B0A-42BF-B9A4-BF36AFC60286@patuck.net>
 <7202AFF2-A314-41FE-BD13-C4C77A95E106@sigsegv.be>
 <2D15ABDE-0C25-4C97-AEA6-0098459A2795@lists.zabbadoz.net>
 <CEB5C82A-33AA-4F8A-9A80-EC9CBE0300C3@gmail.com>
 <277350C5-3B1F-4105-AF0A-886B6133218E@sigsegv.be>
 <97945712-B53E-4CF6-B20E-6001CF40CDFC@gmail.com>

On 29 Mar 2018, at 14:48, Reshad Patuck wrote:

> pulling the 'net.link.epair.netisr_maxqlen' down does seem to make
> this occur faster.
>

Good, I think my hypothesis about where the issue lies is correct then.
You should be able to avoid (or at least reduce the frequency of) the
issue by increasing the value on your system(s).

> When I dropped it to 2 like Kristof did and I have the same symptoms
> on a box which was not exhibiting the problems manually began to have
> the same symptoms.
> Bumping it back up to 2100 did not restore the functionality (I don't
> know if it should).
>

It’s good to know this. It doesn’t surprise me that it doesn’t fix
things.

Something’s wrong in the code that handles an overflow of the netisr
queue in the epair driver. Once that happens, the IFF_DRV_OACTIVE flag
gets set, and we keep enqueuing outside the netisr queue. Somehow we
never end up back in epair_nh_drainedcpu(), so the flag never gets
cleared and the driver never recovers.

> I will create a PR for this later today with all the information I
> have gathered so that we can have it all in one place.
>

Thanks. Please cc me on it. I’ll see if I can figure out what the
problem is, but we might need someone smarter, so cc Bjoern too.

Regards,
Kristof