Date:      Thu, 29 Mar 2018 15:09:13 +0200
From:      "Kristof Provost" <kristof@sigsegv.be>
To:        "Reshad Patuck" <reshadpatuck1@gmail.com>
Cc:        freebsd-net@freebsd.org, "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>, "Reshad Patuck" <reshad@patuck.net>
Subject:   Re: [vnet] [epair] epair interface stops working after some time
Message-ID:  <10647168-66DF-48CD-9121-9CC2B00848D4@sigsegv.be>
In-Reply-To: <97945712-B53E-4CF6-B20E-6001CF40CDFC@gmail.com>
References:  <CADaJeD2LZy=RU0vtqD7%2BdkZkUs0GKW%2B7duGDQkZ19GR-_cS=MQ@mail.gmail.com> <71B1A1BD-6FCF-47BB-9523-CCAAC03799A5@sigsegv.be> <1563563.7DUcjoHYMp@reshadlaptop.patuck.net> <C162AFB2-FF80-4640-BDC8-23B30CC22873@sigsegv.be> <1D6101CD-BCB4-4206-838B-1A75152ACCC4@sigsegv.be> <AB52ED81-F97F-471B-A1BA-F3221152A586@patuck.net> <F382A5B4-6941-43C0-9686-4B108034EBF1@patuck.net> <FDCE9FAA-1289-4E15-9239-1B6FD98B589C@sigsegv.be> <38C78C2B-87D2-4225-8F4B-A5EA48BA5D17@patuck.net> <5803CAA2-DC4A-4E49-B715-6DE472088DDD@sigsegv.be> <9CAB4522-0B0A-42BF-B9A4-BF36AFC60286@patuck.net> <7202AFF2-A314-41FE-BD13-C4C77A95E106@sigsegv.be> <2D15ABDE-0C25-4C97-AEA6-0098459A2795@lists.zabbadoz.net> <CEB5C82A-33AA-4F8A-9A80-EC9CBE0300C3@gmail.com> <277350C5-3B1F-4105-AF0A-886B6133218E@sigsegv.be> <97945712-B53E-4CF6-B20E-6001CF40CDFC@gmail.com>

On 29 Mar 2018, at 14:48, Reshad Patuck wrote:
> pulling the 'net.link.epair.netisr_maxqlen' down does seem to make 
> this occur faster.
>
Good, I think my hypothesis about where the issue lies is correct then.
You should be able to avoid (or at least reduce the frequency of) the 
issue by increasing the value on your system(s).

> When I manually dropped it to 2, like Kristof did, a box which was not
> previously exhibiting the problem began to show the same symptoms.
> Bumping it back up to 2100 did not restore the functionality (I don't 
> know if it should).
>
It’s good to know this. It doesn’t surprise me that it doesn’t fix 
things.
Something’s wrong in the code that handles an overflow of the netisr
queue in the epair driver. Once that happens the IFF_DRV_OACTIVE flag
gets set, and we keep enqueuing outside the netisr queue.
Somehow we never end up back in epair_nh_drainedcpu(), so the flag never 
gets cleared and the driver never recovers.
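To make the hypothesis above concrete, here is a toy Python model of the
suspected state machine. It is only a sketch under stated assumptions:
ModelIf, transmit(), netisr_run() and scenario() are hypothetical names,
OACTIVE merely stands in for IFF_DRV_OACTIVE, and netisr_maxqlen stands in
for net.link.epair.netisr_maxqlen. The real epair(4) driver is kernel C
code and differs in detail. The model shows why, if the drained callback
(the analogue of epair_nh_drainedcpu()) is never reached, the flag stays
set and raising the queue limit afterwards does not recover the interface.

```python
from dataclasses import dataclass

OACTIVE = 0x1  # hypothetical stand-in for IFF_DRV_OACTIVE


@dataclass
class ModelIf:
    """Toy model of one epair end; NOT the driver's real data structure."""
    netisr_maxqlen: int    # stand-in for net.link.epair.netisr_maxqlen
    flags: int = 0
    netisr_queue: int = 0  # packets waiting in the netisr queue
    side_queue: int = 0    # packets parked outside the netisr queue
    delivered: int = 0     # packets handed to the peer

    def transmit(self) -> None:
        # Use the netisr queue only while the flag is clear and there is room.
        if not self.flags & OACTIVE and self.netisr_queue < self.netisr_maxqlen:
            self.netisr_queue += 1
        else:
            # Overflow (or flag already set): mark busy, park the packet.
            self.flags |= OACTIVE
            self.side_queue += 1

    def netisr_run(self, drained_callback_fires: bool) -> None:
        # netisr delivers everything currently in its queue.
        self.delivered += self.netisr_queue
        self.netisr_queue = 0
        # Analogue of epair_nh_drainedcpu(): the only place the flag is
        # cleared. If it never runs, the flag stays set for good.
        if drained_callback_fires:
            self.flags &= ~OACTIVE
            parked, self.side_queue = self.side_queue, 0
            for _ in range(parked):
                self.transmit()


def scenario(drained_callback_fires: bool) -> ModelIf:
    ifp = ModelIf(netisr_maxqlen=2)
    for _ in range(3):            # the third packet overflows the 2-slot queue
        ifp.transmit()
    ifp.netisr_run(drained_callback_fires)
    ifp.netisr_maxqlen = 2100     # raising the limit after the fact...
    for _ in range(10):
        ifp.transmit()
    ifp.netisr_run(drained_callback_fires)
    return ifp


if __name__ == "__main__":
    for fires in (False, True):
        ifp = scenario(fires)
        print(f"drained callback fires={fires}: delivered={ifp.delivered}, "
              f"stuck={ifp.side_queue}, OACTIVE={bool(ifp.flags & OACTIVE)}")
```

In the model, without the callback only the first two packets are ever
delivered and everything after the overflow stays parked, even after the
limit is raised to 2100; with the callback, all 13 packets get through.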

> I will create a PR for this later today with all the information I 
> have gathered so that we can have it all in one place.
>
Thanks. Please cc me on it. I’ll see if I can figure out what the 
problem is, but we might need someone smarter, so cc Bjoern too.

Regards,
Kristof