Date:      Wed, 28 Mar 2018 00:29:51 +0530
From:      Reshad Patuck <reshadpatuck1@gmail.com>
To:        freebsd-net@freebsd.org, "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>, Kristof Provost <kristof@sigsegv.be>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>,Reshad Patuck <reshad@patuck.net>
Subject:   Re: [vnet] [epair] epair interface stops working after some time
Message-ID:  <CEB5C82A-33AA-4F8A-9A80-EC9CBE0300C3@gmail.com>
In-Reply-To: <2D15ABDE-0C25-4C97-AEA6-0098459A2795@lists.zabbadoz.net>
References:  <CADaJeD2LZy=RU0vtqD7%2BdkZkUs0GKW%2B7duGDQkZ19GR-_cS=MQ@mail.gmail.com> <71B1A1BD-6FCF-47BB-9523-CCAAC03799A5@sigsegv.be> <1563563.7DUcjoHYMp@reshadlaptop.patuck.net> <C162AFB2-FF80-4640-BDC8-23B30CC22873@sigsegv.be> <1D6101CD-BCB4-4206-838B-1A75152ACCC4@sigsegv.be> <AB52ED81-F97F-471B-A1BA-F3221152A586@patuck.net> <F382A5B4-6941-43C0-9686-4B108034EBF1@patuck.net> <FDCE9FAA-1289-4E15-9239-1B6FD98B589C@sigsegv.be> <38C78C2B-87D2-4225-8F4B-A5EA48BA5D17@patuck.net> <5803CAA2-DC4A-4E49-B715-6DE472088DDD@sigsegv.be> <9CAB4522-0B0A-42BF-B9A4-BF36AFC60286@patuck.net> <7202AFF2-A314-41FE-BD13-C4C77A95E106@sigsegv.be> <2D15ABDE-0C25-4C97-AEA6-0098459A2795@lists.zabbadoz.net>

Hi,

@Kristof:
The current value of 'net.link.epair.netisr_maxqlen' is 2100, I will make it 210.
Will this require a reboot, or can I just change the sysctl and reload the epair module?

@Bjoern:
Here is the output of 'netstat -Q':
```
# netstat -Q
Configuration:
Setting                        Current        Limit
Thread count                         1            1
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs         disabled          n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1    256   flow  default   ---
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256 source   direct   ---
ip6        6    256   flow  default   ---
epair      8   2100    cpu  default   CD-

Workstreams:
WSID CPU   Name     Len WMark   Disp'd  HDisp'd   QDrops   Queued  Handled
   0   0   ip         0    30 11409267        0        0 13574317 24983409
   0   0   igmp       0     0        0        0        0        0        0
   0   0   rtsock     0     1        0        0        0       42       42
   0   0   arp        0     0 61109751        0        0        0 61109751
   0   0   ether      0     0 115098020        0        0        0 115098020
   0   0   ip6        0    10 36157577        0        0  4273274 40430846
   0   0   epair      0  2100        0        0   210972 303785724 303785724
```
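
For anyone comparing several such dumps, a minimal Python sketch of how the Workstreams table could be checked mechanically. The function names (`parse_workstreams`, `suspicious`) and the field names are made up for illustration, and the sample only repeats three rows from the output above. The only check it makes is the one visible by eye: epair is the one protocol with a non-zero QDrops, and its WMark has reached its QLimit of 2100.

```python
# Hypothetical helper names; this only parses text already shown in `netstat -Q`.
COLUMNS = ("len", "wmark", "dispatched", "hybrid_dispatched",
           "qdrops", "queued", "handled")

SAMPLE = """\
Workstreams:
WSID CPU   Name     Len WMark   Disp'd  HDisp'd   QDrops   Queued  Handled
   0   0   ip         0    30 11409267        0        0 13574317 24983409
   0   0   ether      0     0 115098020        0        0        0 115098020
   0   0   epair      0  2100        0        0   210972 303785724 303785724
"""

# Queue limits copied from the "Protocols:" section of the same output.
QLIMITS = {"ip": 256, "ether": 256, "epair": 2100}


def parse_workstreams(text):
    """Return one dict per data row of the 'Workstreams:' table."""
    rows = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields == ["Workstreams:"]:
            in_table = True
        elif in_table and len(fields) == 10 and fields[0].isdigit():
            row = {"wsid": int(fields[0]), "cpu": int(fields[1]),
                   "name": fields[2]}
            row.update(zip(COLUMNS, map(int, fields[3:])))
            rows.append(row)
    return rows


def suspicious(rows, qlimits):
    """Names of protocols that dropped packets or whose watermark hit the limit."""
    return [r["name"] for r in rows
            if r["qdrops"] > 0
            or r["wmark"] >= qlimits.get(r["name"], float("inf"))]


print(suspicious(parse_workstreams(SAMPLE), QLIMITS))  # ['epair']
```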

I still have access to a machine in this state, but will need to reset it to a working state soon.

Please let me know if there is any information you would like me to get from this machine before I reset it.

Best,

Reshad

On 27 March 2018 8:18:29 PM IST, "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net> wrote:
>On 27 Mar 2018, at 14:40, Kristof Provost wrote:
>
>> (Re-cc freebsd-net, because this is useful information)
>>
>> On 27 Mar 2018, at 13:07, Reshad Patuck wrote:
>>> The epair crash occurred again today running the epair module code
>>> with the added dtrace sdt providers.
>>>
>>> Running the same command as last time, 'dtrace -n ::epair\*:' returns
>>> the following:
>>> ```
>>> CPU     ID                    FUNCTION:NAME
>> …
>>>   0  66499   epair_transmit_locked:enqueued
>>> ```
>>
>>> Looks like it's filled up a queue somewhere and is dropping
>>> connections post that.
>>>
>>> The value of 'error' is 55. I can see both the ifp and m structs
>>> but don't know what to look for in them.
>>>
>> That’s useful. Error 55 is ENOBUFS, which in IFQ_ENQUEUE() means
>> we’re hitting _IF_QFULL().
>> There don’t seem to be counters for that drop though, so that makes
>> it hard to diagnose without these extra probe points.
>> It also explains why you don’t really see any drop counters
>> incrementing.
>>
>> The fact that this queue is full presumably means that the other side
>> is not reading packets off it any more.
>> That’s supposed to happen in epair_start_locked() (look for the
>> IFQ_DEQUEUE() calls).
>>
>> It’s not at all clear to me how, but it looks like the receive side
>> is not doing its work.
>>
>> It looks like the IFQ code is already a fallback for when the netisr
>> queue is full.
>> That code might be broken, or there might be a different issue that
>> will just mean you’ll always end up in the same situation,
>> regardless of queue size.
>>
>> It’s probably worth trying to play with
>> ‘net.route.netisr_maxqlen’. I’d recommend *lowering* it, to see
>> if the problem happens more frequently that way. If it does, it’ll be
>> helpful in reproducing and trying to fix this. If it doesn’t, the
>> full queue is probably a consequence rather than a cause/trigger.
>> (Of course, once you’ve confirmed that lowering the netisr_maxqlen
>> makes the problem more frequent, go ahead and increase it.)
>
>netstat -Q  will be useful
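
The failure mode described in the quoted text (enqueue returning ENOBUFS once the queue is full because the other side has stopped dequeuing) can be pictured with a toy model. This is only a sketch under stated assumptions, not the if_epair code: `BoundedIfq` and `run` are hypothetical names, the queue length of 4 is arbitrary, and 55 is hard-coded as FreeBSD's ENOBUFS value because the number differs on other systems.

```python
from collections import deque

ENOBUFS = 55  # FreeBSD's errno value; other systems use a different number


class BoundedIfq:
    """Toy stand-in for a fixed-length interface queue (hypothetical, not kernel code)."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self.q = deque()

    def enqueue(self, pkt):
        # Mirrors the idea of a "queue full" check: refuse the packet when full.
        if len(self.q) >= self.maxlen:
            return ENOBUFS
        self.q.append(pkt)
        return 0

    def dequeue(self):
        # The receive side is expected to call this to drain the queue.
        return self.q.popleft() if self.q else None


def run(maxlen, packets, receiver_alive):
    """Send `packets` packets; drain one after each send only if the receiver works."""
    ifq = BoundedIfq(maxlen)
    errors = []
    for pkt in range(packets):
        errors.append(ifq.enqueue(pkt))
        if receiver_alive:
            ifq.dequeue()
    return errors


print(run(4, 10, receiver_alive=True))   # all zeros: the queue never fills
print(run(4, 10, receiver_alive=False))  # four zeros, then 55 for every later packet
```

In this model a larger `maxlen` only delays the first error when the receiver has stopped, which matches the suggestion that changing the queue length should alter how often the problem shows up rather than whether it does.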


