From: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
To: "Kristof Provost"
Cc: "Reshad Patuck", "FreeBSD Net"
Subject: Re: [vnet] [epair] epair interface stops working after some time
Date: Tue, 27 Mar 2018 14:48:29 +0000
Message-ID: <2D15ABDE-0C25-4C97-AEA6-0098459A2795@lists.zabbadoz.net>
In-Reply-To: <7202AFF2-A314-41FE-BD13-C4C77A95E106@sigsegv.be>
References: <71B1A1BD-6FCF-47BB-9523-CCAAC03799A5@sigsegv.be> <1563563.7DUcjoHYMp@reshadlaptop.patuck.net> <1D6101CD-BCB4-4206-838B-1A75152ACCC4@sigsegv.be> <38C78C2B-87D2-4225-8F4B-A5EA48BA5D17@patuck.net> <5803CAA2-DC4A-4E49-B715-6DE472088DDD@sigsegv.be> <9CAB4522-0B0A-42BF-B9A4-BF36AFC60286@patuck.net> <7202AFF2-A314-41FE-BD13-C4C77A95E106@sigsegv.be>
List-Id: Networking and TCP/IP with FreeBSD

On 27 Mar 2018, at 14:40, Kristof Provost wrote:

> (Re-cc freebsd-net, because this is useful information)
>
> On 27 Mar 2018, at 13:07, Reshad Patuck wrote:
>> The epair crash occurred again today running the epair module code
>> with the added dtrace sdt providers.
>>
>> Running the same command as last time, 'dtrace -n ::epair\*:' returns
>> the following:
>> ```
>> CPU ID FUNCTION:NAME
>> …
>> 0 66499 epair_transmit_locked:enqueued
>> ```
>>
>> Looks like it’s filled up a queue somewhere and is dropping
>> connections after that.
>>
>> The value of 'error' is 55. I can see both the ifp and m structs
>> but don't know what to look for in them.
>>
> That’s useful. Error 55 is ENOBUFS, which in IFQ_ENQUEUE() means
> we’re hitting _IF_QFULL().
> There don’t seem to be counters for that drop, though, which makes
> it hard to diagnose without these extra probe points.
> It also explains why you don’t really see any drop counters
> incrementing.
>
> The fact that this queue is full presumably means that the other side
> is no longer reading packets off it.
> That’s supposed to happen in epair_start_locked() (look for the
> IFQ_DEQUEUE() calls).
>
> It’s not at all clear to me how, but it looks like the receive side
> is not doing its work.
>
> It looks like the IFQ code is already a fallback for when the netisr
> queue is full.
> That code might be broken, or there might be a different issue that
> will just mean you’ll always end up in the same situation,
> regardless of queue size.
>
> It’s probably worth experimenting with
> ‘net.route.netisr_maxqlen’. I’d recommend *lowering* it, to see
> if the problem happens more frequently that way. If it does, that’ll
> be helpful in reproducing and trying to fix this. If it doesn’t, the
> full queue is probably a consequence rather than a cause/trigger.
> (Of course, once you’ve confirmed that lowering netisr_maxqlen
> makes the problem more frequent, go ahead and increase it again.)

netstat -Q will be useful