Date:      Tue, 27 Mar 2018 16:40:37 +0200
From:      "Kristof Provost" <kristof@sigsegv.be>
To:        "Reshad Patuck" <reshad@patuck.net>
Cc:        "FreeBSD Net" <freebsd-net@freebsd.org>
Subject:   Re: [vnet] [epair] epair interface stops working after some time
Message-ID:  <7202AFF2-A314-41FE-BD13-C4C77A95E106@sigsegv.be>
In-Reply-To: <9CAB4522-0B0A-42BF-B9A4-BF36AFC60286@patuck.net>
References:  <CADaJeD2LZy=RU0vtqD7%2BdkZkUs0GKW%2B7duGDQkZ19GR-_cS=MQ@mail.gmail.com> <71B1A1BD-6FCF-47BB-9523-CCAAC03799A5@sigsegv.be> <1563563.7DUcjoHYMp@reshadlaptop.patuck.net> <C162AFB2-FF80-4640-BDC8-23B30CC22873@sigsegv.be> <1D6101CD-BCB4-4206-838B-1A75152ACCC4@sigsegv.be> <AB52ED81-F97F-471B-A1BA-F3221152A586@patuck.net> <F382A5B4-6941-43C0-9686-4B108034EBF1@patuck.net> <FDCE9FAA-1289-4E15-9239-1B6FD98B589C@sigsegv.be> <38C78C2B-87D2-4225-8F4B-A5EA48BA5D17@patuck.net> <5803CAA2-DC4A-4E49-B715-6DE472088DDD@sigsegv.be> <9CAB4522-0B0A-42BF-B9A4-BF36AFC60286@patuck.net>

(Re-cc freebsd-net, because this is useful information)

On 27 Mar 2018, at 13:07, Reshad Patuck wrote:
> The epair crash occurred again today running the epair module code 
> with the added dtrace sdt providers.
>
> Running the same command as last time, 'dtrace -n ::epair\*:' returns 
> the following:
> ```
> CPU     ID                    FUNCTION:NAME
…
>   0  66499   epair_transmit_locked:enqueued
> ```

> Looks like it's filled up a queue somewhere and is dropping connections
> after that.
>
> The value of 'error' is 55. I can see both the ifp and m structs
> but don't know what to look for in them.
>
That’s useful. Error 55 is ENOBUFS, which in IFQ_ENQUEUE() means 
we’re hitting _IF_QFULL().
There don’t seem to be counters for that drop though, so that makes it 
hard to diagnose without these extra probe points.
It also explains why you don’t really see any drop counters 
incrementing.
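
To make the drop path above concrete, here is a toy userspace model in Python. It is not the kernel code: the class `BoundedIfq`, its `drops` counter and the method names are invented for this sketch. Only the errno value 55 and the idea of a "queue full" check in front of the enqueue come from the discussion above.

```python
from collections import deque

ENOBUFS = 55  # FreeBSD errno value quoted in the thread (other OSes differ)


class BoundedIfq:
    """Toy stand-in for an interface send queue with a fixed maximum length."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self.packets = deque()
        # Hypothetical counter: the thread notes the real path has none,
        # which is why the drops only showed up via the added probes.
        self.drops = 0

    def is_full(self):
        # Plays the role of the _IF_QFULL() check.
        return len(self.packets) >= self.maxlen

    def enqueue(self, pkt):
        # Plays the role of IFQ_ENQUEUE(): 0 on success, ENOBUFS when full.
        if self.is_full():
            self.drops += 1
            return ENOBUFS
        self.packets.append(pkt)
        return 0

    def dequeue(self):
        # Plays the role of IFQ_DEQUEUE(): None when the queue is empty.
        return self.packets.popleft() if self.packets else None
```

In this model a sender keeps getting 55 back for as long as nothing calls `dequeue()`, which matches the symptom reported above.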

The fact that this queue is full presumably means that the other side is 
not reading packets off it any more.
That’s supposed to happen in epair_start_locked() (look for the
IFQ_DEQUEUE() calls).

It’s not at all clear to me how, but it looks like the receive side is
not doing its work.

It looks like the IFQ code is already a fallback for when the netisr 
queue is full.
That code might be broken, or there might be a different issue that will 
just mean you’ll always end up in the same situation, regardless of 
queue size.
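
A rough way to see why queue size may not matter is the simulation below. It is a sketch under assumptions, not a description of the real epair code: it assumes a packet goes to the netisr queue when there is room, otherwise to the fallback IFQ, and that ENOBUFS is returned only when both are full. The function name and its parameters are made up for illustration.

```python
def first_enobufs_tick(netisr_maxqlen, ifq_maxlen, drain_per_tick, ticks):
    """Send one packet per tick; return the tick of the first ENOBUFS, or None.

    drain_per_tick is how many packets the receive side removes each tick;
    0 models a receiver that has stopped doing its work.
    """
    netisr_len = 0
    ifq_len = 0
    for tick in range(1, ticks + 1):
        # Transmit side: primary queue first, then the fallback queue.
        if netisr_len < netisr_maxqlen:
            netisr_len += 1
        elif ifq_len < ifq_maxlen:
            ifq_len += 1
        else:
            return tick  # both queues full: this send would see ENOBUFS

        # Receive side: drain the primary queue...
        netisr_len -= min(drain_per_tick, netisr_len)
        # ...then move any backlog from the fallback queue into the free room.
        moved = min(ifq_len, netisr_maxqlen - netisr_len)
        ifq_len -= moved
        netisr_len += moved
    return None
```

With a stalled receiver (`drain_per_tick=0`), larger queues only delay the first failure: `first_enobufs_tick(4, 2, 0, 100)` fails on tick 7, and `first_enobufs_tick(400, 200, 0, 1000)` still fails, just later, on tick 601. With a receiver that keeps up (`drain_per_tick=1`), the same small queues never overflow.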

It’s probably worth experimenting with
‘net.route.netisr_maxqlen’. I’d recommend *lowering* it, to see if
the problem happens more frequently that way. If it does, that will
help in reproducing and fixing this. If it doesn’t, the full
queue is probably a consequence rather than a cause/trigger.
(Of course, once you’ve confirmed that lowering netisr_maxqlen
makes the problem more frequent, go ahead and increase it.)

Regards,
Kristof

Date:      Tue, 27 Mar 2018 16:47:10 +0200
From:      "Kristof Provost" <kristof@sigsegv.be>
To:        "Reshad Patuck" <reshad@patuck.net>
Cc:        "FreeBSD Net" <freebsd-net@freebsd.org>
Subject:   Re: [vnet] [epair] epair interface stops working after some time
Message-ID:  <1DA1D7BE-015D-4B42-A7A8-13FE837BA6DE@sigsegv.be>
In-Reply-To: <7202AFF2-A314-41FE-BD13-C4C77A95E106@sigsegv.be>

On 27 Mar 2018, at 16:40, Kristof Provost wrote:
> It’s probably worth trying to play with ‘net.route.netisr_maxqlen’.
I probably mean ‘net.link.epair.netisr_maxqlen’ here.

Regards,
Kristof


