From owner-freebsd-net@freebsd.org Tue Mar 27 14:40:42 2018
From: "Kristof Provost"
To: "Reshad Patuck"
Cc: "FreeBSD Net"
Subject: Re: [vnet] [epair] epair interface stops working after some time
Date: Tue, 27 Mar 2018 16:40:37 +0200
Message-ID: <7202AFF2-A314-41FE-BD13-C4C77A95E106@sigsegv.be>
In-Reply-To: <9CAB4522-0B0A-42BF-B9A4-BF36AFC60286@patuck.net>

(Re-cc freebsd-net, because this is useful information)

On 27 Mar 2018, at 13:07, Reshad Patuck wrote:

> The epair crash occurred again today running the epair module code
> with the added dtrace sdt providers.
>
> Running the same command as last time, 'dtrace -n ::epair\*:' returns
> the following:
> ```
> CPU ID FUNCTION:NAME …
> 0 66499 epair_transmit_locked:enqueued
> ```
> Looks like its filled up a queue somewhere and is dropping connections
> post that.
>
> The value of the 'error' is 55 I can see both the ifp and m structs
> but don't know what to look for in them.

That’s useful. Error 55 is ENOBUFS, which in IFQ_ENQUEUE() means we’re hitting _IF_QFULL(). There don’t seem to be counters for that drop though, which makes it hard to diagnose without these extra probe points. It also explains why you don’t really see any drop counters incrementing.

The fact that this queue is full presumably means that the other side is not reading packets off it any more. That’s supposed to happen in epair_start_locked() (look for the IFQ_DEQUEUE() calls). It’s not at all clear to me how, but it looks like the receive side is not doing its work.

It looks like the IFQ code is already a fallback for when the netisr queue is full. That code might be broken, or there might be a different issue that will just mean you’ll always end up in the same situation, regardless of queue size.

It’s probably worth trying to play with ‘net.route.netisr_maxqlen’.
I’d recommend *lowering* it, to see if the problem happens more frequently that way. If it does, it’ll be helpful in reproducing and trying to fix this. If it doesn’t, the full queue is probably a consequence rather than a cause/trigger.

(Of course, once you’ve confirmed that lowering the netisr_maxqlen makes the problem more frequent, go ahead and increase it.)

Regards,
Kristof

From owner-freebsd-net@freebsd.org Tue Mar 27 14:47:13 2018
From: "Kristof Provost"
To: "Reshad Patuck"
Cc: "FreeBSD Net"
Subject: Re: [vnet] [epair] epair interface stops working after some time
Date: Tue, 27 Mar 2018 16:47:10 +0200
Message-ID: <1DA1D7BE-015D-4B42-A7A8-13FE837BA6DE@sigsegv.be>
In-Reply-To: <7202AFF2-A314-41FE-BD13-C4C77A95E106@sigsegv.be>

On 27 Mar 2018, at 16:40, Kristof Provost wrote:

> It’s probably worth trying to play with ‘net.route.netisr_maxqlen’.

I probably mean ‘net.link.epair.netisr_maxqlen’ here.

Regards,
Kristof
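
Illustration of the failure mode discussed in this thread: a minimal user-space Python sketch of a bounded queue whose consumer has stopped draining it. This is not the if_epair code. `BoundedIfQueue`, `run`, and the `drops` counter are hypothetical names invented for the example, and the only behaviour taken from the thread is that an enqueue on a full queue fails with ENOBUFS. It also suggests why a smaller maximum queue length might surface a stalled receiver sooner.

```python
import errno
from collections import deque


class BoundedIfQueue:
    """Toy stand-in for an interface queue with a fixed maximum length."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self.packets = deque()
        # Hypothetical counter: the thread notes the real drop path has none.
        self.drops = 0

    def enqueue(self, pkt):
        # A full queue rejects the packet with ENOBUFS, as described above.
        if len(self.packets) >= self.maxlen:
            self.drops += 1
            return errno.ENOBUFS
        self.packets.append(pkt)
        return 0

    def dequeue(self):
        # Consumer side; returns None when the queue is empty.
        return self.packets.popleft() if self.packets else None


def run(maxlen, sent, consumer_alive):
    """Send `sent` packets; drain one per send only while the consumer runs."""
    q = BoundedIfQueue(maxlen)
    errors = []
    for i in range(sent):
        errors.append(q.enqueue(i))
        if consumer_alive:
            q.dequeue()
    return q, errors


healthy_q, healthy_errors = run(maxlen=4, sent=10, consumer_alive=True)
stalled_q, stalled_errors = run(maxlen=4, sent=10, consumer_alive=False)

print("healthy drops:", healthy_q.drops)
print("stalled drops:", stalled_q.drops)
print("first failing send:", stalled_errors.index(errno.ENOBUFS))
```

In this model a working consumer never produces an error, while a stalled one starts returning ENOBUFS as soon as `maxlen` packets are queued. Raising `maxlen` only delays the first failure.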