From: Kristof Provost
To: Michael Gmelin
Cc: Johan Hendriks, freebsd-net@freebsd.org, Patrick M. Hausen
Subject: Re: epair and vnet jail loose connection.
Date: Fri, 11 Mar 2022 19:10:45 -0600
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

On 11 Mar 2022, at 18:55, Michael Gmelin wrote:
>> On 12. Mar 2022, at 01:21, Kristof Provost wrote:
>>
>> On 11 Mar 2022, at 17:44, Johan Hendriks wrote:
>>>> On 09/03/2022 20:55, Johan Hendriks wrote:
>>>> The problem:
>>>> I have a FreeBSD 14 machine and a FreeBSD 13-stable machine, both running the same jails just to test the workings.
>>>>
>>>> The jails that are running are a salt master, a haproxy jail, 2 webservers, 2 varnish servers, and 2 php jails, one for php8.0 and one for 8.1. All the jails are connected to bridge0 and all the jails use vnet.
>>>>
>>>> I believe this worked on an older 14-HEAD machine, but I did not do a lot with it back then. When I started testing again after updating the OS, I noticed that one of the varnish jails lost its network connection after running for a few hours. I thought it was just something on HEAD, so I never really looked at it. But later on, when I started using the jails again and testing a test wordpress site, I noticed that with a simple load test my haproxy jail loses its network connection within one minute. I see nothing in the logs, either on the host or in the jail.
>>>> From the jail I cannot ping the other jails or the IP address of the bridge. I can, however, ping the jail's own IP address. From the host I also cannot ping the haproxy jail's IP address. If I start a tcpdump on the epaira interface of the haproxy jail, I do see the packets arrive, but not in the jail.
>>>>
>>>> I used ZFS to send all the jails to a 13-STABLE machine and copied over the jail.conf file as well as the pf.conf file, and I saw the same behavior.
>>>>
>>>> Then I tried 13.0-RELEASE-p7, and on that machine I do not see this happening. There I can stress test the machine for 10 minutes without a problem, but on 14-HEAD and 13-STABLE the jail's network connection fails within a minute. Only a restart of the jail brings it back online, and it exhibits the same behavior again as soon as I start a simple load test, which it should handle nicely.
>>>>
>>>> One of the jail hosts is running under VMware and the other is running under Ubuntu with KVM. The 13.0-RELEASE-p7 jail host is running under Ubuntu with KVM.
>>>>
>>>> Thank you for your time.
>>>> regards
>>>> Johan
>>>>
>>> I did some bisecting, and the latest commit that works on FreeBSD 13-STABLE is 009a56b2e.
>>> Commit 2e0bee4c7 ("if_epair: implement fanout") and everything after it show the symptoms described above.
>>>
>> Interestingly I cannot reproduce stalls in simple epair setups.
>> It would be useful if you could reduce the setup with the problem into a minimal configuration so we can figure out what other factors are involved.
>
> If there are clear instructions on how to reproduce, I’m happy to help experimenting (I’m relying heavily on epair at this point).
>
> @Kristof: Did you try on bare metal or on VMs?
>
Both.

Kristof