Date: Fri, 11 Mar 2022 19:10:45 -0600
From: Kristof Provost <kp@FreeBSD.org>
To: Michael Gmelin <grembo@freebsd.org>
Cc: Johan Hendriks <joh.hendriks@gmail.com>, freebsd-net@freebsd.org, "Patrick M. Hausen" <hausen@punkt.de>
Subject: Re: epair and vnet jail loose connection.
Message-ID: <B3094CE7-4869-4CF2-853D-F70E84B28914@FreeBSD.org>
In-Reply-To: <43AA6B37-6235-4787-A03F-B4C264C75A58@freebsd.org>
References: <41ED1534-5E98-4D46-A562-811E80F82C5F@FreeBSD.org> <43AA6B37-6235-4787-A03F-B4C264C75A58@freebsd.org>

On 11 Mar 2022, at 18:55, Michael Gmelin wrote:
>> On 12. Mar 2022, at 01:21, Kristof Provost <kp@freebsd.org> wrote:
>>
>> On 11 Mar 2022, at 17:44, Johan Hendriks wrote:
>>>> On 09/03/2022 20:55, Johan Hendriks wrote:
>>>> The problem:
>>>> I have a FreeBSD 14 machine and a FreeBSD 13-STABLE machine, both
>>>> running the same jails just to test the workings.
>>>>
>>>> The jails that are running are a salt master, a haproxy jail, 2
>>>> webservers, 2 varnish servers, and 2 PHP jails, one for PHP 8.0 and
>>>> one for 8.1. All the jails are connected to bridge0 and all the
>>>> jails use vnet.
>>>>
>>>> I believe this worked on an older 14-HEAD machine, but I did not do
>>>> a lot with it back then. When I started testing again after
>>>> updating the OS, I noticed that one of the varnish jails lost its
>>>> network connection after running for a few hours. I thought it was
>>>> just something on HEAD, so I never really looked at it. But later
>>>> on, when I started using the jails again and testing a test
>>>> WordPress site, I noticed that with a simple load test my haproxy
>>>> jail loses its network connection within one minute. I see nothing
>>>> in the logs, on the host or in the jail.
>>>> From the jail I cannot ping the other jails or the IP address of
>>>> the bridge. I can, however, ping the jail's own IP address. From
>>>> the host I also cannot ping the haproxy jail's IP address. If I
>>>> start a tcpdump on the epaira interface of the haproxy jail, I do
>>>> see the packets arrive, but not in the jail.
>>>>
>>>> I used ZFS to send all the jails to a 13-STABLE machine and copied
>>>> over the jail.conf file as well as the pf.conf file, and I saw the
>>>> same behavior.
>>>>
>>>> Then I tried 13.0-RELEASE-p7, and on that machine I do not see this
>>>> happening. There I can stress test the machine for 10 minutes
>>>> without a problem, but on 14-HEAD and 13-STABLE the jail's network
>>>> connection fails within a minute, and only a restart of the jail
>>>> brings it back online, only to exhibit the same behavior if I start
>>>> a simple load test which it should handle nicely.
>>>>
>>>> One of the jail hosts is running under VMware and the other is
>>>> running under Ubuntu with KVM. The 13.0-RELEASE-p7 jail host is
>>>> running under Ubuntu with KVM.
>>>>
>>>> Thank you for your time.
>>>> regards
>>>> Johan
>>>>
>>> I did some bisecting, and the latest commit that works on FreeBSD
>>> 13-STABLE is 009a56b2e.
>>> Commit 2e0bee4c7 ("if_epair: implement fanout") and everything after
>>> it shows the symptoms described above.
>>>
>> Interestingly I cannot reproduce stalls in simple epair setups.
>> It would be useful if you could reduce the setup with the problem
>> into a minimal configuration so we can figure out what other factors
>> are involved.
>
> If there are clear instructions on how to reproduce, I’m happy to help
> experimenting (I’m relying heavily on epair at this point).
>
> @Kristof: Did you try on bare metal or on VMs?
>
Both.

Kristof