Date:      Sat, 12 Mar 2022 01:55:44 +0100
From:      Michael Gmelin <grembo@freebsd.org>
To:        Kristof Provost <kp@freebsd.org>
Cc:        Johan Hendriks <joh.hendriks@gmail.com>, freebsd-net@freebsd.org, "Patrick M. Hausen" <hausen@punkt.de>
Subject:   Re: epair and vnet jail loose connection.
Message-ID:  <43AA6B37-6235-4787-A03F-B4C264C75A58@freebsd.org>
In-Reply-To: <41ED1534-5E98-4D46-A562-811E80F82C5F@FreeBSD.org>
References:  <41ED1534-5E98-4D46-A562-811E80F82C5F@FreeBSD.org>



> On 12. Mar 2022, at 01:21, Kristof Provost <kp@freebsd.org> wrote:
>
> On 11 Mar 2022, at 17:44, Johan Hendriks wrote:
>>> On 09/03/2022 20:55, Johan Hendriks wrote:
>>> The problem:
>>> I have a FreeBSD 14 machine and a FreeBSD 13-STABLE machine, both
>>> running the same jails, just to test how they work.
>>>
>>> The jails that are running are a salt master, a haproxy jail, 2
>>> webservers, 2 varnish servers, and 2 php jails (one for php8.0 and one
>>> for 8.1). All the jails are connected to bridge0 and all of them use
>>> vnet.
>>>
>>> I believe this worked on an older 14-HEAD machine, but I did not do a
>>> lot with it back then. When I started testing again after updating the
>>> OS, I noticed that one of the varnish jails lost its network connection
>>> after running for a few hours. I thought it was just something on HEAD,
>>> so I never really looked into it. But later, when I started using the
>>> jails again and testing a WordPress site, I noticed that under a simple
>>> load test my haproxy jail loses its network connection within one
>>> minute. I see nothing in the logs, either on the host or in the jail.
>>> From the jail I cannot ping the other jails or the IP address of the
>>> bridge. I can, however, ping the jail's own IP address. From the host I
>>> also cannot ping the haproxy jail's IP address. If I start a tcpdump on
>>> the epaira interface of the haproxy jail, I do see the packets arrive,
>>> but they never show up inside the jail.
>>>
>>> I used ZFS to send all the jails to a 13-STABLE machine, copied over the
>>> jail.conf file as well as the pf.conf file, and saw the same behavior.
>>>
>>> Then I tried 13.0-RELEASE-p7, and on that machine I do not see this
>>> happening. There I can stress test the machine for 10 minutes without a
>>> problem, but on 14-HEAD and 13-STABLE the jail's network connection
>>> fails within a minute. Only a restart of the jail brings it back online,
>>> after which it shows the same behavior as soon as I start a simple load
>>> test, one it should handle easily.
>>>
>>> One of the jail hosts is running under VMware and the other under Ubuntu
>>> with KVM. The 13.0-RELEASE-p7 jail host is running under Ubuntu with KVM.
>>>
>>> Thank you for your time.
>>> regards
>>> Johan
>>>
>> I did some bisecting, and the latest commit that works on FreeBSD
>> 13-STABLE is 009a56b2e.
>> Starting with commit 2e0bee4c7 (if_epair: implement fanout), the
>> symptoms described above appear.
>>
> Interestingly, I cannot reproduce stalls in simple epair setups.
> It would be useful if you could reduce the setup with the problem to a
> minimal configuration, so we can figure out what other factors are
> involved.

If there are clear instructions on how to reproduce this, I’m happy to help
experiment (I’m relying heavily on epair at this point).

@Kristof: Did you try on bare metal or on VMs?

-m