From: Michael Gmelin <grembo@freebsd.org>
To: Johan Hendriks <joh.hendriks@gmail.com>
Cc: Kristof Provost <kp@freebsd.org>, freebsd-net, Patrick M. Hausen
Date: Sun, 13 Mar 2022 11:49:49 +0100
Subject: Re: epair and vnet jail loose connection.
Message-Id: <144A3D43-F9CE-492D-85E6-D47D1A47400F@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net


On 13 Mar 2022, at 11:27, Johan Hendriks <joh.hendriks@gmail.com> wrote:


On Sun 13 Mar 2022 01:17, Michael Gmelin <grembo@freebsd.org> wrote:
I also gave it another go (this time with multiple CPUs assigned to the VM), and it still works just fine, so I think we would need more details about the setup.

Would it make sense to share our test setups, so Johan can try to reproduce with them?

-m

On 13. Mar 2022, at= 00:48, Kristof Provost <kp@freebsd.org> wrote:

=EF=BB=BF

I=E2=80=99m still failing to reproduce.

Is pf absolutely required to trigger the issue? Is haproxy required (i.e. can you trigger it with iperf instead)?
Is the bridge strictly required?

Kristof
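Kristof's three questions form a small reduction matrix. The following is a hypothetical sketch, and its factor labels are illustrative rather than taken from anyone's setup. It enumerates every on/off combination so that the simplest setups come first. "Off" means the factor is removed; for haproxy, it means haproxy is replaced by iperf.

```python
from itertools import product

# Hypothetical labels for the factors asked about above.
# False = factor removed (for "haproxy": replaced by iperf).
FACTORS = ("pf", "haproxy", "bridge0")

def reduction_matrix(factors=FACTORS):
    """Return every on/off combination, fewest enabled factors first.

    Running the load test once per row and stopping at the first row
    that stalls yields a failing setup with the fewest factors enabled.
    """
    combos = product((False, True), repeat=len(factors))
    # sorted() is stable, so rows with equal counts keep product() order
    return [dict(zip(factors, combo)) for combo in sorted(combos, key=sum)]

for row in reduction_matrix():
    enabled = [name for name, on in row.items() if on] or ["(none)"]
    print(", ".join(enabled))
```

Three factors give 2^3 = 8 test runs. This is small enough to run exhaustively, which avoids guessing which combination matters.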

On 12 Mar 2022, at 8:18, Johan Hendriks wrote:

With the following minimal setup I can see the haproxy server drop off the network.

Two jails: one with haproxy, and one with nginx serving the following HTML file.

<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>

<h1>My First Heading</h1>
<p>My first paragraph.</p>

</body>
</html>

From a remote machine I run hey -h2 -n 10 -c 10 -z 300s https://wp.test.nl
Meanwhile, a ping from the jail host to the haproxy jail shows the following:

[ /] > ping 10.233.185.20
PING 10.233.185.20 (10.233.185.20): 56 data bytes
64 bytes from 10.233.185.20: icmp_seq=0 ttl=64 time=0.054 ms
64 bytes from 10.233.185.20: icmp_seq=1 ttl=64 time=0.050 ms
64 bytes from 10.233.185.20: icmp_seq=2 ttl=64 time=0.041 ms
<SNIP>
64 bytes from 10.233.185.20: icmp_seq=169 ttl=64 time=0.050 ms
64 bytes from 10.233.185.20: icmp_seq=170 ttl=64 time=0.154 ms
64 bytes from 10.233.185.20: icmp_seq=171 ttl=64 time=0.054 ms
64 bytes from 10.233.185.20: icmp_seq=172 ttl=64 time=0.039 ms
64 bytes from 10.233.185.20: icmp_seq=173 ttl=64 time=0.160 ms
64 bytes from 10.233.185.20: icmp_seq=174 ttl=64 time=0.045 ms
^C
--- 10.233.185.20 ping statistics ---
335 packets transmitted, 175 packets received, 47.8% packet loss
round-trip min/avg/max/stddev = 0.037/0.070/0.251/0.040 ms
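This minimal sketch checks the ping summary above. The loss figure is (sent - received) / sent, so 160 / 335 gives 47.8%. The last reply is icmp_seq=174, and 175 replies were received. That is consistent with every ping up to that point being answered and every later one lost, which means a hard stall rather than intermittent loss. This reading assumes the snipped lines contain no gaps, and the helper names below are hypothetical.

```python
import re

SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) packets received")

def loss_percent(summary_line):
    """Packet loss in percent: (sent - received) / sent."""
    m = SUMMARY.search(summary_line)
    if m is None:
        raise ValueError("not a ping summary line")
    sent, received = int(m.group(1)), int(m.group(2))
    return 100.0 * (sent - received) / sent

def stall_after(received_seqs, sent):
    """Return the first unanswered icmp_seq if the replies form one unbroken
    run starting at 0 and everything after it was lost; otherwise None."""
    seqs = sorted(received_seqs)
    if seqs != list(range(len(seqs))) or len(seqs) == sent:
        return None  # gaps (intermittent loss) or no loss at all
    return len(seqs)

line = "335 packets transmitted, 175 packets received, 47.8% packet loss"
print(f"{loss_percent(line):.1f}%")       # 47.8%
print(stall_after(range(175), sent=335))  # 175
```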


ifconfig
vtnet0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4c00bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
	ether 56:16:e9:80:5e:41
	inet 87.233.191.146 netmask 0xfffffff0 broadcast 87.233.191.159
	inet 87.233.191.156 netmask 0xffffffff broadcast 87.233.191.156
	inet 87.233.191.155 netmask 0xffffffff broadcast 87.233.191.155
	inet 87.233.191.154 netmask 0xffffffff broadcast 87.233.191.154
	media: Ethernet autoselect (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vtnet1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
	ether 56:16:2c:64:32:35
	media: Ethernet autoselect (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 58:9c:fc:10:ff:82
	inet 10.233.185.1 netmask 0xffffff00 broadcast 10.233.185.255
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: epair20a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 7 priority 128 path cost 2000
	member: epair18a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 15 priority 128 path cost 2000
	groups: bridge
	nd6 options=9<PERFORMNUD,IFDISABLED>
bridge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 58:9c:fc:10:d9:1a
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: vtnet0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 1 priority 128 path cost 2000
	groups: bridge
	nd6 options=9<PERFORMNUD,IFDISABLED>
pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33160
	groups: pflog
epair18a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: jail_web01
	options=8<VLAN_MTU>
	ether 02:77:ea:19:c7:0a
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
epair20a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: jail_haproxy
	options=8<VLAN_MTU>
	ether 02:9b:93:8c:59:0a
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

jail.conf

# Global settings applied to all jails.
$domain = "test.nl";

exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;

mount.fstab = "/storage/jails/$name.fstab";

exec.system_user  = "root";
exec.jail_user    = "root";
mount.devfs;
sysvshm="new";
sysvsem="new";
allow.raw_sockets;
allow.set_hostname = 0;
allow.sysvipc;
enforce_statfs = "2";
devfs_ruleset     = "11";

path = "/storage/jails/${name}";
host.hostname = "${name}.${domain}";


# Networking
vnet;
vnet.interface    = "vnet0";

  # Commands to run on host before jail is created
  exec.prestart  = "ifconfig epair${ip} create up description jail_${name}";
  exec.prestart  += "ifconfig epair${ip}a up";
  exec.prestart  += "ifconfig bridge0 addm epair${ip}a up";
  exec.created   = "ifconfig epair${ip}b name vnet0";

  # Commands to run in jail after it is created
  exec.start  += "/bin/sh /etc/rc";

  # commands to run in jail when jail is stopped
  exec.stop  = "/bin/sh /etc/rc.shutdown";

  # Commands to run on host when jail is stopped
  exec.poststop  = "ifconfig bridge0 deletem epair${ip}a";
  exec.poststop  += "ifconfig epair${ip}a destroy";
  persist;

web01 {
    $ip = 18;
}

haproxy {
    $ip = 20;
    mount.fstab = "";
    path = "/storage/jails/${name}";
}
}
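The hook commands above can be expanded per jail to see exactly which interfaces are involved. This is a sketch only: jail(8) performs this substitution itself, and Python's string.Template is used here merely because it accepts the same $name / ${name} syntax. The sketch groups commands by hook and does not model when jail(8) runs each hook or when vnet.interface moves vnet0 into the jail.

```python
from string import Template

# Host-side command templates copied from the jail.conf above.
HOOKS = {
    "exec.prestart": [
        "ifconfig epair${ip} create up description jail_${name}",
        "ifconfig epair${ip}a up",
        "ifconfig bridge0 addm epair${ip}a up",
    ],
    "exec.created": ["ifconfig epair${ip}b name vnet0"],
    "exec.poststop": [
        "ifconfig bridge0 deletem epair${ip}a",
        "ifconfig epair${ip}a destroy",
    ],
}

def expand(name, ip):
    """Substitute ${name} and ${ip} into every hook command for one jail."""
    return {
        hook: [Template(cmd).substitute(name=name, ip=ip) for cmd in cmds]
        for hook, cmds in HOOKS.items()
    }

for jail, ip in (("web01", 18), ("haproxy", 20)):
    for hook, cmds in expand(jail, ip).items():
        for cmd in cmds:
            print(f"{jail:<8} {hook:<14} {cmd}")
```

The expansion shows the "a" ends (epair18a, epair20a) staying on the host as bridge0 members with descriptions jail_web01 and jail_haproxy. This matches the ifconfig output. The "b" ends are renamed to vnet0 for use inside each jail.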

pf.conf

#######################################################################
ext_if="vtnet0"
table <bruteforcers> persist
table <torlist> persist
table <ssh-trusted> persist file "/usr/local/etc/pf/ssh-trusted"
table <custom-block> persist file "/usr/local/etc/pf/custom-block"
table <jailnetworks> { 10.233.185.0/24, 192.168.10.0/24 }

icmp_types = "echoreq"
junk_ports="{ 135,137,138,139,445,68,67,3222,17500 }"

# Log interface
set loginterface $ext_if

# Set limits
set limit { states 40000, frags 20000, src-nodes 20000 }

scrub on $ext_if all fragment reassemble no-df random-id

# ---- Nat jails to the web
binat on $ext_if from 10.233.185.15/32 to !10.233.185.0/24 -> 87.233.191.156/32 # saltmaste
binat on $ext_if from 10.233.185.20/32 to !10.233.185.0/24 -> 87.233.191.155/32 # haproxy
binat on $ext_if from 10.233.185.22/32 to !10.233.185.0/24 -> 87.233.191.154/32 # web-comb

nat on $ext_if from <jailnetworks> to any -> ($ext_if:0)

# ---- First rule obligatory "Pass all on loopback"
pass quick on lo0 all
pass quick on bridge0 all
pass quick on bridge1 all

# ---- Block TOR exit addresses
block quick proto { tcp, udp } from <torlist> to $ext_if

# ---- Second rule "Block all in and pass all out"
block in log all
pass out all keep state

# IPv6 pass in/out all IPv6 ICMP traffic
pass in quick proto icmp6 all

# Pass all lo0
set skip on lo0

############### FIREWALL ###############################################
# ---- Block custom ip's and logs
block quick proto { tcp, udp } from <custom-block> to $ext_if

# ---- Jail ports
pass in quick on { $ext_if } proto tcp from any to 10.233.185.22 port { smtp 80 443 993 995 1956 } keep state
pass in quick on { $ext_if } proto tcp from any to 10.233.185.20 port { smtp 80 443 993 995 1956 } keep state
pass in quick on { $ext_if } proto tcp from any to 10.233.185.15 port { 4505 4506 } keep state

# ---- Allow ICMP
pass in inet proto icmp all icmp-type $icmp_types keep state
pass out inet proto icmp all icmp-type $icmp_types keep state

pass in quick on $ext_if inet proto tcp from any to $ext_if port { 80, 443 } flags S/SA keep state
pass in quick on $ext_if inet proto tcp from <ssh-trusted> to $ext_if port { 4505 4506 } flags S/SA keep state
block log quick from <bruteforcers>
pass quick proto tcp from <ssh-trusted> to $ext_if port ssh flags S/SA keep state
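The address translation in this pf.conf can be followed with a simplified Python model. It is an approximation, not pf's actual evaluation logic: it covers only the binat and nat lines and ignores filter rules and state. The remote client address 203.0.113.7 and the jail address 10.233.185.99 are hypothetical. ($ext_if:0) is taken to be vtnet0's primary, non-alias address as shown by ifconfig.

```python
from ipaddress import ip_address, ip_network

# Addresses copied from the pf.conf and ifconfig output above.
JAILNETWORKS = [ip_network("10.233.185.0/24"), ip_network("192.168.10.0/24")]
BINAT_EXCLUDE = ip_network("10.233.185.0/24")  # "to !10.233.185.0/24"
BINAT = {
    ip_address("10.233.185.15"): ip_address("87.233.191.156"),  # saltmaste
    ip_address("10.233.185.20"): ip_address("87.233.191.155"),  # haproxy
    ip_address("10.233.185.22"): ip_address("87.233.191.154"),  # web-comb
}
VTNET0_PRIMARY = ip_address("87.233.191.146")  # ($ext_if:0), no aliases

def outbound_source(src, dst):
    """Approximate source address of a packet leaving vtnet0.

    For the three jails they name, the binat lines take precedence over
    the catch-all nat line, so this model checks them first.
    """
    src, dst = ip_address(src), ip_address(dst)
    if src in BINAT and dst not in BINAT_EXCLUDE:
        return BINAT[src]
    if any(src in net for net in JAILNETWORKS):
        return VTNET0_PRIMARY  # nat on $ext_if from <jailnetworks>
    return src  # no translation line matches

def inbound_destination(dst):
    """binat is bidirectional: traffic to a public alias reaches its jail."""
    reverse = {public: private for private, public in BINAT.items()}
    dst = ip_address(dst)
    return reverse.get(dst, dst)

print(inbound_destination("87.233.191.155"))            # 10.233.185.20
print(outbound_source("10.233.185.20", "203.0.113.7"))  # 87.233.191.155
print(outbound_source("10.233.185.99", "203.0.113.7"))  # 87.233.191.146
```

In these terms, the hey traffic from the remote machine targets 87.233.191.155 and is translated to the haproxy jail at 10.233.185.20. The ping from the jail host goes over bridge0 and never matches the binat lines, which apply only on vtnet0.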

This is as minimal as I can get it.

Hope this helps.
regards,
Johan Hendriks


On Sat 12 Mar 2022 at 02:10, Kristof Provost <kp@freebsd.org> wrote:
On 11 Mar 2022, at 18:55, Michael Gmelin wrote:
>> On 12 Mar 2022, at 01:21, Kristof Provost <kp@freebsd.org> wrote:
>>
>> On 11 Mar 2022, at 17:44, Johan Hendriks wrote:
>>>> On 09/03/2022 20:55, Johan Hendriks wrote:
>>>> The problem:
>>>> I have a FreeBSD 14 machine and a FreeBSD 13-STABLE machine, both running the same jails, just to test how things work.
>>>>
>>>> The jails running are a Salt master, a haproxy jail, 2 web servers, 2 Varnish servers, and 2 PHP jails (one for PHP 8.0 and one for 8.1). All the jails are connected to bridge0 and all of them use vnet.
>>>>
>>>> I believe this worked on an older 14-HEAD machine, but I did not do a lot with it back then. When I started testing again after updating the OS, I noticed that one of the Varnish jails lost its network connection after running for a few hours. I thought it was just something on HEAD, so I never really looked into it. But later, when I started using the jails again and tested a WordPress test site, I noticed that under a simple load test my haproxy jail loses its network connection within one minute. I see nothing in the logs, either on the host or in the jail.
>>>> From the jail I cannot ping the other jails or the IP address of the bridge. I can, however, ping the jail's own IP address. From the host I also cannot ping the haproxy jail's IP address. If I run tcpdump on the haproxy jail's epair "a" interface, I do see the packets arrive, but they never show up in the jail.
>>>>
>>>> I used ZFS to send all the jails to a 13-STABLE machine, copied over the jail.conf and pf.conf files, and saw the same behavior.
>>>>
>>>> Then I tried 13.0-RELEASE-p7, and on that machine I do not see this happening. There I can stress test the machine for 10 minutes without a problem, but on 14-HEAD and 13-STABLE the jail's network connection fails within a minute. Only restarting the jail brings it back online, and it shows the same behavior again as soon as I start a simple load test, which it should handle easily.
>>>>
>>>> One of the jail hosts is running under VMware and the other under Ubuntu with KVM. The 13.0-RELEASE-p7 jail host is also running under Ubuntu with KVM.
>>>>
>>>> Thank you for your time.
>>>> regards
>>>> Johan
>>>>
>>> I did some bisecting: the latest commit that works on FreeBSD 13-STABLE is 009a56b2e.
>>> From commit 2e0bee4c7 (if_epair: implement fanout) onward, the system shows the symptoms described above.
>>>
>> Interestingly I cannot reproduce stalls in simple epair setups.
>> It would be useful if you could reduce the setup with the problem into a minimal configuration so we can figure out what other factors are involved.
>
> If there are clear instructions on how to reproduce, I'm happy to help experiment (I'm relying heavily on epair at this point).
>
> @Kristof: Did you try on bare metal or on VMs?
>
Both.

Kristof
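In the report above, 009a56b2e is the last good commit and 2e0bee4c7 (if_epair: implement fanout) is the first bad one. Readers unfamiliar with bisecting may find this minimal sketch of the underlying binary search helpful. It is the idea git bisect automates, not its implementation. In practice `is_good` stands for "build, boot, and run the load test", and the commit labels below are hypothetical.

```python
def first_bad(commits, is_good):
    """Return the first commit for which is_good(commit) is False.

    Assumes commits are in history order, commits[0] is known good,
    commits[-1] is known bad, and the regression never disappears once
    introduced (the assumption bisecting relies on).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]

# Hypothetical history of ten commits where the regression lands at c6.
history = [f"c{i}" for i in range(10)]
print(first_bad(history, lambda c: int(c[1:]) < 6))  # c6
```

Each probe halves the remaining range, so the number of kernel builds grows only logarithmically with the number of commits between the good and bad endpoints.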
I also did a fresh install, this time based on 13.1-PRERELEASE.
I copied my haproxy and web01 jails to this machine and have the same problem.

Could it be a sysctl I use, or a /boot/loader.conf setting?

This is my /boot/loader.conf:
# -- sysinstall generated deltas -- #

autoboot_delay="2"  #optional

cryptodev_load="YES"

vbe_max_resolution=1024x768

# disable hyperthreading
machdep.hyperthreading_allowed=0

# filemon
filemon_load="YES"

# use gpt ids instead of gptids or disks idents
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gpt.enable="1"
kern.geom.label.gptid.enable="0"

# ZFS
zfs_load="YES"

My /etc/sysctl.conf:

# $FreeBSD$
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#
kern.timecounter.hardware=HPET
# accept queue
kern.ipc.soacceptqueue=4096

# PF vnet jail
net.link.bridge.pfil_member=0
net.link.bridge.pfil_bridge=0
net.inet.ip.forwarding=1                    # (default 0)
net.inet.tcp.tso=0  # (default 1)
vfs.zfs.min_auto_ashift=12

If you want, I can give you full root access to this machine.

I use a machine outside the jail host to run the hey command. On that machine, the hosts file points to the public alias that is binat'ed to the haproxy jail.

Thank you all for your time on this!
regards
Johan Hendriks


Hi Johan,

Two questions from one of my previous emails:

1. How is web01 configured? (I created a full jail for it, like haproxy, as it was unclear to me.)
2.

> devfs_ruleset     = "11";

What is in devfs_ruleset 11? It's not a standard one; I used "4" in my tests.

Root access might help as well, if we still can't reproduce it.

Cheers
Michael









 