Date:      Mon, 06 Mar 2023 22:51:19 +0000
From:      bugzilla-noreply@freebsd.org
To:        jail@FreeBSD.org
Subject:   [Bug 240106] VNET issue with ARP and routing sockets in jails
Message-ID:  <bug-240106-29815-SYuNxYuoyr@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-240106-29815@https.bugs.freebsd.org/bugzilla/>
References:  <bug-240106-29815@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240106

kvs <overwatch@lab.kyngin.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |overwatch@lab.kyngin.net

--- Comment #26 from kvs <overwatch@lab.kyngin.net> ---
Hello Everyone!

I believe I have hit the same bug, though my case appears to be specifically
related to lagg/lacp.  I can confirm the problem affects tap as well as epair
interfaces on a bridge when they send over a vlan interface that has a lagg
parent.


System Description: FreeBSD 13.1 w/ Chelsio T6225-SO-CR NIC, identified by cc0
/ cc1 (confirmed up and operational), host25 is the system name.  Network is
10.20.20.0/24, gateway is 10.20.20.254 (mac: 02:11:22:33:44:55), host is
assigned 10.20.20.5, epair0 is assigned to jail-10-20-20-6 (with matching IP of
10.20.20.6 on epair0b).  Switch is set to accept tagged frames only for vlan
2020.  All MTUs are 1500.

When I add a vlan interface child of cc0 to the bridge (no lagg involved yet),
I do not have any trouble passing data.

host25# ifconfig cc0.2020 create up
host25# ifconfig bridge2020 create up
host25# ifconfig bridge2020 addm cc0.2020
host25# ifconfig bridge2020 addm epair0a
host25# ifconfig bridge2020 inet 10.20.20.25/24

(pings from host -> gateway work fine)
host25# ping 10.20.20.254
success!

(pings from jail -> gateway also work)
host25# jexec jail-10-20-20-6 sh
jail-10-20-20-6# ping 10.20.20.254
success!

(I now reset bridge2020 to use a lagg interface.)
host25# ifconfig bridge2020 destroy
host25# ifconfig cc0.2020 destroy

host25# ifconfig lagg0 create laggproto lacp laggport cc0 laggport cc1 up
host25# ifconfig lagg0.2020 create up
host25# ifconfig bridge2020 create up
host25# ifconfig bridge2020 addm lagg0.2020 addm epair0a
host25# ifconfig bridge2020 inet 10.20.20.25/24

(pings from host -> gateway work fine)
host25# ping 10.20.20.254
success!

(pings from jail -> gateway time out)
host25# jexec jail-10-20-20-6 sh
jail-10-20-20-6# ping 10.20.20.254
ping: sendto: Host is down


(arp cache from jail appears to not include gateway mac)
jail-10-20-20-6# arp -an
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at (incomplete) on epair0b expired [ethernet]

(I assign mac statically.)
jail-10-20-20-6# arp -s 10.20.20.254 02:11:22:33:44:55
jail-10-20-20-6# arp -an
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at 02:11:22:33:44:55 on epair0b permanent [ethernet]

(attempt ping again after static arp assignment)
jail-10-20-20-6# ping 10.20.20.254
success!
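
The two `arp -an` listings above differ only in whether the gateway entry is
resolved.  For anyone who wants to watch for that condition while testing, here
is a minimal Python sketch (not part of my original test run) that picks the
unresolved entries out of `arp -an` text.  The function name and the regular
expression are my own assumptions, written against the output format shown
above only.

```python
import re

# Matches lines shaped like the `arp -an` output quoted in this report, e.g.
# "? (10.20.20.254) at (incomplete) on epair0b expired [ethernet]"
ARP_LINE = re.compile(
    r"^\S+ \((?P<ip>[\d.]+)\) at (?P<mac>\S+) on (?P<ifname>\S+)"
)


def unresolved_entries(arp_output):
    """Return (ip, interface) pairs whose MAC is still '(incomplete)'."""
    pending = []
    for line in arp_output.splitlines():
        match = ARP_LINE.match(line.strip())
        if match and match.group("mac") == "(incomplete)":
            pending.append((match.group("ip"), match.group("ifname")))
    return pending


# Sample text copied from the jail output above.
BEFORE_STATIC = """\
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at (incomplete) on epair0b expired [ethernet]
"""
AFTER_STATIC = """\
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at 02:11:22:33:44:55 on epair0b permanent [ethernet]
"""

print(unresolved_entries(BEFORE_STATIC))
print(unresolved_entries(AFTER_STATIC))
```

Feeding it the text from `jexec jail-10-20-20-6 arp -an` would flag the gateway
only while the bug is being triggered.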

What comes next is a fairly big presumption on my part, so hopefully someone
more knowledgeable on the topic will kindly correct me where I'm wrong.  The
cc0.2020 vlan interface works in the bridge once lagg0.2020 is
removed/destroyed, so I believe it's possible that the issue is related to arp
responses being sent down only one of the two lagg members and the host OS not
being aware of that.  Although the reply does come inbound on one of the host
OS interfaces, it is not propagated across the epair / tap.  The VM/jail then
never sees the arp reply, and keeps the entry as "(incomplete)" in its cache.
When using a single interface, or a lagg with only a single interface active,
arp appears to work as expected.

To help observe this, I did the following:

1) From host25, I watched epair0a, cc0, and cc1 using
host25# tcpdump -e -vvv -XX -i [interface]

2) Inside jail-10-20-20-6, I attempted to ping the gateway to generate the arp
traffic:
ping -c 1 -t 1 -q 10.20.20.254
PING 10.20.20.254 (10.20.20.254): 56 data bytes

--- 10.20.20.254 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss



3) Results follow:
# tcpdump -e -vvv -XX -i epair0a
tcpdump: listening on epair0a, link-type EN10MB (Ethernet), capture size 262144
bytes
01:43:54.768801 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP
(0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has
10.20.20.254 tell 10.20.20.6, length 28
                0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
                0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
                0x0020:  0000 0000 0000 0a14 14fe                 ..........
01:43:54.768936 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP
(0x0806), length 56: Ethernet (len 6), IPv4 (len 4), Request who-has
10.20.20.254 tell 10.20.20.6, length 42
                0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
                0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
                0x0020:  0000 0000 0000 0a14 14fe 0000 0000 0000  ................
                0x0030:  0000 0000 0000 0000                      ........
01:43:54.768969 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP
(0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has
10.20.20.254 tell 10.20.20.6, length 46
                0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
                0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
                0x0020:  0000 0000 0000 0a14 14fe 0000 0000 0000  ................
                0x0030:  0000 0000 0000 0000 0000 0000            ............


# tcpdump -e -vvv -XX -i cc0
tcpdump: listening on cc0, link-type EN10MB (Ethernet), capture size 262144
bytes
01:43:54.768822 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q
(0x8100), length 46: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len
4), Request who-has 10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe       ..............
01:43:54.769126 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP,
Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui
Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
01:43:54.769171 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP,
Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui
Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
01:43:54.769221 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP,
Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui
Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
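
As a cross-check on the replies in the cc0 capture, here is a minimal Python
sketch (not part of my original test run) that decodes the hex dump of the
first reply frame by hand.  It assumes a single 802.1Q tag followed by an
Ethernet/IPv4 ARP payload; the helper names are hypothetical and the dump is
copied from the 01:43:54.769126 frame above.

```python
import struct

# Hex dump of the first ARP reply seen on cc0 (offsets and ASCII column removed).
REPLY_DUMP = """
0207 f080 de0b 0211 2233 4455 8100 07e4
0806 0001 0800 0604 0002 0211 2233 4455
0a14 14fe 0207 f080 de0b 0a14 1406 0000
0000 0000 0000 0000 0000 0000 0000 0000
"""
FRAME = bytes.fromhex("".join(REPLY_DUMP.split()))


def mac(raw):
    return ":".join(f"{b:02x}" for b in raw)


def ipv4(raw):
    return ".".join(str(b) for b in raw)


def decode_tagged_arp(frame):
    """Decode an Ethernet frame carrying one 802.1Q tag and an IPv4 ARP body."""
    # dst MAC, src MAC, TPID (0x8100), TCI (priority + vlan id), inner ethertype
    dst, src, tpid, tci, ethertype = struct.unpack("!6s6sHHH", frame[:18])
    if tpid != 0x8100 or ethertype != 0x0806:
        raise ValueError("not an 802.1Q-tagged ARP frame")
    # ARP body: htype, ptype, hlen, plen, opcode, then sender/target MAC + IP
    _htype, _ptype, _hlen, _plen, op, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[18:46]
    )
    return {
        "dst": mac(dst),
        "src": mac(src),
        "vlan": tci & 0x0FFF,      # low 12 bits of the TCI
        "priority": tci >> 13,     # top 3 bits of the TCI
        "op": {1: "request", 2: "reply"}.get(op, op),
        "sender": (mac(sha), ipv4(spa)),
        "target": (mac(tha), ipv4(tpa)),
    }


for key, value in decode_tagged_arp(FRAME).items():
    print(f"{key:9} {value}")
```

The decoded fields agree with the tcpdump summary line: vlan 2020, a reply from
the gateway's MAC, addressed to the jail's epair0b MAC.  So the frame arriving
on cc0 looks well formed; it just never shows up on epair0a.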



# tcpdump -e -vvv -XX -i cc1
tcpdump: listening on cc1, link-type EN10MB (Ethernet), capture size 262144
bytes
01:43:54.768876 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q
(0x8100), length 60: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len
4), Request who-has 10.20.20.254 tell 10.20.20.6, length 42
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000            ............
01:43:54.768965 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q
(0x8100), length 64: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len
4), Request who-has 10.20.20.254 tell 10.20.20.6, length 46
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................



Apparently 1 arp request is sent over cc0 and 2 over cc1, and all 3 replies come
back over cc0.  None of the replies appear to enter epair0a.  I've not had any
luck changing lagg hashes at this stage to try to force requests down one of
the two lagg members, so instead I downed one of the interfaces in the lagg.
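
To make the per-interface counts easier to compare, here is a small Python
sketch (not part of my original test run) that tallies ARP requests and replies
per interface.  The one-line-per-frame strings are condensed by hand from the
three captures above (timestamp plus the request/reply text only), and the
function name is hypothetical.

```python
from collections import Counter

# One entry per frame, condensed by hand from the tcpdump output above.
CAPTURES = {
    "epair0a": [
        "01:43:54.768801 Request who-has 10.20.20.254 tell 10.20.20.6",
        "01:43:54.768936 Request who-has 10.20.20.254 tell 10.20.20.6",
        "01:43:54.768969 Request who-has 10.20.20.254 tell 10.20.20.6",
    ],
    "cc0": [
        "01:43:54.768822 Request who-has 10.20.20.254 tell 10.20.20.6",
        "01:43:54.769126 Reply 10.20.20.254 is-at 02:11:22:33:44:55",
        "01:43:54.769171 Reply 10.20.20.254 is-at 02:11:22:33:44:55",
        "01:43:54.769221 Reply 10.20.20.254 is-at 02:11:22:33:44:55",
    ],
    "cc1": [
        "01:43:54.768876 Request who-has 10.20.20.254 tell 10.20.20.6",
        "01:43:54.768965 Request who-has 10.20.20.254 tell 10.20.20.6",
    ],
}


def tally(lines):
    """Count ARP requests and replies in condensed tcpdump lines."""
    counts = Counter()
    for line in lines:
        if "Request who-has" in line:
            counts["request"] += 1
        elif "Reply" in line and "is-at" in line:
            counts["reply"] += 1
    return counts


for ifname, lines in CAPTURES.items():
    counts = tally(lines)
    print(f"{ifname:8} requests={counts['request']} replies={counts['reply']}")
```

The tally shows the asymmetry described above: replies are seen on cc0 only,
and none are seen on epair0a.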

(bridge2020 is still up with epair0a and lagg0.2020 (lagg0 contains cc0+cc1
both up))

jail-10-20-20-6# ping 10.20.20.254
ping: sendto: Host is down

host25# ifconfig cc1 down

(clear the arp cache in the jail so it starts empty)
jail-10-20-20-6# arp -da
jail-10-20-20-6# ping 10.20.20.254
success!


(using tcpdump, epair0a now sees the arp replies as well (I excluded the
tcpdump for cc0 here because it's largely identical))
# tcpdump -e -vvv -XX -i epair0a
15:23:10.623560 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP
(0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has
10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  0001 0800 0604 0001 0207 f080 de0b 0a14  ................
        0x0010:  1406 0000 0000 0000 0a14 14fe            ............
15:23:10.623916 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4),
Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:23:10.623924 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4),
Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:23:10.623926 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4),
Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:23:10.623943 02:07:f0:80:de:0b (oui Unknown) > 02:11:22:33:44:55 (oui
Unknown), ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 56841,
offset 0, flags [none], proto ICMP (1), length 84)
        10.20.20.6 > 10.20.20.254: ICMP echo request, id 22927, seq 0, length 64
        0x0000:  4500 0054 de09 0000 4001 5f74 0a14 1406  E..T....@._t....
        0x0010:  0a14 14fe 0800 8750 598f 0000 0006 2ec0  .......PY.......
        0x0020:  15c1 e795 0809 0a0b 0c0d 0e0f 1011 1213  ................
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637                                4567
15:23:10.624147 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 54016,
offset 0, flags [none], proto ICMP (1), length 84)
        10.20.20.254 > 10.20.20.6: ICMP echo reply, id 22927, seq 0, length 64
        0x0000:  4500 0054 d300 0000 4001 6a7d 0a14 14fe  E..T....@.j}....
        0x0010:  0a14 1406 0000 8f50 598f 0000 0006 2ec0  .......PY.......
        0x0020:  15c1 e795 0809 0a0b 0c0d 0e0f 1011 1213  ................
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637                                4567


(arp cache seems valid as well)
jail-10-20-20-6# arp -na
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at 02:11:22:33:44:55 on epair0b expires in 1085 seconds
[ethernet]





Additional thoughts:
1) With lagg0, cc0, and cc1 up, I created a second jail on host25 using
10.20.20.7 (epair1).  I add epair1a to bridge2020 (now including epair0a,
epair1a and lagg0.2020).

When I attempt to ping from jail-10-20-20-6 to .254 I get a timeout as
previously experienced.

Pinging from .6 to .7 works without any trouble, whether the lagg0 members
(cc0/cc1) are up or down.  This was expected, as those packets should never
traverse lagg0.2020, but I did want to test/confirm.

2) I did run some ping tests with untagged lagg0 in the bridge, and it does
appear to work without trouble.  I removed lagg0.2020 from bridge2020, then
added lagg0 to bridge2020, and set the switch ports to untagged.  The packets
appear to move without trouble even with both cc0+cc1 up.  I need to test this
further to be conclusive, but it felt less important at this time, as it
doesn't meet my requirement for tagged ports.

3) I have a few bhyve VMs that I've added to bridge2020 as tests (tap0, tap1,
etc.).  The results seem to be largely consistent with jails.  You could
replace jail-10-20-20-6 with vm-10-20-20-11 (tested freebsd / openbsd /
windows), for instance, and the same results appear.  Packets fail when
originating from tap/vnet and traversing lagg0.2020.

(again, lagg0/lacp is up, includes cc0+cc1, bridge2020 includes lagg0.2020,
tap0, and epair0a devices)
host25# ping 10.20.20.254
success!

vm-10-20-20-11# arp -da
(attempt traverse lagg0.2020)
vm-10-20-20-11# ping 10.20.20.254
ping: sendto: Host is down

(try tap0 -> epair0)
vm-10-20-20-11# ping 10.20.20.6
success!

(try tests again with lagg0 member cc1 down)
host25# ifconfig cc1 down

(tap0 -> lagg0.2020 -> 10.20.20.254)
vm-10-20-20-11# ping 10.20.20.254
success!

(again tap0 -> epair0, works as expected)
vm-10-20-20-11# ping 10.20.20.6
success!

(turn cc1 back up, wait about 10 seconds for both laggports to be distributing)
host25# ifconfig cc1 up
vm-10-20-20-11# arp -da
vm-10-20-20-11# ping 10.20.20.254
ping: sendto: Host is down

(again, only lagg is preventing arp, tap <-> epair in bridge still works fine)
vm-10-20-20-11# ping 10.20.20.6
success!
jail-10-20-20-6# ping 10.20.20.11
success!

Conclusion: When bridging a vnet/tap interface with a lagg.vlan interface (vlan
interface with lagg [laggproto lacp] parent), arp replies do not enter the
vnet/tap interface on the bridge when *both* lagg members are up.  After downing
one of the two interfaces in the lagg group, arp replies enter the vnet/tap
interface as expected.


Final notes:
I've not included it in this post, but I've attempted to remove all the
hardware offloading features from the interfaces lagg0/lagg0.2020/cc0/cc1,
toggled lagg0 lagghash, toggled sysctls net.link.lagg.* and net.link.bridge.*,
and upgraded to 13-STABLE.  No luck moving data over the lagg until I down one
of the two lagg0 interfaces.  For brevity, I used the command 'ping host-ip' in
the examples above and only displayed a simple success/fail response.  In
testing I mostly ran longer pings (ex: -c 10 -t 2) to confirm the above
examples.

I'd be happy to help test further if anyone has any suggestions.

Thank you!

-kvs
