Date: Mon, 06 Mar 2023 22:51:19 +0000
From: bugzilla-noreply@freebsd.org
To: jail@FreeBSD.org
Subject: [Bug 240106] VNET issue with ARP and routing sockets in jails
Message-ID: <bug-240106-29815-SYuNxYuoyr@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-240106-29815@https.bugs.freebsd.org/bugzilla/>
References: <bug-240106-29815@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240106

kvs <overwatch@lab.kyngin.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |overwatch@lab.kyngin.net

--- Comment #26 from kvs <overwatch@lab.kyngin.net> ---
Hello Everyone!

I believe I have hit the same bug, though I suspect my issue is specifically related to lagg/LACP. I can confirm this problem affects tap as well as epair interfaces on a bridge when attempting to send over a vlan interface that has a lagg parent.

System description: FreeBSD 13.1 with a Chelsio T6225-SO-CR NIC, identified as cc0 / cc1 (confirmed up and operational). host25 is the system name. The network is 10.20.20.0/24, and the gateway is 10.20.20.254 (MAC 02:11:22:33:44:55). The host is assigned 10.20.20.5. epair0 is assigned to jail-10-20-20-6, with the matching IP 10.20.20.6 on epair0b. The switch ports are set to accept only tagged frames for vlan 2020. All MTUs are 1500.

When I add a vlan interface child of cc0 directly to the bridge (no lagg involved), data passes without any trouble.

host25# ifconfig cc0.2020 create up
host25# ifconfig bridge2020 create up
host25# ifconfig bridge2020 addm cc0.2020
host25# ifconfig bridge2020 addm epair0a
host25# ifconfig bridge2020 inet 10.20.20.25/24

(pings from host -> gateway work fine)
host25# ping 10.20.20.254
success!

(pings from jail -> gateway also work)
host25# jexec jail-10-20-20-6 sh
jail-10-20-20-6# ping 10.20.20.254
success!

(I now reset bridge2020 to use a lagg interface.)
host25# ifconfig bridge2020 destroy
host25# ifconfig cc0.2020 destroy
host25# ifconfig lagg0 create laggproto lacp laggport cc0 laggport cc1 up
host25# ifconfig lagg0.2020 create up
host25# ifconfig bridge2020 create up
host25# ifconfig bridge2020 addm lagg0.2020 addm epair0a
host25# ifconfig bridge2020 inet 10.20.20.25/24

(pings from host -> gateway work fine)
host25# ping 10.20.20.254
success!
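The failure in this report only appears while both LACP members are active, so it can help to record the lagg port state next to each test. The sketch below is a hypothetical helper and is not part of the original report. `distributing_ports` and `lagg_state` are made-up names. It also assumes that `ifconfig lagg0` prints one `laggport:` line per member with its LACP flags, such as `laggport: cc0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>`. Check that format against your own output before relying on it.

```sh
#!/bin/sh
# Hypothetical helpers: list and count lagg members that LACP reports as
# DISTRIBUTING. Input is "ifconfig lagg0" output on stdin. The assumed
# member line format is:
#   laggport: cc0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

# Print the name of each member whose flags include DISTRIBUTING.
distributing_ports() {
    awk '$1 == "laggport:" && $3 ~ /DISTRIBUTING/ { print $2 }'
}

# Summarise how many members are distributing.
lagg_state() {
    # wc -l may pad its output with spaces, so strip them before comparing.
    n=$(distributing_ports | wc -l | tr -d ' ')
    if [ "$n" -ge 2 ]; then
        echo "distributing=$n (multiple members: the failing case in this report)"
    else
        echo "distributing=$n (at most one member)"
    fi
}
```

Typical use would be `ifconfig lagg0 | lagg_state` before each ping from the jail. After `ifconfig cc1 down`, cc1 would be expected to drop out of the list once LACP has settled.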
(pings from jail -> gateway time out)
host25# jexec jail-10-20-20-6 sh
jail-10-20-20-6# ping 10.20.20.254
ping: sendto: Host is down

(the jail's ARP cache does not appear to include the gateway MAC)
jail-10-20-20-6# arp -an
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at (incomplete) on epair0b expired [ethernet]

(I assign the MAC statically.)
jail-10-20-20-6# arp -s 10.20.20.254 02:11:22:33:44:55
jail-10-20-20-6# arp -an
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at 02:11:22:33:44:55 on epair0b permanent [ethernet]

(attempt the ping again after the static ARP assignment)
jail-10-20-20-6# ping 10.20.20.254
success!

What comes next is a reasonably big presumption on my part, so I hope someone more educated on the topic will kindly correct me where I'm wrong. The vlan interface cc0.2020 works in the bridge once lagg0.2020 is removed/destroyed. Given that, I believe the issue may be related to ARP responses being sent down one of the two lagg members without the host OS being aware of it. The reply does come in on one of the host's interfaces, but it is not propagated down across the epair / tap. The VM/jail therefore never sees the ARP reply and keeps the entry as "(incomplete)" in its cache. When using a single interface, or a lagg with only a single interface active, ARP appears to work as expected.
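The `(incomplete)` gateway entry above is the visible symptom. The static `arp -s` entry is only a workaround. As the output shows, the entry is permanent, so it masks the problem rather than fixing it, and it will not follow a change of the gateway's MAC. The sketch below is a minimal example that lists incomplete neighbours and prints the same `arp -s` command the report uses. It assumes the `arp -an` line format shown above. `incomplete_entries` and `pin_command` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helpers for "arp -an" output read on stdin. Assumed line format:
#   ? (10.20.20.254) at (incomplete) on epair0b expired [ethernet]

# Print the IP address of every entry still marked (incomplete).
incomplete_entries() {
    awk '$4 == "(incomplete)" { ip = $2; gsub(/[()]/, "", ip); print ip }'
}

# pin_command GATEWAY MAC: if GATEWAY is incomplete, print the static-entry
# workaround used in this report; print nothing otherwise.
pin_command() {
    gw=$1
    mac=$2
    if incomplete_entries | grep -qxF "$gw"; then
        echo "arp -s $gw $mac"
    fi
}
```

For example, `jexec jail-10-20-20-6 arp -an | pin_command 10.20.20.254 02:11:22:33:44:55` would print the pinning command only while the gateway entry is unresolved.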
To help observe this, I did the following:

1) From host25, I watched epair0a, cc0, and cc1 using
   host25# tcpdump -e -vvv -XX -i [interface]

2) Inside jail-10-20-20-6, I attempted to ping the gateway to generate the ARP traffic:
   ping -c 1 -t 1 -q 10.20.20.254
   PING 10.20.20.254 (10.20.20.254): 56 data bytes

   --- 10.20.20.254 ping statistics ---
   1 packets transmitted, 0 packets received, 100.0% packet loss

3) Results follow:

# tcpdump -e -vvv -XX -i epair0a
tcpdump: listening on epair0a, link-type EN10MB (Ethernet), capture size 262144 bytes
01:43:54.768801 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
        0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
        0x0020:  0000 0000 0000 0a14 14fe                 ..........
01:43:54.768936 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 56: Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 42
        0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
        0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
        0x0020:  0000 0000 0000 0a14 14fe 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000                      ........
01:43:54.768969 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 46
        0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
        0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
        0x0020:  0000 0000 0000 0a14 14fe 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000            ............
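To read the hex columns of these captures without counting bytes by hand, here is a minimal POSIX sh sketch. It decodes an ARP frame from the `tcpdump -XX` hex words once the offset and ASCII columns are removed. It assumes an Ethernet II header with at most one 802.1Q tag. `decode_arp`, `hexbytes`, and `ipv4` are hypothetical helper names.

For the tagged copies captured on cc0 and cc1, the tag control word `07e4` decodes as priority 0 and VLAN ID 0x7e4 = 2020. This matches tcpdump's `vlan 2020, p 0`.

```sh
#!/bin/sh
# Decode an ARP frame given as the hex words of "tcpdump -XX" output
# (offset and ASCII columns removed). Assumes Ethernet II with at most
# one 802.1Q tag. All helper names here are hypothetical.

# hexbytes HEX START LEN: print LEN bytes of HEX starting at byte offset START.
hexbytes() {
    echo "$1" | cut -c "$(( $2 * 2 + 1 ))-$(( ($2 + $3) * 2 ))"
}

# ipv4 HEX8: convert 4 bytes of hex into dotted-quad notation.
ipv4() {
    a=$(hexbytes "$1" 0 1); b=$(hexbytes "$1" 1 1)
    c=$(hexbytes "$1" 2 1); d=$(hexbytes "$1" 3 1)
    echo "$(( 0x$a )).$(( 0x$b )).$(( 0x$c )).$(( 0x$d ))"
}

decode_arp() {
    hex=$(echo "$1" | tr -d ' \n')
    off=12                                   # type field follows dst+src MACs
    etype=$(hexbytes "$hex" "$off" 2)
    vlan=none
    if [ "$etype" = "8100" ]; then           # 802.1Q TPID: a tag precedes the real type
        tci=$(hexbytes "$hex" 14 2)
        vlan=$(( 0x$tci & 0x0fff ))          # VLAN ID = low 12 bits of the TCI
        off=16
        etype=$(hexbytes "$hex" "$off" 2)
    fi
    if [ "$etype" != "0806" ]; then
        echo "not ARP (ethertype 0x$etype)"
        return 1
    fi
    arp=$(( off + 2 ))                       # ARP header starts after the type field
    op=$(hexbytes "$hex" $(( arp + 6 )) 2)   # after htype(2) ptype(2) hlen(1) plen(1)
    sha=$(hexbytes "$hex" $(( arp + 8 )) 6)  # sender hardware address
    spa=$(ipv4 "$(hexbytes "$hex" $(( arp + 14 )) 4)")  # sender protocol address
    tpa=$(ipv4 "$(hexbytes "$hex" $(( arp + 24 )) 4)")  # target protocol address
    case $op in
        0001) kind=request ;;
        0002) kind=reply ;;
        *)    kind="op=$op" ;;
    esac
    echo "vlan=$vlan $kind sender=$spa ($sha) target=$tpa"
}
```

Feeding it the first epair0a frame above, `decode_arp "ffff ffff ffff 0207 f080 de0b 0806 0001 0800 0604 0001 0207 f080 de0b 0a14 1406 0000 0000 0000 0a14 14fe"` prints `vlan=none request sender=10.20.20.6 (0207f080de0b) target=10.20.20.254`.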
# tcpdump -e -vvv -XX -i cc0
tcpdump: listening on cc0, link-type EN10MB (Ethernet), capture size 262144 bytes
01:43:54.768822 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe       ..............
01:43:54.769126 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
01:43:54.769171 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
01:43:54.769221 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................

# tcpdump -e -vvv -XX -i cc1
tcpdump: listening on cc1, link-type EN10MB (Ethernet), capture size 262144 bytes
01:43:54.768876 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 60: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 42
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000            ............
01:43:54.768965 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 46
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................

Apparently one ARP request is sent over cc0 and two over cc1, while all three replies come back over cc0. None of the replies appear to enter epair0a. I haven't had any luck changing the lagg hash to force requests down one of the two lagg members, so instead I downed one of the interfaces in the lagg.

(bridge2020 is still up with epair0a and lagg0.2020; lagg0 contains cc0 and cc1, both up)
jail-10-20-20-6# ping 10.20.20.254
ping: sendto: Host is down
host25# ifconfig cc1 down
(make sure the ARP cache in the jail is empty)
jail-10-20-20-6# arp -da
jail-10-20-20-6# ping 10.20.20.254
success!
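The per-interface counts above can be tallied from saved text captures rather than by eye. Those counts are three requests and no replies on epair0a, one request and three replies on cc0, and two requests on cc1. A minimal sketch follows. It assumes each capture was saved as plain `tcpdump -e` text output, one file per interface. File names such as `cc0.txt` are hypothetical, and so are the function names `arp_tally` and `tally_files`. Only the per-packet summary lines are matched, so the hex-dump lines do not affect the counts.

```sh
#!/bin/sh
# Hypothetical helpers: count ARP requests and replies in saved
# "tcpdump -e" text output. Hex-dump lines never contain these phrases.

# Read one capture on stdin and print its request/reply counts.
arp_tally() {
    awk '
        / Request who-has / { req++ }
        / Reply .* is-at /  { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# tally_files FILE...: print one "FILE: requests=N replies=M" line per capture.
tally_files() {
    for f in "$@"; do
        printf '%s: ' "$f"
        arp_tally < "$f"
    done
}
```

For the three captures quoted above, `tally_files epair0a.txt cc0.txt cc1.txt` would report 3/0, 1/3 and 2/0 requests/replies respectively. That makes it easy to compare runs with different lagg hashes or with one member down.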
(Using tcpdump, epair0a now sees the ARP replies as well. I excluded the cc0 capture here because it's largely identical.)

# tcpdump -e -vvv -XX -i epair0a
15:23:10.623560 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  0001 0800 0604 0001 0207 f080 de0b 0a14  ................
        0x0010:  1406 0000 0000 0000 0a14 14fe            ............
15:23:10.623916 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:23:10.623924 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:23:10.623926 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:23:10.623943 02:07:f0:80:de:0b (oui Unknown) > 02:11:22:33:44:55 (oui Unknown), ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 56841, offset 0, flags [none], proto ICMP (1), length 84)
    10.20.20.6 > 10.20.20.254: ICMP echo request, id 22927, seq 0, length 64
        0x0000:  4500 0054 de09 0000 4001 5f74 0a14 1406  E..T....@._t....
        0x0010:  0a14 14fe 0800 8750 598f 0000 0006 2ec0  .......PY.......
        0x0020:  15c1 e795 0809 0a0b 0c0d 0e0f 1011 1213  ................
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637                                4567
15:23:10.624147 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 54016, offset 0, flags [none], proto ICMP (1), length 84)
    10.20.20.254 > 10.20.20.6: ICMP echo reply, id 22927, seq 0, length 64
        0x0000:  4500 0054 d300 0000 4001 6a7d 0a14 14fe  E..T....@.j}....
        0x0010:  0a14 1406 0000 8f50 598f 0000 0006 2ec0  .......PY.......
        0x0020:  15c1 e795 0809 0a0b 0c0d 0e0f 1011 1213  ................
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637                                4567

(the ARP cache seems valid as well)
jail-10-20-20-6# arp -na
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at 02:11:22:33:44:55 on epair0b expires in 1085 seconds [ethernet]

Additional thoughts:

1) With lagg0, cc0, and cc1 up, I created a second jail on host25 using 10.20.20.7 (epair1) and added epair1a to bridge2020, which now includes epair0a, epair1a, and lagg0.2020. When I ping from jail-10-20-20-6 to .254, I get a timeout as before. Pinging from .6 to .7 works without any trouble, regardless of whether lagg0's cc0/cc1 members are up or down. This was expected, as those packets should never traverse lagg0.2020, but I did want to test and confirm it.
2) I ran some ping tests with an untagged lagg0 in the bridge, and it does appear to work without trouble. I removed lagg0.2020 from bridge2020, added lagg0 to bridge2020, and set the switch ports as untagged. The packets appear to move without trouble even with both cc0 and cc1 up. I need to test this further to be conclusive. It felt less important at this time, though, because it doesn't meet my requirement for tagged ports.

3) I have added a few bhyve VMs to bridge2020 as tests (tap0, tap1, etc.). The results are largely consistent with the jails. For instance, you could replace jail-10-20-20-6 with vm-10-20-20-11 (tested with FreeBSD / OpenBSD / Windows guests), and the same results appear. Packets fail when they originate from tap/vnet and traverse lagg0.2020.

(again, lagg0/lacp is up and includes cc0+cc1; bridge2020 includes the lagg0.2020, tap0, and epair0a devices)
host25# ping 10.20.20.254
success!
vm-10-20-20-11# arp -da
(attempt to traverse lagg0.2020)
vm-10-20-20-11# ping 10.20.20.254
ping: sendto: Host is down
(try tap0 -> epair0)
vm-10-20-20-11# ping 10.20.20.6
success!

(try the tests again with lagg0 member cc1 down)
host25# ifconfig cc1 down
(tap0 -> lagg0.2020 -> 10.20.20.254)
vm-10-20-20-11# ping 10.20.20.254
success!
(again tap0 -> epair0, works as expected)
vm-10-20-20-11# ping 10.20.20.6
success!

(turn cc1 back up and wait about 10 seconds for both laggports to be distributing)
host25# ifconfig cc1 up
vm-10-20-20-11# arp -da
vm-10-20-20-11# ping 10.20.20.254
ping: sendto: Host is down
(again, only the lagg is preventing ARP; tap <-> epair in the bridge still works fine)
vm-10-20-20-11# ping 10.20.20.6
success!
jail-10-20-20-6# ping 10.20.20.11
success!

Conclusion: when bridging a vnet/tap interface with a lagg.vlan interface (a vlan interface with a lagg [laggproto lacp] parent), ARP replies do not enter the vnet/tap interface on the bridge when *both* lagg members are up. After downing one of the two interfaces in the lagg group, ARP replies enter the vnet/tap interface as expected.

Final notes: these attempts aren't shown in this post, but I have tried each of the following:
- removing all the hardware offloading features from lagg0, lagg0.2020, cc0, and cc1;
- toggling the lagg0 lagghash;
- toggling the net.link.lagg.* and net.link.bridge.* sysctls;
- upgrading to 13-STABLE.
None of these let data move over the lagg until I downed one of the two lagg0 interfaces.

For brevity, I used the command 'ping host-ip' in the examples above and only displayed a simple success/fail response. In testing, I mostly ran pings for reasonably long periods (ex: -c 10 -t 2) to confirm the above examples.

I'd be happy to help test further if anyone has any suggestions.

Thank you!
-kvs

-- 
You are receiving this mail because:
You are the assignee for the bug.