Date: Wed, 22 Mar 2023 03:23:58 +0000
From: bugzilla-noreply@freebsd.org
To: jail@FreeBSD.org
Subject: [Bug 240106] VNET issue with ARP and routing sockets in jails
Message-ID: <bug-240106-29815-7vn8D39c3x@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-240106-29815@https.bugs.freebsd.org/bugzilla/>
References: <bug-240106-29815@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240106

--- Comment #28 from kvs <overwatch@lab.kyngin.net> ---
(In reply to Slawomir Wojciech Wojtczak from comment #27)

I have made some headway on my end, though I don't know how much it's related to
the earlier bugs at this point. After further testing, vlans apparently aren't
related to my problem. The problem occurs on lagg without vlan interfaces.

When a jail+VNET (on bridge) sends an ARP request, it traverses the bridge and
exits both interfaces in the host lagg group. When the ARP reply comes back,
it appears it will only ever enter the host bridge if it comes in on the
primary lagg member. I'm not certain this is exclusive to vnets; it is also
possible that this is normal operation for laggs using lacp?

Lab test:
lagg0 (ports cc0 + cc1), bridge2020 (members epair0a & lagg0)
ping from jail+VNET to switch (10.20.20.254), using source epair0b (10.20.20.77)
(epair0b -> epair0a -> bridge2020 -> lagg0 -> cc0/cc1 -> switch)

tcpdump -i epair0a
10:00:17.981011 ARP, Request who-has 10.20.20.254 tell 10.20.20.77, length 28

tcpdump -i bridge2020
10:00:17.981051 ARP, Request who-has 10.20.20.254 tell 10.20.20.77, length 28

tcpdump -i lagg0
10:00:17.981030 ARP, Request who-has 10.20.20.254 tell 10.20.20.77, length 28
10:00:17.981282 ARP, Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46

tcpdump -i cc0
10:00:17.981050 ARP, Request who-has 10.20.20.254 tell 10.20.20.77, length 42

tcpdump -i cc1
10:00:17.981041 ARP, Request who-has 10.20.20.254 tell 10.20.20.77, length 28
10:00:17.981282 ARP, Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46

The ARP table is not populated on the VM, as bridge2020 and epair0a/b never see
the ARP reply that comes in over cc1. I believe in my case specifically the
switch is seeing cc1 as the primary lagg member while the FreeBSD server sees
cc0 as the primary lagg member.
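
The comparison across the five captures above can also be tabulated mechanically. The following is only a minimal sketch, assuming the tcpdump text for each interface has been collected into a dict; the names `CAPTURES`, `kinds_seen` and `interfaces_missing_reply` are hypothetical and not part of the original report.

```python
import re

# tcpdump lines from the lab test, keyed by capture interface.
# The dict layout is an assumption made for this illustration.
REQ = "ARP, Request who-has 10.20.20.254 tell 10.20.20.77"
REP = "ARP, Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown)"
CAPTURES = {
    "epair0a": [f"10:00:17.981011 {REQ}, length 28"],
    "bridge2020": [f"10:00:17.981051 {REQ}, length 28"],
    "lagg0": [
        f"10:00:17.981030 {REQ}, length 28",
        f"10:00:17.981282 {REP}, length 46",
    ],
    "cc0": [f"10:00:17.981050 {REQ}, length 42"],
    "cc1": [
        f"10:00:17.981041 {REQ}, length 28",
        f"10:00:17.981282 {REP}, length 46",
    ],
}

# Matches the ARP message kind in tcpdump's default text output.
ARP_KIND = re.compile(r"\bARP, (Request|Reply)\b")


def kinds_seen(lines):
    """Return the set of ARP kinds ('Request', 'Reply') found in the lines."""
    found = set()
    for line in lines:
        match = ARP_KIND.search(line)
        if match:
            found.add(match.group(1))
    return found


def interfaces_missing_reply(captures):
    """List interfaces that saw an ARP request but no ARP reply."""
    missing = []
    for ifname, lines in captures.items():
        kinds = kinds_seen(lines)
        if "Request" in kinds and "Reply" not in kinds:
            missing.append(ifname)
    return missing


if __name__ == "__main__":
    for ifname, lines in CAPTURES.items():
        print(f"{ifname:12s} {sorted(kinds_seen(lines))}")
    print("request without reply:", interfaces_missing_reply(CAPTURES))
```

With the data above, the reply shows up only on lagg0 and cc1, which matches the observation that it never reaches bridge2020 or epair0a.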

When ARP replies manage to come in over cc0, they make it to the vnet
interface and the jail populates its ARP table. I can force this event by
downing cc1 or shutting down the cc1 switch port (in both cases it appears the
switch then identifies cc0 as the primary lagg member over which it sends ARP
replies). Alternatively, if both cc0 and cc1 are up, and the switch sends an
ARP reply over cc0 (this has happened randomly), the ARP reply does make it
through the bridge/epair and populates the ARP cache on the VM.

Example after ifconfig cc1 down:

tcpdump -i epair0a
10:48:18.949695 ARP, Request who-has 10.20.20.254 tell 10.20.20.77, length 28
10:48:18.950041 ARP, Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46

tcpdump -i bridge2020
10:48:18.949731 ARP, Request who-has 10.20.20.254 tell 10.20.20.77, length 28
10:48:18.950041 ARP, Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46

tcpdump -i lagg0
10:48:18.949711 ARP, Request who-has 10.20.20.254 tell 10.20.20.77, length 28
10:48:18.950041 ARP, Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46

tcpdump -i cc0
10:48:18.949722 ARP, Request who-has 10.20.20.254 tell 10.20.20.77, length 28
10:48:18.950041 ARP, Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46

The ARP table on the VM is now populated with the switch address, and
everything appears to work as normal over lagg0 (with cc0 up / cc1 down).

In the meantime I've managed to get the switch configured to send L2/ARP over
both lagg members, which has fixed the immediate problem. I do think it's
strange, though, that FreeBSD populates the ARP table just fine on the host
over cc1, but just won't send that ARP reply over the bridge interface unless
it comes in on cc0.
That *feels* like a bug, as it only seems to affect the second
interface on a lagg that's in a bridge, and quite possibly only for layer 2
(L2/3 needs further testing - I've not lost packets once the ARP table is
populated, but it's possible the switch was handling layer 3 differently and
always using the cc0 port, in which case FreeBSD would probably send over the
bridge without trouble).

Testing has been performed on 14-CURRENT and 13-STABLE with identical results.

-- 
You are receiving this mail because:
You are the assignee for the bug.