Date: Wed, 14 May 2025 22:45:27 +0300
From: Ivan <email@nigge.ru>
To: "freebsd-net@freebsd.org" <freebsd-net@FreeBSD.org>
Subject: TCP sends 9KB segments via netgraph tunnel despite MTU/MSS - TSO-related?
Message-ID: <8E9DD050-7A06-474E-BEAA-3600C4B0E587@nigge.ru>
Hello,

I've been investigating a network issue that took quite some time to trace. I still cannot reproduce it in a test environment, but it consistently occurs on a specific FreeBSD server with a more complex network configuration.

Summary of the issue:
Under certain conditions, the system attempts to send TCP packets larger than 9 KB through a netgraph-based tunnel with MTU 1472, even though MSS was negotiated to 1400.

This happens when the initial route is via the default uplink, but PF then re-routes the packet via the netgraph tunnel using `route-to`. If the traffic is routed through ng0 directly (without PF), the issue does not occur. The problem also disappears if TSO is disabled on the uplink NIC.

System:
FreeBSD 13.5-RELEASE releng/13.5-n259162-882b9f3f2218 GENERIC amd64

Interfaces:
- Primary LAN interface (where disabling TSO fixes the problem): igb0, MTU 1500
  options=4e520bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
- Internet uplink: onp, VLAN over igb0, MTU 1500
  options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
- Netgraph tunnel: ng0, MTU 1472
  inet 10.10.0.1 → 10.10.0.2

PF rules used for re-routing:
nat log(all) on onp inet from 10.10.0.1 to any tag NG -> (ng0) round-robin
pass out quick on onp route-to (ng0 10.10.0.2) inet all flags S/SA keep state tagged NG

Packet trace (via pflog during a POST request ~10KB to YouTube):
15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 597:9703, length 9106
15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472)

This shows the kernel trying to send a 9106-byte segment over a link that clearly can't handle it. The MSS was already negotiated at 1400, so this seems unexpected. The ICMP response is generated locally. The result is segment loss, out-of-order retransmissions, and poor TLS performance.

I also reproduced this behavior with OpenVPN, so the issue is not netgraph-specific.

Questions:
- Is this expected behavior due to TSO interacting poorly with PF route-to?
- Should TSO respect the effective MTU based on the post-PF routing decision?
- Or is this a bug in the TCP offload path?

Thanks in advance for any insights.

Attachment: pflog.txt

15:46:01.630576 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [S], seq 2266960354, win 32768, options [mss 1460,nop,wscale 7,sackOK,TS val 1619341301 ecr 0], length 0
15:46:01.706085 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [S.], seq 3704095926, ack 2266960355, win 65535, options [mss 1400,sackOK,TS val 3246736165 ecr 1619341301,nop,wscale 8], length 0
15:46:01.706120 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 3704095927, win 263, options [nop,nop,TS val 1619341376 ecr 3246736165], length 0
15:46:01.706736 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 0:517, ack 1, win 263, options [nop,nop,TS val 1619341377 ecr 3246736165], length 517
15:46:01.781807 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 518, win 1048, options [nop,nop,TS val 3246736241 ecr 1619341377], length 0
15:46:01.782272 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], seq 1:1389, ack 518, win 1050, options [nop,nop,TS val 3246736242 ecr 1619341377], length 1388
15:46:01.782902 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 1389:2777, ack 518, win 1050, options [nop,nop,TS val 3246736242 ecr 1619341377], length 1388
15:46:01.782913 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 2777, win 252, options [nop,nop,TS val 1619341453 ecr 3246736242], length 0
15:46:01.782918 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], seq 2777:4165, ack 518, win 1050, options [nop,nop,TS val 3246736242 ecr 1619341377], length 1388
15:46:01.783121 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 4165:5330, ack 518, win 1050, options [nop,nop,TS val 3246736242 ecr 1619341377], length 1165
15:46:01.783132 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 5330, win 254, options [nop,nop,TS val 1619341453 ecr 3246736242], length 0
15:46:01.784246 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 517:597, ack 5330, win 263, options [nop,nop,TS val 1619341454 ecr 3246736242], length 80
15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 597:9703, ack 5330, win 263, options [nop,nop,TS val 1619341455 ecr 3246736242], length 9106
15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:01.859245 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 5330:5978, ack 598, win 1050, options [nop,nop,TS val 3246736319 ecr 1619341454], length 648
15:46:01.859435 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 9703:9734, ack 5978, win 263, options [nop,nop,TS val 1619341529 ecr 3246736319], length 31
15:46:01.934863 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 598, win 1050, options [nop,nop,TS val 3246736394 ecr 1619341454,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.146317 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 597:1985, ack 5978, win 263, options [nop,nop,TS val 1619341816 ecr 3246736394], length 1388
15:46:02.221109 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 1986, win 1045, options [nop,nop,TS val 3246736681 ecr 1619341816,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.221119 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 1985:4761, ack 5978, win 263, options [nop,nop,TS val 1619341891 ecr 3246736681], length 2776
15:46:02.221187 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:02.503383 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 1985:3373, ack 5978, win 263, options [nop,nop,TS val 1619342173 ecr 3246736681], length 1388
15:46:02.578316 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 3374, win 1040, options [nop,nop,TS val 3246737038 ecr 1619342173,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.578345 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 3373:6149, ack 5978, win 263, options [nop,nop,TS val 1619342248 ecr 3246737038], length 2776
15:46:02.578394 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:02.856709 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 3373:4761, ack 5978, win 263, options [nop,nop,TS val 1619342527 ecr 3246737038], length 1388
15:46:02.931490 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 4762, win 1035, options [nop,nop,TS val 3246737391 ecr 1619342527,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.931503 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 4761:7537, ack 5978, win 263, options [nop,nop,TS val 1619342602 ecr 3246737391], length 2776
15:46:02.931525 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:03.212023 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 4761:6149, ack 5978, win 263, options [nop,nop,TS val 1619342882 ecr 3246737391], length 1388
15:46:03.287577 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 6150, win 1030, options [nop,nop,TS val 3246737747 ecr 1619342882,nop,nop,sack 1 {9704:9735}], length 0
15:46:03.287589 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 6149:8925, ack 5978, win 263, options [nop,nop,TS val 1619342958 ecr 3246737747], length 2776
15:46:03.287613 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:03.567171 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 6149:7537, ack 5978, win 263, options [nop,nop,TS val 1619343237 ecr 3246737747], length 1388
15:46:03.642204 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 7538, win 1025, options [nop,nop,TS val 3246738102 ecr 1619343237,nop,nop,sack 1 {9704:9735}], length 0
15:46:03.642216 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 7537:9734, ack 5978, win 263, options [nop,nop,TS val 1619343312 ecr 3246738102], length 2197
15:46:03.642238 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:03.923035 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 7537:8925, ack 5978, win 263, options [nop,nop,TS val 1619343593 ecr 3246738102], length 1388
15:46:03.998014 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 8926, win 1020, options [nop,nop,TS val 3246738457 ecr 1619343593,nop,nop,sack 1 {9704:9735}], length 0
15:46:03.998030 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 8925:9734, ack 5978, win 263, options [nop,nop,TS val 1619343668 ecr 3246738457], length 809
15:46:04.073766 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 9735, win 1017, options [nop,nop,TS val 3246738533 ecr 1619343668,nop,nop,sack 1 {9704:9735}], length 0
15:46:04.074310 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 5978:6009, ack 9735, win 1017, options [nop,nop,TS val 3246738534 ecr 1619343668], length 31
15:46:04.113500 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 6009, win 263, options [nop,nop,TS val 1619343784 ecr 3246738534], length 0
15:46:04.170212 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 6009:6259, ack 9735, win 1017, options [nop,nop,TS val 3246738630 ecr 1619343668], length 250
15:46:04.170238 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 6259:7647, ack 9735, win 1017, options [nop,nop,TS val 3246738630 ecr 1619343668], length 1388
15:46:04.170253 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 7647, win 251, options [nop,nop,TS val 1619343840 ecr 3246738630], length 0
15:46:04.170256 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 7647:9035, ack 9735, win 1017, options [nop,nop,TS val 3246738630 ecr 1619343668], length 1388
15:46:04.170461 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 9734:9769, ack 9035, win 263, options [nop,nop,TS val 1619343840 ecr 3246738630], length 35
15:46:04.170837 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], seq 9035:10423, ack 9735, win 1017, options [nop,nop,TS val 3246738630 ecr 1619343668], length 1388
15:46:04.170872 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 10423:10988, ack 9735, win 1017, options [nop,nop,TS val 3246738630 ecr 1619343668], length 565
15:46:04.170876 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 10988, win 248, options [nop,nop,TS val 1619343841 ecr 3246738630], length 0
15:46:04.171286 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 9769:9817, ack 10988, win 263, options [nop,nop,TS val 1619343841 ecr 3246738630], length 48
15:46:04.171317 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 9817:9841, ack 10988, win 263, options [nop,nop,TS val 1619343841 ecr 3246738630], length 24
15:46:04.171436 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [F.], seq 9841, ack 10988, win 263, options [nop,nop,TS val 1619343841 ecr 3246738630], length 0
15:46:04.172737 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 10988:11027, ack 9735, win 1017, options [nop,nop,TS val 3246738632 ecr 1619343668], length 39
15:46:04.172748 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [R], seq 2266970089, win 0, length 0
15:46:04.246038 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 9818, win 1017, options [nop,nop,TS val 3246738706 ecr 1619343840], length 0
15:46:04.246104 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [R], seq 2266970172, win 0, length 0
15:46:04.246726 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [F.], seq 11027, ack 9842, win 1017, options [nop,nop,TS val 3246738706 ecr 1619343841], length 0
15:46:04.246731 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [R], seq 2266970196, win 0, length 0
15:46:04.246735 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 9843, win 1017, options [nop,nop,TS val 3246738706 ecr 1619343841], length 0
15:46:04.246736 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [R], seq 2266970197, win 0, length 0
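For anyone who wants to check a longer pflog capture for the same symptom, here is a minimal Python sketch that scans tcpdump-style text output and lists TCP segments whose payload length exceeds the negotiated MSS. It is not part of the original report: the names `LINE_RE`, `oversized_segments` and `SAMPLE` are hypothetical, the regular expression only assumes the line layout seen in the trace above, and the MSS value (1400, taken from the SYN-ACK) has to be supplied by hand.

```python
import re

# Assumed layout: "<time> IP <src> > <dst>: Flags [..], ..., length <n>",
# as in the pflog output above. ICMP lines carry no "Flags [...]" field
# and are therefore skipped.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\].*?length (?P<length>\d+)$"
)


def oversized_segments(lines, mss):
    """Return (timestamp, src, dst, length) for TCP segments longer than mss."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a TCP line in the expected format
        length = int(m.group("length"))
        if length > mss:
            hits.append((m.group("ts"), m.group("src"), m.group("dst"), length))
    return hits


# A few lines copied from the attached trace.
SAMPLE = """\
15:46:01.784246 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 517:597, ack 5330, win 263, options [nop,nop,TS val 1619341454 ecr 3246736242], length 80
15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 597:9703, ack 5330, win 263, options [nop,nop,TS val 1619341455 ecr 3246736242], length 9106
15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:02.146317 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 597:1985, ack 5978, win 263, options [nop,nop,TS val 1619341816 ecr 3246736394], length 1388
15:46:02.221119 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 1985:4761, ack 5978, win 263, options [nop,nop,TS val 1619341891 ecr 3246736681], length 2776
"""

if __name__ == "__main__":
    for ts, src, dst, length in oversized_segments(SAMPLE.splitlines(), mss=1400):
        print(f"{ts} {src} > {dst}: length {length} exceeds MSS 1400")
```

On the sample lines this flags the 9106-byte and 2776-byte segments and skips the 1388-byte one. In this trace 1388 is the MSS of 1400 minus the 12 bytes of TCP timestamp option, and 2776 is exactly two such segments, which is consistent with segmentation being deferred to a later stage rather than done by TCP itself.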