Date:      Wed, 14 May 2025 22:45:27 +0300
From:      Ivan <email@nigge.ru>
To:        "freebsd-net@freebsd.org" <freebsd-net@FreeBSD.org>
Subject:   TCP sends 9KB segments via netgraph tunnel despite MTU/MSS - TSO-related?
Message-ID:  <8E9DD050-7A06-474E-BEAA-3600C4B0E587@nigge.ru>


[-- Attachment #1 --]
Hello,

I've been investigating a network issue that took quite some time to trace. I still cannot reproduce it in a test environment, but it consistently occurs on a specific FreeBSD server with a more complex network configuration.

Summary of the issue:  
Under certain conditions, the system attempts to send TCP packets larger than 9 KB through a netgraph-based tunnel with MTU 1472, even though MSS was negotiated to 1400.
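
For reference, the size arithmetic behind the numbers above can be sketched as follows. This is only an illustration, not kernel code: the helper names are hypothetical, and the 12-byte option overhead assumes that data segments carry only the TCP timestamps option padded with two NOPs, as seen in the attached trace.

```python
# Header sizes in bytes. TS_OPTION is an assumption taken from the trace:
# timestamps (10 bytes) plus two NOPs of padding on every data segment.
IPV4_HEADER = 20
TCP_HEADER = 20
TS_OPTION = 12


def max_payload(mss: int, options: int = TS_OPTION) -> int:
    """Largest TCP payload per segment once options are taken out of the MSS."""
    return mss - options


def ip_packet_size(payload: int, options: int = TS_OPTION) -> int:
    """Total IPv4 packet size for a segment carrying `payload` bytes."""
    return IPV4_HEADER + TCP_HEADER + options + payload


def fits(payload: int, mtu: int) -> bool:
    """True if a segment with this payload fits in one packet of size `mtu`."""
    return ip_packet_size(payload) <= mtu


if __name__ == "__main__":
    mss, mtu = 1400, 1472
    full = max_payload(mss)
    print(full, ip_packet_size(full), fits(full, mtu))   # 1388 1440 True
    print(ip_packet_size(9106), fits(9106, mtu))         # 9158 False
```

With an MSS of 1400, a full-sized segment carries 1388 bytes of payload and produces a 1440-byte IP packet, which fits the 1472-byte tunnel MTU. The 9106-byte segment from the trace would need a 9158-byte packet.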

This happens when the initial route is via the default uplink, but PF then re-routes the packet via the netgraph tunnel using `route-to`. If the traffic is routed through ng0 directly (without PF), the issue does not occur. The problem also disappears if TSO is disabled on the uplink NIC.

System:
  FreeBSD 13.5-RELEASE
  releng/13.5-n259162-882b9f3f2218 GENERIC amd64

Interfaces:

- Primary LAN interface (where disabling TSO fixes the problem):
    igb0, MTU 1500  
    options=4e520bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,
                    VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,
                    RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>

- Internet uplink:
    onp, VLAN over igb0, MTU 1500  
    options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>

- Netgraph tunnel:
    ng0, MTU 1472  
    inet 10.10.0.1 → 10.10.0.2

PF rules used for re-routing:
    nat log(all) on onp inet from 10.10.0.1 to any tag NG -> (ng0) round-robin
    pass out quick on onp route-to (ng0 10.10.0.2) inet all flags S/SA keep state tagged NG

Packet trace (captured via pflog during a ~10 KB POST request to YouTube):

    15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 597:9703, length 9106
    15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472)

This shows the kernel trying to send a 9106-byte segment over a link that clearly can't handle it. The MSS was already negotiated at 1400, so this seems unexpected. The ICMP response is generated locally. The result is segment loss, out-of-order retransmissions, and poor TLS performance.
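
A rough way to check a capture for this pattern is sketched below. It assumes tcpdump text output in the same format as the excerpt above, and a per-segment payload of 1388 bytes (MSS 1400 minus 12 bytes of timestamp options); the function names are hypothetical. It extracts payload lengths and shows how an oversized send would break down into MSS-sized pieces, which is the split a TSO-capable NIC would normally perform in hardware. Whether TSO is actually the cause here is still the open question.

```python
import re

# Matches tcpdump TCP lines that carry a sequence range and captures the
# range and the trailing payload length.
LINE_RE = re.compile(r"seq (\d+):(\d+).*length (\d+)$")


def payload_length(line):
    """Return the TCP payload length of a data segment line, or None."""
    m = LINE_RE.search(line.strip())
    return int(m.group(3)) if m else None


def split_payload(length, per_segment):
    """Break one oversized send into full segments plus a remainder."""
    full, rest = divmod(length, per_segment)
    return [per_segment] * full + ([rest] if rest else [])


def oversized(lines, per_segment):
    """Yield (length, pieces) for every data line larger than one segment."""
    for line in lines:
        n = payload_length(line)
        if n is not None and n > per_segment:
            yield n, split_payload(n, per_segment)


if __name__ == "__main__":
    sample = [
        "15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: "
        "Flags [P.], seq 597:9703, length 9106",
        "15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 "
        "unreachable - need to frag (mtu 1472)",
    ]
    for n, pieces in oversized(sample, 1388):
        print(n, pieces)
```

For the 9106-byte send this gives six pieces of 1388 bytes plus one of 778. The later oversized sends in the attached trace (2776 and 2197 bytes) break down the same way, into 2 x 1388 and 1388 + 809.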

I also reproduced this behavior with OpenVPN, so the issue is not netgraph-specific.

Questions:
- Is this expected behavior due to TSO interacting poorly with PF route-to?
- Should TSO respect the effective MTU based on the post-PF routing decision?
- Or is this a bug in the TCP offload path?

Thanks in advance for any insights.


[-- Attachment #2 --]
15:46:01.630576 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[S], seq 2266960354, win 32768, options \[mss 1460,nop,wscale 7,sackOK,TS val 1619341301 ecr 0], length 0
15:46:01.706085 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[S.], seq 3704095926, ack 2266960355, win 65535, options \[mss 1400,sackOK,TS val 3246736165 ecr 1619341301,nop,wscale 8], length 0
15:46:01.706120 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], ack 3704095927, win 263, options \[nop,nop,TS val 1619341376 ecr 3246736165], length 0
15:46:01.706736 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[P.], seq 0:517, ack 1, win 263, options \[nop,nop,TS val 1619341377 ecr 3246736165], length 517
15:46:01.781807 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 518, win 1048, options \[nop,nop,TS val 3246736241 ecr 1619341377], length 0
15:46:01.782272 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], seq 1:1389, ack 518, win 1050, options \[nop,nop,TS val 3246736242 ecr 1619341377], length 1388
15:46:01.782902 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[P.], seq 1389:2777, ack 518, win 1050, options \[nop,nop,TS val 3246736242 ecr 1619341377], length 1388
15:46:01.782913 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], ack 2777, win 252, options \[nop,nop,TS val 1619341453 ecr 3246736242], length 0
15:46:01.782918 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], seq 2777:4165, ack 518, win 1050, options \[nop,nop,TS val 3246736242 ecr 1619341377], length 1388
15:46:01.783121 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[P.], seq 4165:5330, ack 518, win 1050, options \[nop,nop,TS val 3246736242 ecr 1619341377], length 1165
15:46:01.783132 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], ack 5330, win 254, options \[nop,nop,TS val 1619341453 ecr 3246736242], length 0
15:46:01.784246 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[P.], seq 517:597, ack 5330, win 263, options \[nop,nop,TS val 1619341454 ecr 3246736242], length 80
15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[P.], seq 597:9703, ack 5330, win 263, options \[nop,nop,TS val 1619341455 ecr 3246736242], length 9106
15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:01.859245 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[P.], seq 5330:5978, ack 598, win 1050, options \[nop,nop,TS val 3246736319 ecr 1619341454], length 648
15:46:01.859435 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[P.], seq 9703:9734, ack 5978, win 263, options \[nop,nop,TS val 1619341529 ecr 3246736319], length 31
15:46:01.934863 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 598, win 1050, options \[nop,nop,TS val 3246736394 ecr 1619341454,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.146317 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], seq 597:1985, ack 5978, win 263, options \[nop,nop,TS val 1619341816 ecr 3246736394], length 1388
15:46:02.221109 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 1986, win 1045, options \[nop,nop,TS val 3246736681 ecr 1619341816,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.221119 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], seq 1985:4761, ack 5978, win 263, options \[nop,nop,TS val 1619341891 ecr 3246736681], length 2776
15:46:02.221187 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:02.503383 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], seq 1985:3373, ack 5978, win 263, options \[nop,nop,TS val 1619342173 ecr 3246736681], length 1388
15:46:02.578316 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 3374, win 1040, options \[nop,nop,TS val 3246737038 ecr 1619342173,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.578345 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], seq 3373:6149, ack 5978, win 263, options \[nop,nop,TS val 1619342248 ecr 3246737038], length 2776
15:46:02.578394 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:02.856709 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], seq 3373:4761, ack 5978, win 263, options \[nop,nop,TS val 1619342527 ecr 3246737038], length 1388
15:46:02.931490 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 4762, win 1035, options \[nop,nop,TS val 3246737391 ecr 1619342527,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.931503 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], seq 4761:7537, ack 5978, win 263, options \[nop,nop,TS val 1619342602 ecr 3246737391], length 2776
15:46:02.931525 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:03.212023 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], seq 4761:6149, ack 5978, win 263, options \[nop,nop,TS val 1619342882 ecr 3246737391], length 1388
15:46:03.287577 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 6150, win 1030, options \[nop,nop,TS val 3246737747 ecr 1619342882,nop,nop,sack 1 {9704:9735}], length 0
15:46:03.287589 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], seq 6149:8925, ack 5978, win 263, options \[nop,nop,TS val 1619342958 ecr 3246737747], length 2776
15:46:03.287613 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:03.567171 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], seq 6149:7537, ack 5978, win 263, options \[nop,nop,TS val 1619343237 ecr 3246737747], length 1388
15:46:03.642204 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 7538, win 1025, options \[nop,nop,TS val 3246738102 ecr 1619343237,nop,nop,sack 1 {9704:9735}], length 0
15:46:03.642216 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[P.], seq 7537:9734, ack 5978, win 263, options \[nop,nop,TS val 1619343312 ecr 3246738102], length 2197
15:46:03.642238 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:03.923035 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], seq 7537:8925, ack 5978, win 263, options \[nop,nop,TS val 1619343593 ecr 3246738102], length 1388
15:46:03.998014 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 8926, win 1020, options \[nop,nop,TS val 3246738457 ecr 1619343593,nop,nop,sack 1 {9704:9735}], length 0
15:46:03.998030 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[P.], seq 8925:9734, ack 5978, win 263, options \[nop,nop,TS val 1619343668 ecr 3246738457], length 809
15:46:04.073766 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 9735, win 1017, options \[nop,nop,TS val 3246738533 ecr 1619343668,nop,nop,sack 1 {9704:9735}], length 0
15:46:04.074310 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[P.], seq 5978:6009, ack 9735, win 1017, options \[nop,nop,TS val 3246738534 ecr 1619343668], length 31
15:46:04.113500 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], ack 6009, win 263, options \[nop,nop,TS val 1619343784 ecr 3246738534], length 0
15:46:04.170212 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[P.], seq 6009:6259, ack 9735, win 1017, options \[nop,nop,TS val 3246738630 ecr 1619343668], length 250
15:46:04.170238 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[P.], seq 6259:7647, ack 9735, win 1017, options \[nop,nop,TS val 3246738630 ecr 1619343668], length 1388
15:46:04.170253 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], ack 7647, win 251, options \[nop,nop,TS val 1619343840 ecr 3246738630], length 0
15:46:04.170256 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[P.], seq 7647:9035, ack 9735, win 1017, options \[nop,nop,TS val 3246738630 ecr 1619343668], length 1388
15:46:04.170461 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[P.], seq 9734:9769, ack 9035, win 263, options \[nop,nop,TS val 1619343840 ecr 3246738630], length 35
15:46:04.170837 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], seq 9035:10423, ack 9735, win 1017, options \[nop,nop,TS val 3246738630 ecr 1619343668], length 1388
15:46:04.170872 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[P.], seq 10423:10988, ack 9735, win 1017, options \[nop,nop,TS val 3246738630 ecr 1619343668], length 565
15:46:04.170876 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[.], ack 10988, win 248, options \[nop,nop,TS val 1619343841 ecr 3246738630], length 0
15:46:04.171286 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[P.], seq 9769:9817, ack 10988, win 263, options \[nop,nop,TS val 1619343841 ecr 3246738630], length 48
15:46:04.171317 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[P.], seq 9817:9841, ack 10988, win 263, options \[nop,nop,TS val 1619343841 ecr 3246738630], length 24
15:46:04.171436 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[F.], seq 9841, ack 10988, win 263, options \[nop,nop,TS val 1619343841 ecr 3246738630], length 0
15:46:04.172737 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[P.], seq 10988:11027, ack 9735, win 1017, options \[nop,nop,TS val 3246738632 ecr 1619343668], length 39
15:46:04.172748 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[R], seq 2266970089, win 0, length 0
15:46:04.246038 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 9818, win 1017, options \[nop,nop,TS val 3246738706 ecr 1619343840], length 0
15:46:04.246104 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[R], seq 2266970172, win 0, length 0
15:46:04.246726 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[F.], seq 11027, ack 9842, win 1017, options \[nop,nop,TS val 3246738706 ecr 1619343841], length 0
15:46:04.246731 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[R], seq 2266970196, win 0, length 0
15:46:04.246735 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags \[.], ack 9843, win 1017, options \[nop,nop,TS val 3246738706 ecr 1619343841], length 0
15:46:04.246736 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags \[R], seq 2266970197, win 0, length 0
