Date:      Wed, 14 May 2025 22:45:27 +0300
From:      Ivan <email@nigge.ru>
To:        "freebsd-net@freebsd.org" <freebsd-net@FreeBSD.org>
Subject:   TCP sends 9KB segments via netgraph tunnel despite MTU/MSS - TSO-related?
Message-ID:  <8E9DD050-7A06-474E-BEAA-3600C4B0E587@nigge.ru>


Hello,

I've been investigating a network issue that took quite some time to
trace. I still cannot reproduce it in a test environment, but it
consistently occurs on a specific FreeBSD server with a more complex
network configuration.

Summary of the issue:
Under certain conditions, the system attempts to send TCP segments
larger than 9 KB through a netgraph-based tunnel with MTU 1472, even
though an MSS of 1400 was negotiated.
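
For reference, the sizes involved can be worked out by hand. The sketch
below assumes IPv4 and TCP headers of 20 bytes each (no IP options) and
the 12-byte padded TCP timestamp option that appears on every data
segment in the attached trace. The function and constant names are
illustrative only.

```python
import math

IPV4_HEADER = 20        # bytes, assuming no IP options
TCP_HEADER = 20         # bytes, base header without options
TIMESTAMP_OPTION = 12   # bytes: nop,nop,TS as seen in the trace


def max_payload(mtu, mss):
    """Largest TCP payload per packet allowed by both the link MTU and the peer's MSS."""
    by_mtu = mtu - IPV4_HEADER - TCP_HEADER - TIMESTAMP_OPTION
    by_mss = mss - TIMESTAMP_OPTION
    return min(by_mtu, by_mss)


def split_segment(seq_start, seq_end, payload):
    """Split a relative sequence range into chunks of at most `payload` bytes."""
    return [(s, min(s + payload, seq_end)) for s in range(seq_start, seq_end, payload)]


payload = max_payload(mtu=1472, mss=1400)
print(payload)                             # 1388
print(math.ceil(9106 / payload))           # 7
print(split_segment(597, 9703, payload))   # seven (start, end) ranges
```

The value 1388 matches the payload size of the full-sized segments in
the trace, so a 9106-byte send would have to leave the host as seven
packets to fit the tunnel.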

This happens when the initial route is via the default uplink, but PF
then re-routes the packet via the netgraph tunnel using `route-to`. If
the traffic is routed through ng0 directly (without PF), the issue does
not occur. The problem also disappears if TSO is disabled on the uplink
NIC.

System:
  FreeBSD 13.5-RELEASE
  releng/13.5-n259162-882b9f3f2218 GENERIC amd64

Interfaces:

- Primary LAN interface (where disabling TSO fixes the problem):
    igb0, MTU 1500
    options=4e520bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,
                    VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,
                    RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>

- Internet uplink:
    onp, VLAN over igb0, MTU 1500
    options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>

- Netgraph tunnel:
    ng0, MTU 1472
    inet 10.10.0.1 → 10.10.0.2
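
Since the question is whether TSO is involved, it can help to check
which offload capabilities each interface actually advertises. A
minimal sketch that extracts the capability names from the `options=`
lines quoted above and reports the TSO-related ones; `parse_options`
and `TSO_FLAGS` are illustrative names, and the flag names checked are
the ones ifconfig prints on FreeBSD.

```python
import re

# Capability names that indicate TCP segmentation offload.
TSO_FLAGS = {"TSO4", "TSO6", "VLAN_HWTSO"}


def parse_options(line):
    """Return the set of capability names from an ifconfig 'options=' line."""
    match = re.search(r"options=[0-9a-f]+<([^>]*)>", line)
    if match is None:
        return set()
    return {name.strip() for name in match.group(1).split(",") if name.strip()}


igb0 = ("options=4e520bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,"
        "VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,"
        "RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>")
onp = "options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>"

for name, line in (("igb0", igb0), ("onp", onp)):
    print(name, sorted(parse_options(line) & TSO_FLAGS))
```

The igb0 line quoted here lists VLAN_HWTSO but neither TSO4 nor TSO6.
Whether it was captured before or after TSO was disabled is not stated.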

PF rules used for re-routing:
    nat log(all) on onp inet from 10.10.0.1 to any tag NG -> (ng0) round-robin
    pass out quick on onp route-to (ng0 10.10.0.2) inet all flags S/SA keep state tagged NG

Packet trace (via pflog, during a ~10 KB POST request to YouTube):

    15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 597:9703, length 9106
    15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472)

This shows the kernel trying to send a 9106-byte segment over a link
whose MTU is 1472. The MSS had already been negotiated at 1400, so this
seems unexpected. The ICMP "need to frag" response is generated locally
(its source address is 127.0.0.1). The result is segment loss,
out-of-order retransmissions, and poor TLS performance.
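
The oversized sends can be picked out of a tcpdump-style trace
mechanically. A minimal sketch, assuming the one-line summary format
used in the trace and a payload limit of 1388 bytes (MSS 1400 minus the
12-byte timestamp option). `oversized` and `LINE_RE` are illustrative
names, not part of any tool; the sample lines are copied from the trace.

```python
import re

# Matches TCP summary lines such as:
# "... IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 597:9703, length 9106"
# ICMP lines carry no "Flags [...]" field and are therefore skipped.
LINE_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): Flags \[[^\]]*\].*?length (?P<length>\d+)"
)


def oversized(lines, local_ip, limit):
    """Yield (dst, length) for TCP packets sent from local_ip with payload > limit."""
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("src").startswith(local_ip + "."):
            length = int(m.group("length"))
            if length > limit:
                yield m.group("dst"), length


sample = [
    "15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], "
    "seq 597:9703, length 9106",
    "15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 "
    "unreachable - need to frag (mtu 1472)",
    "15:46:02.146317 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], "
    "seq 597:1985, ack 5978, win 263, options [nop,nop,TS val 1619341816 "
    "ecr 3246736394], length 1388",
    "15:46:02.221119 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], "
    "seq 1985:4761, ack 5978, win 263, options [nop,nop,TS val 1619341891 "
    "ecr 3246736681], length 2776",
]

for dst, length in oversized(sample, "10.10.0.1", 1388):
    print(dst, length)
```

On these four lines it reports the 9106-byte send and the 2776-byte
retransmission, and skips the 1388-byte retransmission that fits.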

I also reproduced this behavior with OpenVPN, so the issue is not
netgraph-specific.

Questions:
- Is this expected behavior due to TSO interacting poorly with PF route-to?
- Should TSO respect the effective MTU based on the post-PF routing decision?
- Or is this a bug in the TCP offload path?

Thanks in advance for any insights.


Attachment: pflog.txt
15:46:01.630576 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [S], seq 2266960354, win 32768, options [mss 1460,nop,wscale 7,sackOK,TS val 1619341301 ecr 0], length 0
15:46:01.706085 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [S.], seq 3704095926, ack 2266960355, win 65535, options [mss 1400,sackOK,TS val 3246736165 ecr 1619341301,nop,wscale 8], length 0
15:46:01.706120 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 3704095927, win 263, options [nop,nop,TS val 1619341376 ecr 3246736165], length 0
15:46:01.706736 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 0:517, ack 1, win 263, options [nop,nop,TS val 1619341377 ecr 3246736165], length 517
15:46:01.781807 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 518, win 1048, options [nop,nop,TS val 3246736241 ecr 1619341377], length 0
15:46:01.782272 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], seq 1:1389, ack 518, win 1050, options [nop,nop,TS val 3246736242 ecr 1619341377], length 1388
15:46:01.782902 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 1389:2777, ack 518, win 1050, options [nop,nop,TS val 3246736242 ecr 1619341377], length 1388
15:46:01.782913 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 2777, win 252, options [nop,nop,TS val 1619341453 ecr 3246736242], length 0
15:46:01.782918 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], seq 2777:4165, ack 518, win 1050, options [nop,nop,TS val 3246736242 ecr 1619341377], length 1388
15:46:01.783121 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 4165:5330, ack 518, win 1050, options [nop,nop,TS val 3246736242 ecr 1619341377], length 1165
15:46:01.783132 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 5330, win 254, options [nop,nop,TS val 1619341453 ecr 3246736242], length 0
15:46:01.784246 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 517:597, ack 5330, win 263, options [nop,nop,TS val 1619341454 ecr 3246736242], length 80
15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 597:9703, ack 5330, win 263, options [nop,nop,TS val 1619341455 ecr 3246736242], length 9106
15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:01.859245 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 5330:5978, ack 598, win 1050, options [nop,nop,TS val 3246736319 ecr 1619341454], length 648
15:46:01.859435 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 9703:9734, ack 5978, win 263, options [nop,nop,TS val 1619341529 ecr 3246736319], length 31
15:46:01.934863 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 598, win 1050, options [nop,nop,TS val 3246736394 ecr 1619341454,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.146317 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 597:1985, ack 5978, win 263, options [nop,nop,TS val 1619341816 ecr 3246736394], length 1388
15:46:02.221109 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 1986, win 1045, options [nop,nop,TS val 3246736681 ecr 1619341816,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.221119 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 1985:4761, ack 5978, win 263, options [nop,nop,TS val 1619341891 ecr 3246736681], length 2776
15:46:02.221187 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:02.503383 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 1985:3373, ack 5978, win 263, options [nop,nop,TS val 1619342173 ecr 3246736681], length 1388
15:46:02.578316 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 3374, win 1040, options [nop,nop,TS val 3246737038 ecr 1619342173,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.578345 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 3373:6149, ack 5978, win 263, options [nop,nop,TS val 1619342248 ecr 3246737038], length 2776
15:46:02.578394 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:02.856709 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 3373:4761, ack 5978, win 263, options [nop,nop,TS val 1619342527 ecr 3246737038], length 1388
15:46:02.931490 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 4762, win 1035, options [nop,nop,TS val 3246737391 ecr 1619342527,nop,nop,sack 1 {9704:9735}], length 0
15:46:02.931503 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 4761:7537, ack 5978, win 263, options [nop,nop,TS val 1619342602 ecr 3246737391], length 2776
15:46:02.931525 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:03.212023 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 4761:6149, ack 5978, win 263, options [nop,nop,TS val 1619342882 ecr 3246737391], length 1388
15:46:03.287577 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 6150, win 1030, options [nop,nop,TS val 3246737747 ecr 1619342882,nop,nop,sack 1 {9704:9735}], length 0
15:46:03.287589 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 6149:8925, ack 5978, win 263, options [nop,nop,TS val 1619342958 ecr 3246737747], length 2776
15:46:03.287613 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:03.567171 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 6149:7537, ack 5978, win 263, options [nop,nop,TS val 1619343237 ecr 3246737747], length 1388
15:46:03.642204 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 7538, win 1025, options [nop,nop,TS val 3246738102 ecr 1619343237,nop,nop,sack 1 {9704:9735}], length 0
15:46:03.642216 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 7537:9734, ack 5978, win 263, options [nop,nop,TS val 1619343312 ecr 3246738102], length 2197
15:46:03.642238 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472), length 576
15:46:03.923035 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], seq 7537:8925, ack 5978, win 263, options [nop,nop,TS val 1619343593 ecr 3246738102], length 1388
15:46:03.998014 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 8926, win 1020, options [nop,nop,TS val 3246738457 ecr 1619343593,nop,nop,sack 1 {9704:9735}], length 0
15:46:03.998030 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 8925:9734, ack 5978, win 263, options [nop,nop,TS val 1619343668 ecr 3246738457], length 809
15:46:04.073766 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 9735, win 1017, options [nop,nop,TS val 3246738533 ecr 1619343668,nop,nop,sack 1 {9704:9735}], length 0
15:46:04.074310 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 5978:6009, ack 9735, win 1017, options [nop,nop,TS val 3246738534 ecr 1619343668], length 31
15:46:04.113500 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 6009, win 263, options [nop,nop,TS val 1619343784 ecr 3246738534], length 0
15:46:04.170212 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 6009:6259, ack 9735, win 1017, options [nop,nop,TS val 3246738630 ecr 1619343668], length 250
15:46:04.170238 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 6259:7647, ack 9735, win 1017, options [nop,nop,TS val 3246738630 ecr 1619343668], length 1388
15:46:04.170253 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 7647, win 251, options [nop,nop,TS val 1619343840 ecr 3246738630], length 0
15:46:04.170256 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 7647:9035, ack 9735, win 1017, options [nop,nop,TS val 3246738630 ecr 1619343668], length 1388
15:46:04.170461 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 9734:9769, ack 9035, win 263, options [nop,nop,TS val 1619343840 ecr 3246738630], length 35
15:46:04.170837 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], seq 9035:10423, ack 9735, win 1017, options [nop,nop,TS val 3246738630 ecr 1619343668], length 1388
15:46:04.170872 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 10423:10988, ack 9735, win 1017, options [nop,nop,TS val 3246738630 ecr 1619343668], length 565
15:46:04.170876 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [.], ack 10988, win 248, options [nop,nop,TS val 1619343841 ecr 3246738630], length 0
15:46:04.171286 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 9769:9817, ack 10988, win 263, options [nop,nop,TS val 1619343841 ecr 3246738630], length 48
15:46:04.171317 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 9817:9841, ack 10988, win 263, options [nop,nop,TS val 1619343841 ecr 3246738630], length 24
15:46:04.171436 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [F.], seq 9841, ack 10988, win 263, options [nop,nop,TS val 1619343841 ecr 3246738630], length 0
15:46:04.172737 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [P.], seq 10988:11027, ack 9735, win 1017, options [nop,nop,TS val 3246738632 ecr 1619343668], length 39
15:46:04.172748 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [R], seq 2266970089, win 0, length 0
15:46:04.246038 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 9818, win 1017, options [nop,nop,TS val 3246738706 ecr 1619343840], length 0
15:46:04.246104 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [R], seq 2266970172, win 0, length 0
15:46:04.246726 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [F.], seq 11027, ack 9842, win 1017, options [nop,nop,TS val 3246738706 ecr 1619343841], length 0
15:46:04.246731 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [R], seq 2266970196, win 0, length 0
15:46:04.246735 IP 209.85.233.198.443 > 10.10.0.1.63736: Flags [.], ack 9843, win 1017, options [nop,nop,TS val 3246738706 ecr 1619343841], length 0
15:46:04.246736 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [R], seq 2266970197, win 0, length 0



