- Two GRE tunnels from the same router:
  - GRE to VPS0: healthy
  - GRE to VPS2: problematic
- There is also an ng tunnel to VPS2 (works better than the bad GRE, but not at full speed).
- Clients behind the router were tested via both Wi-Fi and wired Ethernet (same problem).
- Also tested traffic from a VM on the router: the VM path does not reproduce the severe collapse.
Core symptom (asymmetric)
- Only the TX direction is affected (client uploads through GRE->VPS2).
- The RX/download direction is not affected.
- Through GRE->VPS2, TCP upload collapses to about 100 Kbit/s per flow.
- 5 parallel TCP flows produce roughly 5 x ~100 Kbit/s.
- iperf3 TCP shows many retransmissions and a very small cwnd (~2.7 KB).
- The same client via GRE->VPS0 is fast (normal high throughput).
- Traffic generated locally on the router through GRE->VPS2 is also fast.
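As a rough sanity check on the numbers above: steady-state TCP throughput is approximately cwnd / RTT, so the observed cwnd and per-flow rate imply an effective round-trip time. The sketch below assumes "2.7 KB" means 2700 bytes and "100 Kbit/s" means 100,000 bit/s; the function name is only illustrative.

```python
def implied_rtt_seconds(cwnd_bytes: float, throughput_bps: float) -> float:
    """Effective RTT implied by the approximation throughput ~= cwnd / RTT."""
    if throughput_bps <= 0:
        raise ValueError("throughput must be positive")
    return (cwnd_bytes * 8) / throughput_bps


# Assumed interpretation of the observed values: 2700 bytes, 100,000 bit/s.
rtt = implied_rtt_seconds(2700, 100_000)
print(f"implied effective RTT: {rtt * 1000:.0f} ms")
```

If the real path RTT to VPS2 is much lower than the value this prints, the flow is probably spending most of its time in loss recovery or retransmission timeouts rather than being purely window-limited, which would fit the retransmissions iperf3 reports.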
Firewall / policy checks
- PF disabled completely (`pfctl -d`) for tests: problem remains.
- Therefore this is not PF rule/policy dependent in current testing.
- Also checked for ipfw/dummynet:
  - `kldstat | egrep 'ipfw|dummynet|pf'` shows only `pf.ko`.
  - `ipfw show` returns “Protocol not available”.
  - `net.inet.ip.fw.enable` OID not present.
Offload / NIC checks performed
- Explicitly disabled offload features (multiple attempts), including:
  - `-lro`
  - `-tso` / tso-related toggles
  - `-rxcsum` / `-txcsum` / v6 checksum toggles
  - VLAN hardware offload related toggles where supported
- No improvement.
- Current interface snapshot during testing:
  - `ifconfig igb0` shows options: `VLAN_MTU,JUMBO_MTU,WOL_MAGIC,HWSTATS` (no LRO/TSO flags active)
  - `ifconfig onp` shows `options=0`
- Reboots of router and VPS2 done after changes: no improvement.
- Also tested with minimal manual GRE setup (without tunnel-management scripts, only one GRE up): no improvement.
Queue/counters/sysctl checks
- `netstat -Q` shows no queue drops (`QDrops=0`).
- `netstat -iW -I ...` during tests shows no obvious interface error growth.
- `sysctl net.inet.ip.fastforwarding` does not exist on this release.
UDP vs TCP behavior
- UDP over the problematic GRE path can run at high throughput (similar order from client and from router), i.e., no analogous dramatic collapse.
- TCP over the same path collapses hard in the TX direction from forwarded clients.
Packet capture findings
- Collected captures at multiple points (router and VPS side).
- In bad runs, client-side TCP packets are seen, but many segments are retransmitted repeatedly.
- Comparing capture points suggests segments often appear on the VPS GRE side only much later / after retransmission, consistent with loss on the GRE path for this forwarded TCP case.
- In control runs (local router-generated traffic), packet timing/count alignment is normal.
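The capture comparison can be sketched roughly as below. This is not the tooling actually used, only an illustration of the idea: it assumes each capture has already been reduced to `(timestamp, tcp_seq)` pairs for one flow, that the clocks at both capture points are reasonably in sync, and that sequence numbers do not wrap. The function names and sample values are made up for illustration and are not taken from the real captures.

```python
def first_seen(records):
    """Map each TCP sequence number to (earliest timestamp, times seen)."""
    seen = {}
    for ts, seq in records:
        if seq in seen:
            first, count = seen[seq]
            seen[seq] = (min(first, ts), count + 1)
        else:
            seen[seq] = (ts, 1)
    return seen


def compare_points(near, far):
    """For every segment seen at the near capture point, report how many
    copies were sent there and the delay until it first appears at the far
    capture point (None if it never appears)."""
    near_seen = first_seen(near)
    far_seen = first_seen(far)
    report = []
    for seq in sorted(near_seen):
        sent_at, copies = near_seen[seq]
        arrival = far_seen.get(seq)
        delay = None if arrival is None else arrival[0] - sent_at
        report.append((seq, copies, delay))
    return report


# Made-up sample data: segment 2448 is sent twice before it reaches the far side.
near = [(0.000, 1000), (0.001, 2448), (0.210, 2448), (0.211, 3896)]
far = [(0.020, 1000), (0.230, 2448), (0.231, 3896)]

for seq, copies, delay in compare_points(near, far):
    print(seq, copies, delay)
```

Segments with more than one copy at the near point and a delay well above the path RTT are the ones that only got through after a retransmission.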
What makes this puzzling
- Same router, same OS version, same general config:
  - GRE to VPS0 works well.
  - GRE to VPS2 fails for forwarded TCP TX only.
- PF disabled does not change behavior.
- Offloads disabled does not change behavior.
- No obvious drops in standard queue/interface counters.
Request
Any suggestions on what to inspect next for FreeBSD 14.3-p8 in this scenario:
- netisr / forwarding path diagnostics
- GRE-specific kernel instrumentation
- known igb/VLAN/GRE forwarding edge cases
- any tunables or debug knobs that could explain “forwarded TCP TX collapse only on one GRE peer/path”
Thanks.
Ivan
Just a guess, but it sounds like you may have an MTU issue. What are the MTU settings on each of the real interfaces and the GRE tunnels?
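For reference, here is the arithmetic I have in mind, as a rough sketch. Plain GRE over IPv4 adds a 20-byte outer IP header plus a 4-byte GRE header, and each optional GRE field (key, checksum, sequence number) adds 4 more bytes. The outer MTU values below are only examples, not your actual settings, and the helper names are made up.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
GRE_BASE = 4      # bytes, basic GRE header
GRE_OPTION = 4    # bytes for each of: key, checksum, sequence number
TCP_HEADER = 20   # bytes, assuming no TCP options


def gre_tunnel_mtu(outer_mtu: int, gre_options: int = 0) -> int:
    """Largest inner packet that fits in one outer packet without fragmentation."""
    return outer_mtu - IPV4_HEADER - GRE_BASE - GRE_OPTION * gre_options


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload per segment for an IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


# Example outer MTUs only (plain Ethernet and a PPPoE-style link).
for outer in (1500, 1492):
    tunnel = gre_tunnel_mtu(outer)
    print(f"outer MTU {outer}: GRE MTU {tunnel}, TCP MSS through tunnel {tcp_mss(tunnel)}")
```

If the path to VPS2 has a smaller real MTU than the path to VPS0, or the GRE interface MTU is set higher than the path can carry, full-size forwarded TCP segments could be dropped or fragmented while smaller packets get through.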