Date: Wed, 14 Aug 2002 10:03:36 -0700
From: Steve Francis <steve@expertcity.com>
To: freebsd-hackers@freebsd.org, freebsd-net@freebsd.org
Subject: pmtu-d broken
Message-ID: <3D5A8D68.AE7B43A5@expertcity.com>
I find this hard to believe, but it seems PMTU-D is broken up to and
including 4.6.1-RELEASE-p10 (the latest I've tried; I also tried 4.4).

The behaviour of FreeBSD is such that when it sends a too-large packet
and receives a "fragmentation required, DF bit set" ICMP, it does not
honor it for the packet that caused the ICMP.

It does correctly put the new MTU in its cloned route table, and does
correctly send future packets in segments no larger than the MTU, but it
keeps retransmitting the packet that caused the ICMP at the original,
too-big size, so it never makes it, and just keeps generating more ICMPs.

tcpdump examples:

First, note that there is no specific entry for 10.4.0.80:

dell350-12# netstat -anlr | grep 10.4
10.4.1.55          63.251.224.129     UGHW        1      990   1500     fxp0
10.4.1.58          63.251.224.129     UGHW        7   478339   1420     fxp0
10.4.1.233         63.251.224.129     UGHW3       0     2735   1420     fxp0
dell350-12#

From 10.4.0.80, which is on the other side of a VPN tunnel with an MTU of
1420 bytes, I do a wget.

Note that despite the ICMP messages telling it fragmentation is required,
the FreeBSD box keeps sending 1500 byte packets with DF set.

dell350-12# tcpdump -vvi fxp0 host wonko.corp or host 10.16.5.8
tcpdump: listening on fxp0
09:40:25.938609 10.4.0.80.2793 > dell350-12.snv.http: S [tcp sum ok] 3671603378:3671603378(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 1076950329 0> (DF) (ttl 61, id 35804, len 60)
09:40:25.938665 dell350-12.snv.http > 10.4.0.80.2793: S [tcp sum ok] 3749220980:3749220980(0) ack 3671603379 win 17376 <mss 1460,nop,wscale 0,nop,nop,timestamp 102689414 1076950329> (DF) (ttl 64, id 43056, len 60)
09:40:25.960106 10.4.0.80.2793 > dell350-12.snv.http: . [tcp sum ok] 1:1(0) ack 1 win 17376 <nop,nop,timestamp 1076950332 102689414> (DF) (ttl 61, id 35806, len 52)
09:40:25.961626 10.4.0.80.2793 > dell350-12.snv.http: P 1:147(146) ack 1 win 17376 <nop,nop,timestamp 1076950332 102689414> (DF) (ttl 61, id 35823, len 198)
09:40:25.961647 dell350-12.snv.http > 10.4.0.80.2793: . [tcp sum ok] 1:1(0) ack 147 win 17230 <nop,nop,timestamp 102689416 1076950332> (DF) (ttl 64, id 43078, len 52)
09:40:25.962318 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 <nop,nop,timestamp 102689416 1076950332> (DF) (ttl 64, id 43079, len 1500)
09:40:25.962337 dell350-12.snv.http > 10.4.0.80.2793: . 1449:2897(1448) ack 147 win 17376 <nop,nop,timestamp 102689416 1076950332> (DF) (ttl 64, id 43080, len 1500)
09:40:25.962352 dell350-12.snv.http > 10.4.0.80.2793: . 2897:4345(1448) ack 147 win 17376 <nop,nop,timestamp 102689416 1076950332> (DF) (ttl 64, id 43081, len 1500)
09:40:25.963573 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43079, len 1500) (ttl 254, id 16874, len 56)
09:40:25.963696 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43080, len 1500) (ttl 254, id 16875, len 56)
09:40:25.963826 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43081, len 1500) (ttl 254, id 16876, len 56)
09:40:26.953112 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 <nop,nop,timestamp 102689516 1076950332> (DF) (ttl 64, id 43456, len 1500)
09:40:26.954116 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43456, len 1500) (ttl 254, id 17435, len 56)
09:40:28.953114 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 <nop,nop,timestamp 102689716 1076950332> (DF) (ttl 64, id 44025, len 1500)
09:40:28.954114 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 44025, len 1500) (ttl 254, id 18997, len 56)
09:40:32.953061 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 <nop,nop,timestamp 102690116 1076950332> (DF) (ttl 64, id 45089, len 1500)
09:40:32.954454 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 45089, len 1500) (ttl 254, id 22545, len 56)
^C

However, we do see that it correctly updated the MTU:

dell350-12# netstat -anlr | grep 10.4
10.4.0.80          63.251.224.129     UGHW        1     2736   1420     fxp0
10.4.1.55          63.251.224.129     UGHW        1      990   1500     fxp0
10.4.1.58          63.251.224.129     UGHW        7   478423   1420     fxp0
dell350-12#

And a repeated wget works fine:

dell350-12# tcpdump -vvi fxp0 host wonko.corp or host 10.16.5.8
tcpdump: listening on fxp0
09:40:59.134496 10.4.0.80.1979 > dell350-12.snv.http: S [tcp sum ok] 926911706:926911706(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 1076953649 0> (DF) (ttl 61, id 38108, len 60)
09:40:59.134532 dell350-12.snv.http > 10.4.0.80.1979: S [tcp sum ok] 422570166:422570166(0) ack 926911707 win 16416 <mss 1460,nop,wscale 0,nop,nop,timestamp 102692734 1076953649> (DF) (ttl 64, id 53036, len 60)
09:40:59.156451 10.4.0.80.1979 > dell350-12.snv.http: . [tcp sum ok] 1:1(0) ack 1 win 17376 <nop,nop,timestamp 1076953651 102692734> (DF) (ttl 61, id 38170, len 52)
09:40:59.156820 10.4.0.80.1979 > dell350-12.snv.http: P 1:147(146) ack 1 win 17376 <nop,nop,timestamp 1076953651 102692734> (DF) (ttl 61, id 38171, len 198)
09:40:59.156842 dell350-12.snv.http > 10.4.0.80.1979: . [tcp sum ok] 1:1(0) ack 147 win 16270 <nop,nop,timestamp 102692736 1076953651> (DF) (ttl 64, id 53037, len 52)
09:40:59.157780 dell350-12.snv.http > 10.4.0.80.1979: . 1:1369(1368) ack 147 win 16416 <nop,nop,timestamp 102692736 1076953651> (DF) (ttl 64, id 53038, len 1420)
09:40:59.157799 dell350-12.snv.http > 10.4.0.80.1979: . 1369:2737(1368) ack 147 win 16416 <nop,nop,timestamp 102692736 1076953651> (DF) (ttl 64, id 53039, len 1420)
09:40:59.157816 dell350-12.snv.http > 10.4.0.80.1979: . 2737:4105(1368) ack 147 win 16416 <nop,nop,timestamp 102692736 1076953651> (DF) (ttl 64, id 53040, len 1420)

sending nothing larger than 1420 bytes.

So FreeBSD is behaving in a broken way that violates the RFC, unless I'm
much mistaken.

Unfortunately, I am not a coder, so I can't go poking at the source to
verify or fix this. (Well, it would take me a very long time.)

Anyone care to confirm (and ideally, fix)? I can replicate this at will,
so I can easily gather more data if people want.

TIA

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message
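
For reference, the segment sizes in the two traces follow from simple MTU arithmetic, and the same arithmetic shows what a retransmission that honored the ICMP would be expected to look like. The following is only a minimal sketch, not FreeBSD code: it assumes IPv4 with no IP options (20 bytes), a 20-byte TCP header, and the 12-byte nop,nop,timestamp option block seen in the traces. The function names tcp_payload_for_mtu and resegment are hypothetical and exist only for this illustration.

```python
IP_HEADER = 20              # IPv4 header, assuming no IP options
TCP_HEADER = 20             # base TCP header
TCP_TIMESTAMP_OPTION = 12   # nop,nop,timestamp as shown in the traces


def tcp_payload_for_mtu(mtu, tcp_options=TCP_TIMESTAMP_OPTION):
    """Largest TCP payload that fits in one unfragmented packet of size mtu."""
    return mtu - IP_HEADER - TCP_HEADER - tcp_options


def resegment(start_seq, end_seq, mtu):
    """Split the byte range [start_seq, end_seq) into segments that fit mtu.

    Returns (first, last, length) tuples in tcpdump's first:last(length)
    notation, using relative sequence numbers.
    """
    payload = tcp_payload_for_mtu(mtu)
    segments = []
    seq = start_seq
    while seq < end_seq:
        length = min(payload, end_seq - seq)
        segments.append((seq, seq + length, length))
        seq += length
    return segments


if __name__ == "__main__":
    print(tcp_payload_for_mtu(1500))   # payload per packet at MTU 1500
    print(tcp_payload_for_mtu(1420))   # payload per packet at MTU 1420
    # Bytes 1:4345 were sent as three 1448-byte segments in the first trace;
    # this is how they would be split after learning the 1420-byte MTU.
    for first, last, length in resegment(1, 4345, 1420):
        print(f"{first}:{last}({length})")
```

Under these assumptions the 1500-byte packets carry 1448 bytes and the 1420-byte packets carry 1368 bytes, which matches both traces. The first three segments produced for MTU 1420 (1:1369, 1369:2737, 2737:4105) are the same ones the second, working connection sends, whereas the first connection keeps retransmitting 1:1449.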