Message-ID: <3D5A8D68.AE7B43A5@expertcity.com>
Date: Wed, 14 Aug 2002 10:03:36 -0700
From: Steve Francis
To: freebsd-hackers@freebsd.org, freebsd-net@freebsd.org
Subject: pmtu-d broken

I find this hard to believe, but it seems PMTU-D is broken up to and including 4.6.1-RELEASE-p10 (the latest I've tried; I also tried 4.4).

The behaviour of FreeBSD is such that when it sends a too-large packet and receives a "fragmentation required, DF bit set" ICMP, it does not honor it for the packet that caused the ICMP. It does correctly put the new MTU in its cloned route table, and does correctly send future packets in segments no larger than the MTU, but it keeps retransmitting the packet that caused the ICMP at the original, too-big size, so it never makes it through and just keeps generating more ICMPs.

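To spell out the behaviour I would expect, here is a minimal Python sketch of a sender that re-segments unacknowledged data after a "fragmentation needed" ICMP. It is a toy model, not FreeBSD code: the ToySender class, its method names, and the 52-byte per-packet overhead (20 bytes IP, 20 bytes TCP, 12 bytes of TCP options) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToySender:
    """Hypothetical model of a TCP sender; this is not FreeBSD code."""
    path_mtu: int
    overhead: int = 52  # assumed: 20 IP + 20 TCP + 12 bytes of TCP options

    def packet_sizes(self, unacked_bytes: int) -> List[int]:
        """IP packet lengths needed to (re)send unacked_bytes at the current path MTU."""
        max_payload = self.path_mtu - self.overhead
        sizes = []
        while unacked_bytes > 0:
            chunk = min(unacked_bytes, max_payload)
            sizes.append(chunk + self.overhead)
            unacked_bytes -= chunk
        return sizes

    def on_frag_needed(self, next_hop_mtu: int) -> None:
        """Lower the path MTU estimate when an ICMP 'fragmentation needed' arrives."""
        self.path_mtu = min(self.path_mtu, next_hop_mtu)


sender = ToySender(path_mtu=1500)
first_try = sender.packet_sizes(1448)   # one full-size packet
sender.on_frag_needed(1420)             # the router reports MTU 1420
retransmit = sender.packet_sizes(1448)  # same data, re-segmented to fit
```

In this model the retransmission of the same 1448 bytes goes out as two packets that both fit within 1420 bytes, rather than as the original 1500-byte packet.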
tcpdump examples:

First, note that there is no specific entry for 10.4.0.80:

dell350-12# netstat -anlr | grep 10.4
10.4.1.55          63.251.224.129     UGHW        1      990   1500   fxp0
10.4.1.58          63.251.224.129     UGHW        7   478339   1420   fxp0
10.4.1.233         63.251.224.129     UGHW3       0     2735   1420   fxp0
dell350-12#

From 10.4.0.80, which is on the other side of a VPN tunnel with an MTU of 1420 bytes, I do a wget. Note that despite the ICMP messages telling it fragmentation is required, the FreeBSD box keeps sending 1500-byte packets with DF set.

dell350-12# tcpdump -vvi fxp0 host wonko.corp or host 10.16.5.8
tcpdump: listening on fxp0
09:40:25.938609 10.4.0.80.2793 > dell350-12.snv.http: S [tcp sum ok] 3671603378:3671603378(0) win 16384 (DF) (ttl 61, id 35804, len 60)
09:40:25.938665 dell350-12.snv.http > 10.4.0.80.2793: S [tcp sum ok] 3749220980:3749220980(0) ack 3671603379 win 17376 (DF) (ttl 64, id 43056, len 60)
09:40:25.960106 10.4.0.80.2793 > dell350-12.snv.http: . [tcp sum ok] 1:1(0) ack 1 win 17376 (DF) (ttl 61, id 35806, len 52)
09:40:25.961626 10.4.0.80.2793 > dell350-12.snv.http: P 1:147(146) ack 1 win 17376 (DF) (ttl 61, id 35823, len 198)
09:40:25.961647 dell350-12.snv.http > 10.4.0.80.2793: . [tcp sum ok] 1:1(0) ack 147 win 17230 (DF) (ttl 64, id 43078, len 52)
09:40:25.962318 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 (DF) (ttl 64, id 43079, len 1500)
09:40:25.962337 dell350-12.snv.http > 10.4.0.80.2793: . 1449:2897(1448) ack 147 win 17376 (DF) (ttl 64, id 43080, len 1500)
09:40:25.962352 dell350-12.snv.http > 10.4.0.80.2793: . 2897:4345(1448) ack 147 win 17376 (DF) (ttl 64, id 43081, len 1500)
09:40:25.963573 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43079, len 1500) (ttl 254, id 16874, len 56)
09:40:25.963696 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43080, len 1500) (ttl 254, id 16875, len 56)
09:40:25.963826 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43081, len 1500) (ttl 254, id 16876, len 56)
09:40:26.953112 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 (DF) (ttl 64, id 43456, len 1500)
09:40:26.954116 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43456, len 1500) (ttl 254, id 17435, len 56)
09:40:28.953114 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 (DF) (ttl 64, id 44025, len 1500)
09:40:28.954114 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 44025, len 1500) (ttl 254, id 18997, len 56)
09:40:32.953061 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 (DF) (ttl 64, id 45089, len 1500)
09:40:32.954454 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 45089, len 1500) (ttl 254, id 22545, len 56)
^C

However, we do see that it correctly updated the MTU:

dell350-12# netstat -anlr | grep 10.4
10.4.0.80          63.251.224.129     UGHW        1     2736   1420   fxp0
10.4.1.55          63.251.224.129     UGHW        1      990   1500   fxp0
10.4.1.58          63.251.224.129     UGHW        7   478423   1420   fxp0
dell350-12#

And a repeated wget works fine:

dell350-12# tcpdump -vvi fxp0 host wonko.corp or host 10.16.5.8
tcpdump: listening on fxp0
09:40:59.134496 10.4.0.80.1979 > dell350-12.snv.http: S [tcp sum ok] 926911706:926911706(0) win 16384 (DF) (ttl 61, id 38108, len 60)
09:40:59.134532 dell350-12.snv.http > 10.4.0.80.1979: S [tcp sum ok] 422570166:422570166(0) ack 926911707 win 16416 (DF) (ttl 64, id 53036, len 60)
09:40:59.156451 10.4.0.80.1979 > dell350-12.snv.http: . [tcp sum ok] 1:1(0) ack 1 win 17376 (DF) (ttl 61, id 38170, len 52)
09:40:59.156820 10.4.0.80.1979 > dell350-12.snv.http: P 1:147(146) ack 1 win 17376 (DF) (ttl 61, id 38171, len 198)
09:40:59.156842 dell350-12.snv.http > 10.4.0.80.1979: . [tcp sum ok] 1:1(0) ack 147 win 16270 (DF) (ttl 64, id 53037, len 52)
09:40:59.157780 dell350-12.snv.http > 10.4.0.80.1979: . 1:1369(1368) ack 147 win 16416 (DF) (ttl 64, id 53038, len 1420)
09:40:59.157799 dell350-12.snv.http > 10.4.0.80.1979: . 1369:2737(1368) ack 147 win 16416 (DF) (ttl 64, id 53039, len 1420)
09:40:59.157816 dell350-12.snv.http > 10.4.0.80.1979: . 2737:4105(1368) ack 147 win 16416 (DF) (ttl 64, id 53040, len 1420)

sending nothing larger than 1420 bytes.

So FreeBSD is behaving in a broken way that violates the RFC, unless I'm much mistaken. Unfortunately, I am not a coder, so I can't go poking at the source to verify or fix this. (Well, it would take me a very long time.) Anyone care to confirm (and ideally, fix)?
I can replicate this at will, so can easily gather more data if people want.

TIA
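For anyone who wants to check their own captures for the same symptom, here is a rough Python sketch that scans tcpdump text output for packets larger than an MTU already announced by a "need to frag" ICMP. The function name and regular expressions are my own; it assumes one packet per line in the format shown in this message, and it does not filter by connection or direction, so treat it as a starting point only.

```python
import re

# Patterns assume the tcpdump -vv text format shown in this message.
ICMP_RE = re.compile(r"need to frag \(mtu (\d+)\)")
LEN_RE = re.compile(r"len (\d+)\)\s*$")


def oversized_after_icmp(lines):
    """Return (timestamp, length) for non-ICMP packets that exceed an MTU
    announced by an earlier 'need to frag' message in the same capture."""
    announced_mtu = None
    offenders = []
    for line in lines:
        line = line.strip()
        icmp = ICMP_RE.search(line)
        if icmp:
            mtu = int(icmp.group(1))
            # Keep the smallest MTU seen so far.
            announced_mtu = mtu if announced_mtu is None else min(announced_mtu, mtu)
            continue
        length = LEN_RE.search(line)
        if length and announced_mtu is not None and int(length.group(1)) > announced_mtu:
            offenders.append((line.split()[0], int(length.group(1))))
    return offenders


# Three packets taken from the first capture, with wrapped lines joined.
sample = [
    "09:40:25.962318 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) "
    "ack 147 win 17376 (DF) (ttl 64, id 43079, len 1500)",
    "09:40:25.963573 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - "
    "need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] "
    "(DF) (ttl 62, id 43079, len 1500) (ttl 254, id 16874, len 56)",
    "09:40:26.953112 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) "
    "ack 147 win 17376 (DF) (ttl 64, id 43456, len 1500)",
]
result = oversized_after_icmp(sample)
```

On these three lines it flags only the retransmission at 09:40:26.953112, which is the 1500-byte packet sent after the router had already reported an MTU of 1420.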