Date: Tue, 13 Nov 2007 21:49:42 GMT
From: Nikolay Govoruha <bardano@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/118026: [PATCH]
Message-ID: <200711132149.lADLng0A063056@www.freebsd.org>
Resent-Message-ID: <200711132200.lADM03KZ001142@freefall.freebsd.org>

>Number: 118026
>Category: kern
>Synopsis: [PATCH]
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Nov 13 22:00:03 UTC 2007
>Closed-Date:
>Last-Modified:
>Originator: Nikolay Govoruha
>Release: FreeBSD 6.2 Release
>Organization: VITAL
>Environment:
FreeBSD plant.vital.dp.ua 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Tue Nov 13 00:26:02 UTC 2007 root@plant.vital.dp.ua:/usr/src/sys/i386/compile/VITAL i386
>Description:
This is a bug in Path MTU Discovery (RFC 1191). When the IPSEC option is turned on in the kernel configuration file, the following behaviour is observed.

One host tries to send another host an IP packet with size=1500 and the DF (Don't Fragment) bit set. The gateway (FreeBSD 6.2 Release) has a route for this packet with mtu=1408.

net.inet.tcp.path_mtu_discovery: 1

In this case the gateway cannot forward the packet to the next gateway. In response, it sends the sender an ICMP packet with type = ICMP_UNREACH (0x03) and code = ICMP_UNREACH_NEEDFRAG (0x04). But the gateway does not set the mtu field in that packet: the field is 0x0000.

tcpdump:
//*****************************************************************************
pvs# tcpdump -i rl1 -vv -x icmp
tcpdump: listening on rl1, link-type EN10MB (Ethernet), capture size 96 bytes
09:42:39.379247 IP (tos 0x0, ttl 63, id 23385, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag, length 36
	IP (tos 0x0, ttl 126, id 60516, offset 0, flags [DF], proto: TCP (6), length: 1492, bad cksum 34c1 (->2ff3)!)
	10.0.10.81.1641 > 80.93.118.30.5421: [|tcp]
	0x0000: 4500 0038 5b59 4000 3f01 05a0 505d 761e
	0x0010: 0a00 0a51 0304 9eab 0000 0000 4500 05d4
	0x0020: ec64 4000 7e06 34c1 0a00 0a51 505d 761e
	0x0030: 0669 152d 97bf a62c
09:42:39.379644 IP (tos 0x0, ttl 63, id 23386, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag, length 36
	IP (tos 0x0, ttl 126, id 60517, offset 0, flags [DF], proto: TCP (6), length: 1492, bad cksum 34c0 (->2ff2)!) 10.0.10.81.1641 > 80.93.118.30.5421: [|tcp]
	0x0000: 4500 0038 5b5a 4000 3f01 059f 505d 761e
	0x0010: 0a00 0a51 0304 98ff 0000 0000 4500 05d4
	0x0020: ec65 4000 7e06 34c0 0a00 0a51 505d 761e
	0x0030: 0669 152d 97bf abd8
//*****************************************************************************
>How-To-Repeat:
Transfer a file over an FTP connection and look at the "next hop mtu" field (RFC 1191) in the tcpdump output.
>Fix:
I made the following patch to sys/netinet/ip_input.c and rebuilt the kernel.

Original - Line 1948:
//*****************************************************************************
	case EMSGSIZE:
		type = ICMP_UNREACH;
		code = ICMP_UNREACH_NEEDFRAG;

#if defined(IPSEC) || defined(FAST_IPSEC)
		/*
		 * If the packet is routed over IPsec tunnel, tell the
		 * originator the tunnel MTU.
		 *	tunnel MTU = if MTU - sizeof(IP) - ESP/AH hdrsiz
		 * XXX quickhack!!!
		 */
		{
			struct secpolicy *sp = NULL;
			int ipsecerror;
			int ipsechdr;
			struct route *ro;

#ifdef IPSEC
			sp = ipsec4_getpolicybyaddr(mcopy,
			    IPSEC_DIR_OUTBOUND, IP_FORWARDING, &ipsecerror);
#else /* FAST_IPSEC */
			sp = ipsec_getpolicybyaddr(mcopy,
			    IPSEC_DIR_OUTBOUND, IP_FORWARDING, &ipsecerror);
#endif
			if (sp != NULL) {
				/* count IPsec header size */
				ipsechdr = ipsec4_hdrsiz(mcopy,
				    IPSEC_DIR_OUTBOUND, NULL);

				/*
				 * find the correct route for outer IPv4
				 * header, compute tunnel MTU.
				 */
				if (sp->req != NULL
				 && sp->req->sav != NULL
				 && sp->req->sav->sah != NULL) {
					ro = &sp->req->sav->sah->sa_route;
					if (ro->ro_rt && ro->ro_rt->rt_ifp) {
						mtu =
						    ro->ro_rt->rt_rmx.rmx_mtu ?
						    ro->ro_rt->rt_rmx.rmx_mtu :
						    ro->ro_rt->rt_ifp->if_mtu;
						mtu -= ipsechdr;
					}
				}

#ifdef IPSEC
				key_freesp(sp);
#else /* FAST_IPSEC */
				KEY_FREESP(&sp);
#endif
				ipstat.ips_cantfrag++;
				break;
			}
		}
#endif /*IPSEC || FAST_IPSEC*/
		/*
		 * If the MTU wasn't set before use the interface mtu or
		 * fall back to the next smaller mtu step compared to the
		 * current packet size.
		 */
		if (mtu == 0) {
			if (ia != NULL)
				mtu = ia->ia_ifp->if_mtu;
			else
				mtu = ip_next_mtu(ip->ip_len, 0);
		}
		ipstat.ips_cantfrag++;
		break;
//*****************************************************************************

I used printf() to debug the problem. My kernel was built with IPSEC defined. In my case sp = ipsec4_getpolicybyaddr(......) returned a non-NULL value, but sp->req was NULL. The body of the "if (sp != NULL){}" statement is therefore executed, but mtu is not calculated and stays zero, and the body ends with a "break;" statement. So mtu is still zero after the "switch (error)" statement, and that zero goes into the "mtu" field of the ICMP packet. Is this a bug?

I resolved the problem in the following way, by commenting out the "break" statement and the statement before it:
//*****************************************************************************
#ifdef IPSEC
				key_freesp(sp);
#else /* FAST_IPSEC */
				KEY_FREESP(&sp);
#endif
				//ipstat.ips_cantfrag++;
				//break;
			}
		}
#endif /*IPSEC || FAST_IPSEC*/
		/*
		 * If the MTU wasn't set before use the interface mtu or
		 * fall back to the next smaller mtu step compared to the
		 * current packet size.
		 */
		if (mtu == 0) {
			if (ia != NULL)
				mtu = ia->ia_ifp->if_mtu;
			else
				mtu = ip_next_mtu(ip->ip_len, 0);
		}
		ipstat.ips_cantfrag++;
		break;
//*****************************************************************************
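
The control flow described above can be reduced to a small stand-alone sketch. This is not the kernel code: the types `sketch_req` and `sketch_policy`, the two functions, and the single `fallback_mtu` argument (standing in for the interface MTU / ip_next_mtu() fallback) are made up for illustration, and only the early exit is modelled.

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel structures. Only what is needed
 * to show the control flow is modelled; none of this is kernel API.
 */
struct sketch_req {
	int route_mtu;			/* MTU of the tunnel route, 0 if unknown */
};

struct sketch_policy {
	struct sketch_req *req;		/* may be NULL, as observed in the report */
};

/*
 * Original flow: once a policy is found, the case ends with "break"
 * whether or not a tunnel MTU was computed.
 */
int
needfrag_mtu_original(const struct sketch_policy *sp, int ipsechdr,
    int fallback_mtu)
{
	int mtu = 0;

	if (sp != NULL) {
		if (sp->req != NULL && sp->req->route_mtu != 0)
			mtu = sp->req->route_mtu - ipsechdr;
		return (mtu);		/* early "break": fallback is skipped */
	}
	if (mtu == 0)
		mtu = fallback_mtu;
	return (mtu);
}

/*
 * Patched flow: the early "break" is gone, so a zero mtu always reaches
 * the fallback.
 */
int
needfrag_mtu_patched(const struct sketch_policy *sp, int ipsechdr,
    int fallback_mtu)
{
	int mtu = 0;

	if (sp != NULL) {
		if (sp->req != NULL && sp->req->route_mtu != 0)
			mtu = sp->req->route_mtu - ipsechdr;
	}
	if (mtu == 0)
		mtu = fallback_mtu;
	return (mtu);
}
```

With a policy whose `req` is NULL and a fallback of 1408, the first function returns 0 and the second returns 1408. When a tunnel MTU can be computed, both return the same value, so the change only affects the case where mtu would otherwise stay zero.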

In this case, if mtu is still zero, the code that follows is executed - the same code that always runs when IPSEC and FAST_IPSEC are not defined.

The tcpdump result:
//*****************************************************************************
pvs# tcpdump -i rl1 -vv -x icmp
tcpdump: listening on rl1, link-type EN10MB (Ethernet), capture size 96 bytes
12:13:48.471242 IP (tos 0x0, ttl 63, id 20521, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag (mtu 1408), length 36
	IP (tos 0x0, ttl 126, id 50667, offset 0, flags [DF], proto: TCP (6), length: 1500, bad cksum 5b32 (->5664)!) 10.0.10.81.1769 > 80.93.118.30.5421: tcp 1476 [bad hdr length 4 - too short, < 20]
	0x0000: 4500 0038 5029 4000 3f01 10d0 505d 761e
	0x0010: 0a00 0a51 0304 7408 0000 0580 4500 05dc
	0x0020: c5eb 4000 7e06 5b32 0a00 0a51 505d 761e
	0x0030: 06e9 152d 5af0 079f
12:13:48.471583 IP (tos 0x0, ttl 63, id 20522, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag (mtu 1408), length 36
	IP (tos 0x0, ttl 126, id 50668, offset 0, flags [DF], proto: TCP (6), length: 1500, bad cksum 5b31 (->5663)!) 10.0.10.81.1769 > 80.93.118.30.5421: [|tcp]
	0x0000: 4500 0038 502a 4000 3f01 10cf 505d 761e
	0x0010: 0a00 0a51 0304 6e54 0000 0580 4500 05dc
	0x0020: c5ec 4000 7e06 5b31 0a00 0a51 505d 761e
	0x0030: 06e9 152d 5af0 0d53
//*****************************************************************************

You can see that the "next hop mtu" field now has the correct value - 0x0580 = 1408 decimal.

Please tell me, is this patch correct?
mailto:bardano@gmail.com

P.S. "bad cksum 5b31 (->5663)!)" - this is a packet after natd; maybe my natd configuration is incorrect.
>Release-Note:
>Audit-Trail:
>Unformatted:
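
For anyone checking the dumps by hand: RFC 1191 places the next-hop MTU in bytes 6-7 of the ICMP header of a "fragmentation needed" message (type 3, code 4). A minimal sketch that reads this field from the first 28 bytes of the captured packets is given here. The function name `icmp_needfrag_mtu` and the array names are only illustrative; the byte values are copied from the first packet of each tcpdump capture in this report.

```c
#include <stddef.h>
#include <stdint.h>

/* First 28 bytes (IP header + ICMP header) of the packet captured before the patch. */
const uint8_t before_patch[28] = {
	0x45, 0x00, 0x00, 0x38, 0x5b, 0x59, 0x40, 0x00,
	0x3f, 0x01, 0x05, 0xa0, 0x50, 0x5d, 0x76, 0x1e,
	0x0a, 0x00, 0x0a, 0x51, 0x03, 0x04, 0x9e, 0xab,
	0x00, 0x00, 0x00, 0x00
};

/* The same 28 bytes of the packet captured after the patch. */
const uint8_t after_patch[28] = {
	0x45, 0x00, 0x00, 0x38, 0x50, 0x29, 0x40, 0x00,
	0x3f, 0x01, 0x10, 0xd0, 0x50, 0x5d, 0x76, 0x1e,
	0x0a, 0x00, 0x0a, 0x51, 0x03, 0x04, 0x74, 0x08,
	0x00, 0x00, 0x05, 0x80
};

/*
 * Return the next-hop MTU carried in an IPv4 packet holding an ICMP
 * "destination unreachable / fragmentation needed" message, or -1 if
 * the buffer is too short or is not such a message.
 */
int
icmp_needfrag_mtu(const uint8_t *pkt, size_t len)
{
	size_t ihl;

	if (len < 20 || (pkt[0] >> 4) != 4)
		return (-1);			/* not IPv4 */
	ihl = (size_t)(pkt[0] & 0x0f) * 4;	/* IP header length in bytes */
	if (ihl < 20 || len < ihl + 8 || pkt[9] != 1)
		return (-1);			/* truncated or not ICMP */
	if (pkt[ihl] != 3 || pkt[ihl + 1] != 4)
		return (-1);			/* not type 3, code 4 */
	/* Bytes 6-7 of the ICMP header, network byte order. */
	return ((pkt[ihl + 6] << 8) | pkt[ihl + 7]);
}
```

Calling `icmp_needfrag_mtu(before_patch, sizeof(before_patch))` gives 0, and the same call on `after_patch` gives 1408 (0x0580), matching what tcpdump prints for the two captures.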