From owner-freebsd-bugs@FreeBSD.ORG Tue Nov 13 22:00:03 2007
Resent-Date: Tue, 13 Nov 2007 22:00:03 GMT
Resent-Message-Id: <200711132200.lADM03KZ001142@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Nikolay Govoruha
Message-Id: <200711132149.lADLng0A063056@www.freebsd.org>
Date: Tue, 13 Nov 2007 21:49:42 GMT
From: Nikolay Govoruha
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Subject: kern/118026: [PATCH]
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
X-List-Received-Date: Tue, 13 Nov 2007 22:00:03 -0000

>Number:         118026
>Category:       kern
>Synopsis:       [PATCH]
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 13 22:00:03 UTC 2007
>Closed-Date:
>Last-Modified:
>Originator:     Nikolay Govoruha
>Release:        FreeBSD 6.2 Release
>Organization:
VITAL
>Environment:
FreeBSD plant.vital.dp.ua 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Tue Nov 13 00:26:02 UTC 2007 root@plant.vital.dp.ua:/usr/src/sys/i386/compile/VITAL i386
>Description:
This is a bug in Path MTU Discovery (RFC 1191). It appears when the IPSEC option is enabled in the kernel configuration file.

One host tries to send another host an IP packet of size 1500 with the DF (Don't Fragment) bit set. The gateway, running FreeBSD 6.2-RELEASE, has a route for this packet with mtu=1408, and net.inet.tcp.path_mtu_discovery is 1. In this case the gateway cannot forward the packet to the next gateway. In response, it sends the sender an ICMP packet with type = ICMP_UNREACH (0x03) and code = ICMP_UNREACH_NEEDFRAG (0x04). However, the gateway does not set the MTU field in that packet: the field is 0x0000.
tcpdump:
//*****************************************************************************
pvs# tcpdump -i rl1 -vv -x icmp
tcpdump: listening on rl1, link-type EN10MB (Ethernet), capture size 96 bytes
09:42:39.379247 IP (tos 0x0, ttl 63, id 23385, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag, length 36
        IP (tos 0x0, ttl 126, id 60516, offset 0, flags [DF], proto: TCP (6), length: 1492, bad cksum 34c1 (->2ff3)!)
        10.0.10.81.1641 > 80.93.118.30.5421: [|tcp]
        0x0000:  4500 0038 5b59 4000 3f01 05a0 505d 761e
        0x0010:  0a00 0a51 0304 9eab 0000 0000 4500 05d4
        0x0020:  ec64 4000 7e06 34c1 0a00 0a51 505d 761e
        0x0030:  0669 152d 97bf a62c
09:42:39.379644 IP (tos 0x0, ttl 63, id 23386, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag, length 36
        IP (tos 0x0, ttl 126, id 60517, offset 0, flags [DF], proto: TCP (6), length: 1492, bad cksum 34c0 (->2ff2)!)
        10.0.10.81.1641 > 80.93.118.30.5421: [|tcp]
        0x0000:  4500 0038 5b5a 4000 3f01 059f 505d 761e
        0x0010:  0a00 0a51 0304 98ff 0000 0000 4500 05d4
        0x0020:  ec65 4000 7e06 34c0 0a00 0a51 505d 761e
        0x0030:  0669 152d 97bf abd8
//*****************************************************************************
>How-To-Repeat:
Transfer a file over an FTP connection and watch the "next hop mtu" field (RFC 1191) in the tcpdump output.
>Fix:
I made the following patch to sys/netinet/ip_input.c and rebuilt the kernel.

Original, line 1948:
//*****************************************************************************
    case EMSGSIZE:
        type = ICMP_UNREACH;
        code = ICMP_UNREACH_NEEDFRAG;
#if defined(IPSEC) || defined(FAST_IPSEC)
        /*
         * If the packet is routed over IPsec tunnel, tell the
         * originator the tunnel MTU.
         *      tunnel MTU = if MTU - sizeof(IP) - ESP/AH hdrsiz
         * XXX quickhack!!!
         */
        {
            struct secpolicy *sp = NULL;
            int ipsecerror;
            int ipsechdr;
            struct route *ro;

#ifdef IPSEC
            sp = ipsec4_getpolicybyaddr(mcopy,
                IPSEC_DIR_OUTBOUND, IP_FORWARDING, &ipsecerror);
#else /* FAST_IPSEC */
            sp = ipsec_getpolicybyaddr(mcopy,
                IPSEC_DIR_OUTBOUND, IP_FORWARDING, &ipsecerror);
#endif
            if (sp != NULL) {
                /* count IPsec header size */
                ipsechdr = ipsec4_hdrsiz(mcopy,
                    IPSEC_DIR_OUTBOUND, NULL);

                /*
                 * find the correct route for outer IPv4
                 * header, compute tunnel MTU.
                 */
                if (sp->req != NULL
                    && sp->req->sav != NULL
                    && sp->req->sav->sah != NULL) {
                    ro = &sp->req->sav->sah->sa_route;
                    if (ro->ro_rt && ro->ro_rt->rt_ifp) {
                        mtu = ro->ro_rt->rt_rmx.rmx_mtu ?
                            ro->ro_rt->rt_rmx.rmx_mtu :
                            ro->ro_rt->rt_ifp->if_mtu;
                        mtu -= ipsechdr;
                    }
                }
#ifdef IPSEC
                key_freesp(sp);
#else /* FAST_IPSEC */
                KEY_FREESP(&sp);
#endif
                ipstat.ips_cantfrag++;
                break;
            }
        }
#endif /*IPSEC || FAST_IPSEC*/
        /*
         * If the MTU wasn't set before use the interface mtu or
         * fall back to the next smaller mtu step compared to the
         * current packet size.
         */
        if (mtu == 0) {
            if (ia != NULL)
                mtu = ia->ia_ifp->if_mtu;
            else
                mtu = ip_next_mtu(ip->ip_len, 0);
        }
        ipstat.ips_cantfrag++;
        break;
//*****************************************************************************

I used printf() to debug the problem. My kernel is built with IPSEC defined. In my case sp = ipsec4_getpolicybyaddr(......) returned a non-NULL value, but sp->req was NULL. The body of the "if (sp != NULL) {}" statement is therefore executed, but mtu is not calculated and stays equal to zero, and that body ends with a "break;" statement. So mtu is still zero after the "switch (error)" statement, and that zero is written into the "mtu" field of the ICMP packet. Is this a bug?
I resolved the problem as follows:
//*****************************************************************************
#ifdef IPSEC
                key_freesp(sp);
#else /* FAST_IPSEC */
                KEY_FREESP(&sp);
#endif
                //ipstat.ips_cantfrag++;
                //break;
            }
        }
#endif /*IPSEC || FAST_IPSEC*/
        /*
         * If the MTU wasn't set before use the interface mtu or
         * fall back to the next smaller mtu step compared to the
         * current packet size.
         */
        if (mtu == 0) {
            if (ia != NULL)
                mtu = ia->ia_ifp->if_mtu;
            else
                mtu = ip_next_mtu(ip->ip_len, 0);
        }
        ipstat.ips_cantfrag++;
        break;
//*****************************************************************************

That is, I commented out the "break" statement and the statement before it. With this change, if mtu is still zero, execution falls through to the code that follows, which is the code that always runs when neither IPSEC nor FAST_IPSEC is defined.

The tcpdump result:
//*****************************************************************************
pvs# tcpdump -i rl1 -vv -x icmp
tcpdump: listening on rl1, link-type EN10MB (Ethernet), capture size 96 bytes
12:13:48.471242 IP (tos 0x0, ttl 63, id 20521, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag (mtu 1408), length 36
        IP (tos 0x0, ttl 126, id 50667, offset 0, flags [DF], proto: TCP (6), length: 1500, bad cksum 5b32 (->5664)!)
        10.0.10.81.1769 > 80.93.118.30.5421: tcp 1476 [bad hdr length 4 - too short, < 20]
        0x0000:  4500 0038 5029 4000 3f01 10d0 505d 761e
        0x0010:  0a00 0a51 0304 7408 0000 0580 4500 05dc
        0x0020:  c5eb 4000 7e06 5b32 0a00 0a51 505d 761e
        0x0030:  06e9 152d 5af0 079f
12:13:48.471583 IP (tos 0x0, ttl 63, id 20522, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag (mtu 1408), length 36
        IP (tos 0x0, ttl 126, id 50668, offset 0, flags [DF], proto: TCP (6), length: 1500, bad cksum 5b31 (->5663)!)
        10.0.10.81.1769 > 80.93.118.30.5421: [|tcp]
        0x0000:  4500 0038 502a 4000 3f01 10cf 505d 761e
        0x0010:  0a00 0a51 0304 6e54 0000 0580 4500 05dc
        0x0020:  c5ec 4000 7e06 5b31 0a00 0a51 505d 761e
        0x0030:  06e9 152d 5af0 0d53
//*****************************************************************************

You can see that the "next hop mtu" field now has the correct value: 0x0580 = 1408 decimal.

Please tell me, is this patch correct?

mailto:bardano@gmail.com

P.S. The "bad cksum 5b31 (->5663)!" message refers to a packet that has already passed through natd; I may have an incorrect natd configuration.
>Release-Note:
>Audit-Trail:
>Unformatted: