Date: Wed, 15 Nov 2006 10:04:45 +0000
From: Tom Judge <tom@tomjudge.com>
To: freebsd-hackers@freebsd.org
Cc: jthomson@mintel.com
Subject: [patch] Path MTU Discovery when routing over IPSec connections
Message-ID: <455AE63D.3080501@tomjudge.com>

I have been looking into some problems with Path MTU Discovery when routing packets over IPSec (gif) tunnels. I have submitted the details to the open PR kern/91412, but have had no response as to whether my patch is the correct solution to the problem.

The problem occurs when sys/netinet/ip_input.c constructs the ICMP Destination Unreachable (frag needed and DF set) message with an MTU hint. This is triggered when a packet that is to be routed over the IPSec link is larger than the MTU of the link and has the Don't Fragment bit set.

There is a block of code specific to IPSec (gif) MTU discovery which attempts to calculate the MTU of the link by working out the size of the IPSec header and subtracting this, together with the gif IP header, from the MTU of the transmission interface. However, the code fails to retrieve a fully populated security policy, which means that the block designed to calculate the MTU never gets run. The code then breaks out of the case statement and transmits the ICMP packet with the MTU hint set to 0.

If the break is removed from the IPSec code, the MTU calculation is carried out successfully by the non-IPSec code. This raises the question of whether the IPSec-specific code is needed at all, given that the normal code works fine.

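To make the control flow described above easier to follow, here is a minimal user-space sketch in C. It is not the code from sys/netinet/ip_input.c: the struct, its fields and all function names are hypothetical, and the generic (non-IPSec) path is assumed to use the MTU of the outgoing interface, which is consistent with the "MTU 1280" hint seen after the patch. The packet-length helper assumes a 20-byte IPv4 header without options (HL 5 in the quoted headers) and the 8-byte ICMP echo header.

```c
#include <assert.h>
#include <stddef.h>

/* IPv4 header without options and ICMP echo header, in bytes. */
enum { IPV4_HDR_LEN = 20, ICMP_ECHO_HDR_LEN = 8 };

/* Hypothetical, simplified view of one forwarding decision. */
struct fwd_case {
    int out_if_mtu;       /* MTU of the outgoing interface (gif0 in the report) */
    int phys_if_mtu;      /* MTU of the interface that carries the tunnel */
    int ipsec_overhead;   /* IPSec header plus outer IP header, in bytes */
    int ipsec_route;      /* nonzero if the packet is routed over IPSec */
    int have_full_policy; /* nonzero if a fully populated policy was found */
};

/* Total IP length of an echo request sent with "ping -s payload". */
int icmp_echo_ip_len(int payload)
{
    return IPV4_HDR_LEN + ICMP_ECHO_HDR_LEN + payload;
}

/* A DF packet cannot be forwarded if it is larger than the link MTU. */
int needs_frag(int ip_len, int mtu)
{
    return ip_len > mtu;
}

/* Model of the reported behaviour: the IPSec branch always breaks out. */
int mtu_hint_with_break(const struct fwd_case *c)
{
    int hint = 0;

    assert(c != NULL);
    if (c->ipsec_route) {
        if (c->have_full_policy)
            hint = c->phys_if_mtu - c->ipsec_overhead;
        return hint; /* the "break": hint stays 0 without a full policy */
    }
    return c->out_if_mtu; /* generic calculation */
}

/* Model of the patched behaviour: control reaches the generic calculation. */
int mtu_hint_without_break(const struct fwd_case *c)
{
    assert(c != NULL);
    return c->out_if_mtu;
}
```

With the figures from the tests below (gif0 MTU 1280, `ping -s 1280` and `ping -s 1200`), the helper gives IP lengths of 1308 (0x051c) and 1228 (0x04cc), matching the Len column in the quoted headers, and only the first exceeds the tunnel MTU. In this model the hint is 0 with the break in place and no full policy, and 1280 once the break is removed.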
Network Layout:

Box 1 --------- Router 2 --(Ipsec tunnel)-- Router 3 --(lan) --- Box 2
        |(lan)
        |------ Router 1

Box 1: FreeBSD 5.4
Router [123]: FreeBSD 6.1
Box 2: Linux 2.6

Tests to reproduce the error:

PING test from box 1 to box 2 with do not fragment set and a packet larger than the path MTU:

box1# ping -s 1280 -D box2
PING box2 (10.0.0.79): 1280 data bytes
36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 051c b454   0 0000  40  01 c9fc 172.17.1.48  10.0.0.79

36 bytes from router2 (172.17.3.6): frag needed and DF set (MTU 0)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 1c05 b454   0 0000  3f  01 cafc 172.17.1.48  10.0.0.79

36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 051c b45f   0 0000  40  01 c9f1 172.17.1.48  10.0.0.79

36 bytes from router2 (172.17.3.6): frag needed and DF set (MTU 0)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 1c05 b45f   0 0000  3f  01 caf1 172.17.1.48  10.0.0.79

^C
--- box2 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

PING test from box 1 to box 2 with do not fragment set and a packet smaller than the path MTU:

box1# ping -s 1200 -D box2
PING box2 (10.0.0.79): 1200 data bytes
36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 04cc b472   0 0000  40  01 ca2e 172.17.1.48  10.0.0.79

1208 bytes from 10.0.0.79: icmp_seq=0 ttl=61 time=111.017 ms
36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 04cc b479   0 0000  40  01 ca27 172.17.1.48  10.0.0.79

1208 bytes from 10.0.0.79: icmp_seq=1 ttl=61 time=110.419 ms
^C
--- box2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 110.419/110.718/111.017/0.299 ms
box1#

Relevant interface configuration on box1 (from ifconfig):

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet 172.17.1.48 netmask 0xffff0000 broadcast 172.17.255.255
        ether 00:0f:1f:fa:d1:b5
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active

Relevant interface configuration on router2 (from ifconfig):

em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet 172.17.3.6 netmask 0xffff0000 broadcast 172.17.255.255
        ether 00:c0:9f:12:13:1b
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
        tunnel inet 63.174.xxx.xxx --> 82.195.xxx.xxx
        inet 192.168.174.10 --> 192.168.174.9 netmask 0xfffffffc

Patch:

Index: sys/netinet/ip_input.c
===================================================================
--- sys/netinet/ip_input.c (revision 24)
+++ sys/netinet/ip_input.c (working copy)
@@ -1990,8 +1990,8 @@
 #else /* FAST_IPSEC */
 				KEY_FREESP(&sp);
 #endif
-				ipstat.ips_cantfrag++;
-				break;
+//				ipstat.ips_cantfrag++;
+//				break;
 			}
 		}
 #endif /*IPSEC || FAST_IPSEC*/

Tests after the patch has been applied:

PING test from box 1 to box 2 with do not fragment set and a packet larger than the path MTU:

box1# ping -s 1280 -D box2
PING box2 (10.0.0.79): 1280 data bytes
36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 051c b454   0 0000  40  01 c9fc 172.17.1.48  10.0.0.79

36 bytes from router2 (172.17.3.6): frag needed and DF set (MTU 1280)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 1c05 b454   0 0000  3f  01 cafc 172.17.1.48  10.0.0.79

36 bytes from router1 (172.17.3.5): Redirect Host(New addr: 172.17.3.6)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 051c b45f   0 0000  40  01 c9f1 172.17.1.48  10.0.0.79

36 bytes from router2 (172.17.3.6): frag needed and DF set (MTU 1280)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 1c05 b45f   0 0000  3f  01 caf1 172.17.1.48  10.0.0.79

Any comments or suggestions on this would be greatly appreciated.

Tom J