Message-ID: <4F76F7C2.9080404@freebsd.org>
Date: Sat, 31 Mar 2012 14:25:38 +0200
From: Andre Oppermann
To: Jason Wolfe
Cc: freebsd-net@freebsd.org
Subject: Re: Possible interoperability issue with TCP timestamps between FreeBSD and Linux

On 31.03.2012 00:07, Jason Wolfe wrote:
> So I'm seeing an issue that appears to be caused by the strict TCP
> timestamp adherence in the Linux kernel when there are connections
> being initiated by both sides of a Linux and FreeBSD pair of servers.
> What it looks like is that FreeBSD is only tracking the timestamp for the
> remote host for connections it initiates, while the Linux box is
> tracking it globally for any transactions.
> As you can see below, this
> discrepancy seems to be causing Linux to ignore SYN packets when
> FreeBSD is attempting to connect after what they believe should be the
> proper timestamp has desyncd. You'll see after 3 SYN failures,
> FreeBSD drops timestamp support and the connection goes through
> properly, so this is usually causing about a 10s delay on the
> handshake.
>
> Tail end of a valid/successful transfer initiated by FreeBSD
> 8.2-RELEASE to Linux. This box in particular is 2.6.32, but we've
> seen it on other kernels:
>
> 18:09:22.271473 IP FreeBSD.56757 > Linux.80: Flags [.], ack 322, win 12559, options [nop,nop,TS val 827470242 ecr 1108508974], length 0
> 18:09:22.271615 IP FreeBSD.56757 > Linux.80: Flags [F.], seq 247, ack 322, win 12579, options [nop,nop,TS val 827470242 ecr 1108508974], length 0
> 18:09:22.271888 IP FreeBSD.56757 > Linux.80: Flags [.], ack 323, win 12579, options [nop,nop,TS val 827470242 ecr 1108508974], length 0
> 18:09:22.980495 IP FreeBSD.56777 > Linux.80: Flags [S], seq 1706839142, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val 827470951 ecr 0], length 0
> 18:09:22.980665 IP FreeBSD.56777 > Linux.80: Flags [.], ack 3814836667, win 12579, options [nop,nop,TS val 827470951 ecr 1108509044], length 0
> 18:09:22.980711 IP FreeBSD.56777 > Linux.80: Flags [P.], ack 1, win 12579, options [nop,nop,TS val 827470951 ecr 1108509044], length 247
> 18:09:23.001063 IP FreeBSD.56777 > Linux.80: Flags [.], ack 322, win 12559, options [nop,nop,TS val 827470971 ecr 1108509047], length 0
> 18:09:23.001226 IP FreeBSD.56777 > Linux.80: Flags [F.], seq 247, ack 322, win 12579, options [nop,nop,TS val 827470971 ecr 1108509047], length 0
> 18:09:23.001541 IP FreeBSD.56777 > Linux.80: Flags [.], ack 323, win 12579, options [nop,nop,TS val 827470972 ecr 1108509047], length 0
>
> Here's a request initiated by Linux to FreeBSD. Note how
> the timestamp value is a lot higher.
>
> 18:09:26.513392 IP FreeBSD.1983 > Linux.24747: Flags [S.], seq 2239886569, ack 3865964318, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val 1857101612 ecr 1108509398], length 0
> 18:09:26.513680 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 1, win 12579, options [nop,nop,TS val 1857101612 ecr 1108509398], length 0
> 18:09:26.513692 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12540, options [nop,nop,TS val 1857101612 ecr 1108509398], length 0
> 18:09:26.559776 IP FreeBSD.1983 > Linux.24747: Flags [P.], ack 622, win 12579, options [nop,nop,TS val 1857101658 ecr 1108509398], length 285
> 18:09:26.568579 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509402], length 1448
> 18:09:26.568584 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509402], length 1448
> 18:09:26.568587 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509402], length 1448
> 18:09:26.568591 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509402], length 1448
> 18:09:26.568843 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509403], length 1448
> 18:09:26.568847 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509403], length 1448
> 18:09:26.568850 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509403], length 1448
> 18:09:26.568854 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509403], length 1448
> 18:09:26.568861 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509403], length 1448
> 18:09:26.568864 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509403], length 1448
> 18:09:26.568867 IP FreeBSD.1983 > Linux.24747: Flags [P.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509403], length 954
> 18:09:26.569222 IP FreeBSD.1983 > Linux.24747: Flags [.], ack 623, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509403], length 0
> 18:09:26.569235 IP FreeBSD.1983 > Linux.24747: Flags [F.], seq 15720, ack 623, win 12579, options [nop,nop,TS val 1857101667 ecr 1108509403], length 0
>
> Then, we make another connection initiated by FreeBSD, but with a
> timestamp value that logically follows from the original outbound
> connection. This fails, until eventually we automatically give up on
> timestamps.
>
> 18:09:34.631349 IP FreeBSD.57007 > Linux.80: Flags [S], seq 704145119, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val 827482602 ecr 0], length 0
> 18:09:37.360671 IP FreeBSD.57056 > Linux.80: Flags [S], seq 2653481758, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val 827485331 ecr 0], length 0
> 18:09:37.631176 IP FreeBSD.57007 > Linux.80: Flags [S], seq 704145119, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val 827485602 ecr 0], length 0
> 18:09:40.360134 IP FreeBSD.57056 > Linux.80: Flags [S], seq 2653481758, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val 827488331 ecr 0], length 0
> 18:09:40.831125 IP FreeBSD.57007 > Linux.80: Flags [S], seq 704145119, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val 827488802 ecr 0], length 0
> 18:09:43.560084 IP FreeBSD.57056 > Linux.80: Flags [S], seq 2653481758, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val 827491531 ecr 0], length 0
> 18:09:44.031076 IP FreeBSD.57007 > Linux.80: Flags [S], seq 704145119, win 65535, options [mss 1460,sackOK,eol], length 0
> 18:09:44.031280 IP FreeBSD.57007 > Linux.80: Flags [.], ack 4142946180, win 65535, length 0
>
> We confirmed this further by disabling TCP
> timestamps on the Linux box
> via the 'net.ipv4.tcp_timestamps' sysctl, and the problem cleared up.
> Has anyone else seen this? Is there a more graceful workaround
> aside from disabling timestamp support in Linux, or is there a sysctl
> I'm missing in FreeBSD that causes the timestamp to be global for each
> peer?

Linux does the per-host timestamp check only when you hit a 2MSL
graveyard connection. It happens in net/ipv4/tcp_ipv4.c around
line 1376 within a PAWS check.

Our in- and outbound timestamps are not synchronized to the same
base. The timestamp on a SYN/ACK is somewhat randomized, whereas
the timestamp in a SYN is based on ticks and increases with later
connections.

It seems you have a large number of short-lived connections between
these hosts. You may want to increase the port range for outgoing
connections to avoid recycling port numbers within the 2MSL time
frame. Alternatively, you could decrease the 2MSL time on Linux, or
you can turn off syncookies on FreeBSD to avoid the SYN/ACK
timestamp randomization.

Linux has over-engineered this whole thing for little to no benefit.
This additional check presumably also wrecks busy NAT IP addresses
where port recycling is fast.

--
Andre
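
For illustration, the per-host PAWS-style check discussed above can be
sketched roughly as below. This is not the kernel code. The function
names, the two constants and their values, and the use of a signed
32-bit difference are assumptions made for the sketch. The timestamp
values are the ones from the quoted trace, and the 8 second age is read
off the trace (last packet at 18:09:26, failing SYN at 18:09:34).

```python
# Assumed values, chosen for the sketch; check the kernel source you run.
TCP_PAWS_MSL = 60      # seconds a remembered per-host timestamp stays valid
TCP_PAWS_WINDOW = 1    # tolerated backward step, in timestamp ticks


def s32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def syn_rejected(peer_ts, peer_ts_age, syn_tsval):
    """Return True if a SYN would be dropped by the per-host check.

    peer_ts     -- last TS val remembered for this remote host
    peer_ts_age -- seconds since peer_ts was recorded
    syn_tsval   -- TS val carried by the incoming SYN
    """
    if peer_ts_age >= TCP_PAWS_MSL:
        return False  # remembered timestamp is too old to be trusted
    # Drop when the SYN's timestamp appears to lie in the past.
    return s32(peer_ts - syn_tsval) > TCP_PAWS_WINDOW


# Last TS val seen from FreeBSD on the Linux-initiated connection,
# followed 8 seconds later by a FreeBSD SYN using the tick-based clock.
print(syn_rejected(1857101667, 8, 827482602))
# Same SYN compared against the TS val of the earlier outbound connection.
print(syn_rejected(827470972, 8, 827482602))
```

Under these assumptions the first call reports a rejection, because the
randomized SYN/ACK timestamp base is far ahead of the tick-based SYN
timestamp, while the second does not, because both values come from the
same increasing clock.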