Date:      Sat, 31 Mar 2012 13:16:52 -0700
From:      Jason Wolfe <nitroboost@gmail.com>
To:        Andre Oppermann <andre@freebsd.org>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Possible interoperability issue with TCP timestamps between FreeBSD and Linux
Message-ID:  <CAAAm0r2KAX2H_nVdixur%2BTSJV03Yp9ToUMdZfVpf0Ap5ps-CYQ@mail.gmail.com>
In-Reply-To: <4F76F7C2.9080404@freebsd.org>
References:  <CAAAm0r2kRt3nyOSmhpT=uQH129LA6p5DgSy3GShBkQmbMEBxhg@mail.gmail.com> <4F76F7C2.9080404@freebsd.org>

On Sat, Mar 31, 2012 at 5:25 AM, Andre Oppermann <andre@freebsd.org> wrote:
> On 31.03.2012 00:07, Jason Wolfe wrote:
>>
>> So I'm seeing an issue that appears to be caused by the strict TCP
>> timestamp adherence in the Linux kernel when there are connections
>> being initiated by both sides of a Linux and FreeBSD pair of servers.
>> What it looks like is FreeBSD is only tracking the timestamp for the
>> remote host for connections it initiates, while the Linux box is
>> tracking it globally for any transactions.  As you can see below, this
>> discrepancy seems to be causing Linux to ignore SYN packets when
>> FreeBSD is attempting to connect after what it believes should be the
>> proper timestamp has desynced.  You'll see after 3 SYN failures,
>> FreeBSD drops timestamp support and the connection goes through
>> properly, so this is usually causing about a 10s delay on the
>> handshake.
>>
>> Tail end of a valid/successful transfer initiated by FreeBSD
>> 8.2-RELEASE to Linux.  This box in particular is 2.6.32, but we've
>> seen it on other kernels:
>>
>> 18:09:22.271473 IP FreeBSD.56757 > Linux.80: Flags [.],
>> ack 322, win 12559, options [nop,nop,TS val 827470242 ecr 1108508974],
>> length 0
>> 18:09:22.271615 IP FreeBSD.56757 > Linux.80: Flags
>> [F.], seq 247, ack 322, win 12579, options [nop,nop,TS val 827470242
>> ecr 1108508974], length 0
>> 18:09:22.271888 IP FreeBSD.56757 > Linux.80: Flags [.],
>> ack 323, win 12579, options [nop,nop,TS val 827470242 ecr 1108508974],
>> length 0
>> 18:09:22.980495 IP FreeBSD.56777 > Linux.80: Flags [S],
>> seq 1706839142, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS
>> val 827470951 ecr 0], length 0
>> 18:09:22.980665 IP FreeBSD.56777 > Linux.80: Flags [.],
>> ack 3814836667, win 12579, options [nop,nop,TS val 827470951 ecr
>> 1108509044], length 0
>> 18:09:22.980711 IP FreeBSD.56777 > Linux.80: Flags
>> [P.], ack 1, win 12579, options [nop,nop,TS val 827470951 ecr
>> 1108509044], length 247
>> 18:09:23.001063 IP FreeBSD.56777 > Linux.80: Flags [.],
>> ack 322, win 12559, options [nop,nop,TS val 827470971 ecr 1108509047],
>> length 0
>> 18:09:23.001226 IP FreeBSD.56777 > Linux.80: Flags
>> [F.], seq 247, ack 322, win 12579, options [nop,nop,TS val 827470971
>> ecr 1108509047], length 0
>> 18:09:23.001541 IP FreeBSD.56777 > Linux.80: Flags [.],
>> ack 323, win 12579, options [nop,nop,TS val 827470972 ecr 1108509047],
>> length 0
>>
>> Here's a request initiated by Linux to FreeBSD.  Note how
>> the timestamp value is a lot higher.
>>
>> 18:09:26.513392 IP FreeBSD.1983 > Linux.24747: Flags
>> [S.], seq 2239886569, ack 3865964318, win 65535, options [mss
>> 1460,nop,wscale 4,sackOK,TS val 1857101612 ecr 1108509398], length 0
>> 18:09:26.513680 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 1, win 12579, options [nop,nop,TS val 1857101612 ecr
>> 1108509398], length 0
>> 18:09:26.513692 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12540, options [nop,nop,TS val 1857101612 ecr
>> 1108509398], length 0
>> 18:09:26.559776 IP FreeBSD.1983 > Linux.24747: Flags
>> [P.], ack 622, win 12579, options [nop,nop,TS val 1857101658 ecr
>> 1108509398], length 285
>> 18:09:26.568579 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509402], length 1448
>> 18:09:26.568584 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509402], length 1448
>> 18:09:26.568587 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509402], length 1448
>> 18:09:26.568591 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509402], length 1448
>> 18:09:26.568843 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509403], length 1448
>> 18:09:26.568847 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509403], length 1448
>> 18:09:26.568850 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509403], length 1448
>> 18:09:26.568854 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509403], length 1448
>> 18:09:26.568861 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509403], length 1448
>> 18:09:26.568864 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509403], length 1448
>> 18:09:26.568867 IP FreeBSD.1983 > Linux.24747: Flags
>> [P.], ack 622, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509403], length 954
>> 18:09:26.569222 IP FreeBSD.1983 > Linux.24747: Flags
>> [.], ack 623, win 12579, options [nop,nop,TS val 1857101667 ecr
>> 1108509403], length 0
>> 18:09:26.569235 IP FreeBSD.1983 > Linux.24747: Flags
>> [F.], seq 15720, ack 623, win 12579, options [nop,nop,TS val
>> 1857101667 ecr 1108509403], length 0
>>
>> Then, we make another connection initiated by FreeBSD, but with a
>> timestamp value that logically follows from the original outbound
>> connection.  This fails, until eventually we automatically give up on
>> timestamps.
>>
>> 18:09:34.631349 IP FreeBSD.57007 > Linux.80: Flags [S],
>> seq 704145119, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val
>> 827482602 ecr 0], length 0
>> 18:09:37.360671 IP FreeBSD.57056 > Linux.80: Flags [S],
>> seq 2653481758, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS
>> val 827485331 ecr 0], length 0
>> 18:09:37.631176 IP FreeBSD.57007 > Linux.80: Flags [S],
>> seq 704145119, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val
>> 827485602 ecr 0], length 0
>> 18:09:40.360134 IP FreeBSD.57056 > Linux.80: Flags [S],
>> seq 2653481758, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS
>> val 827488331 ecr 0], length 0
>> 18:09:40.831125 IP FreeBSD.57007 > Linux.80: Flags [S],
>> seq 704145119, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val
>> 827488802 ecr 0], length 0
>> 18:09:43.560084 IP FreeBSD.57056 > Linux.80: Flags [S],
>> seq 2653481758, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS
>> val 827491531 ecr 0], length 0
>> 18:09:44.031076 IP FreeBSD.57007 > Linux.80: Flags [S],
>> seq 704145119, win 65535, options [mss 1460,sackOK,eol], length 0
>> 18:09:44.031280 IP FreeBSD.57007 > Linux.80: Flags [.],
>> ack 4142946180, win 65535, length 0
>>
>> We confirmed this further by disabling TCP timestamps on the Linux box
>> via the 'net.ipv4.tcp_timestamps' sysctl, and the problem cleared up.
>> Has anyone else seen this?  Is there a more graceful workaround
>> aside from disabling timestamp support in Linux, or is there a sysctl
>> I'm missing in FreeBSD that causes the timestamp to be global for each
>> peer?
>
>
> Linux does the per-host timestamp check only when you hit a 2MSL
> graveyard connection.  It happens in net/ipv4/tcp_ipv4.c around
> line 1376 within a PAWS check.
>
> Our in- and outbound timestamps are not synchronized to the same
> base.  A timestamp on an SYN/ACK is somewhat randomized.  Whereas
> a timestamp in a SYN is based off ticks and increases with later
> connections.
>
> It seems you have a large number of short lived connections between
> these hosts.  You may want to increase the port range for outgoing
> connections to avoid recycling port numbers within the 2MSL time frame.
> Alternatively you could decrease the 2MSL time on Linux.  Or you can
> turn off syncookies on FreeBSD to avoid the SYN/ACK timestamp
> randomization.
>
> Linux has over-engineered this whole thing for little to no benefit.
> This additional check presumably also wrecks busy NAT IP addresses
> where port recycling is fast.
>
> --
> Andre

Andre,

It sounds like disabling net.ipv4.tcp_timestamps on the Linux side, as
we are doing now, may be the quickest path to resolution without much
real-world effect.  The Linux box only talks to internal hosts, whereas
the FreeBSD box talks to external end users.  We already have the port
range opened up on both sides, so the alternative to turning timestamps
off completely seems to be lowering net.ipv4.tcp_fin_timeout and
disabling net.ipv4.tcp_tw_recycle, but a link this heavily trafficked
could still hit this scenario.
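
For anyone following along, here is a rough Python sketch of the
per-host check as I understand it from your description.  It is not
kernel code: the function names are made up for illustration, and the
60 second lifetime and 1 tick tolerance are my assumptions rather than
values I have verified against the source.  The timestamp values are
taken from the capture quoted above.

```python
# Sketch of a per-host PAWS-style check applied to an incoming SYN.
# PAWS_MSL_SECONDS and PAWS_WINDOW are assumed values for illustration.
PAWS_MSL_SECONDS = 60   # assumption: how long a remembered timestamp stays relevant
PAWS_WINDOW = 1         # assumption: tolerated backwards step, in timestamp ticks


def signed32(value):
    """Interpret a 32-bit difference as signed so timestamp wraparound is handled."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def syn_rejected(last_peer_tsval, last_seen_at, syn_tsval, now):
    """Return True if the SYN would be dropped by the per-host check."""
    if now - last_seen_at >= PAWS_MSL_SECONDS:
        return False  # the remembered timestamp is too old to matter
    return signed32(last_peer_tsval - syn_tsval) > PAWS_WINDOW


# Times are seconds past 18:09 from the capture.
# Only outbound history: last FreeBSD tsval 827470242, new SYN 827470951.
outbound_only = syn_rejected(827470242, 22.27, 827470951, 22.98)
# After the Linux-initiated connection, FreeBSD last sent tsval 1857101667.
after_inbound = syn_rejected(1857101667, 26.57, 827482602, 34.63)
print(outbound_only, after_inbound)
```

With these inputs the first SYN passes and the second is dropped, which
matches what the capture shows: the SYN timestamp looks like it went
backwards relative to the value Linux remembered for this host.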

net.inet.ip.portrange.first: 10000
net.inet.ip.portrange.last: 65535

net.ipv4.ip_local_port_range = 1024 61000
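
As a back-of-the-envelope check on how far the port ranges above get
us, the sketch below estimates the connection rate to a single
destination address and port at which a source port must recur within
2MSL.  The 60 second 2MSL value and the function name are assumptions
for illustration, and the result is only an upper bound: it assumes
ports are cycled evenly, and with randomized port selection a reuse
can happen sooner.

```python
# Rough arithmetic: connections per second to one destination before a
# source port has to be reused within the 2MSL window.
TWO_MSL_SECONDS = 60  # assumption: TIME_WAIT lifetime on the receiving side


def max_rate_without_reuse(first_port, last_port, two_msl=TWO_MSL_SECONDS):
    """Upper bound on new connections per second before a port recurs within 2MSL."""
    ports = last_port - first_port + 1  # the range is inclusive on both ends
    return ports / two_msl


freebsd_rate = max_rate_without_reuse(10000, 65535)
linux_rate = max_rate_without_reuse(1024, 61000)
print(f"FreeBSD range: about {freebsd_rate:.0f} connections/s")
print(f"Linux range:   about {linux_rate:.0f} connections/s")
```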

Thank you for the info,
Jason


