From owner-freebsd-net@freebsd.org  Tue Jul  9 12:34:31 2019
Subject: Re: Issues with TCP Timestamps allocation
From: Michael Tuexen <tuexen@freebsd.org>
In-Reply-To: <1562599181.734953000.1l9a1d23@frv39.fwdcdn.com>
Date: Tue, 9 Jul 2019 14:34:24 +0200
Cc: freebsd-net@freebsd.org
Message-Id: <0C475A01-9BCD-4E4A-9731-09AB919CA9BE@freebsd.org>
References: <1562579483.67527000.24rw4xi5@frv39.fwdcdn.com>
 <32FD061B-245C-41D2-81DE-1B4756A7173D@freebsd.org>
 <1562591379.369129000.gpmxvurq@frv39.fwdcdn.com>
 <DF65CA7F-B5FC-499D-B053-0531596D230C@freebsd.org>
 <1562599181.734953000.1l9a1d23@frv39.fwdcdn.com>
To: Paul <devgs@ukr.net>
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>


> On 8. Jul 2019, at 17:22, Paul <devgs@ukr.net> wrote:
>
>
>
> 8 July 2019, 17:12:21, by "Michael Tuexen" <tuexen@freebsd.org>:
>
>>> On 8. Jul 2019, at 15:24, Paul <devgs@ukr.net> wrote:
>>>
>>> Hi Michael,
>>>
>>> 8 July 2019, 15:53:15, by "Michael Tuexen" <tuexen@freebsd.org>:
>>>
>>>>> On 8. Jul 2019, at 12:37, Paul <devgs@ukr.net> wrote:
>>>>>
>>>>> Hi team,
>>>>>
>>>>> Recently we upgraded to 12-STABLE. Immediately afterwards, we started
>>>>> seeing strange connection-establishment timeouts to a fixed set of
>>>>> external (Internet) hosts. The issue was persistent and easy to reproduce.
>>>>> Thanks to the patience and dedication of our system engineer, we have
>>>>> tracked this issue down to a specific commit:
>>>>>
>>>>> https://svnweb.freebsd.org/base?view=revision&revision=338053
>>>>>
>>>>> This patch was also back-ported to 11-STABLE:
>>>>>
>>>>> https://svnweb.freebsd.org/base?view=revision&revision=348435
>>>>>
>>>>> Among other things, this patch changes the timestamp allocation strategy
>>>>> by introducing deterministic randomness via a hash function that takes
>>>>> into account a random key as well as the source address, source port,
>>>>> destination address and destination port. As a result, the timestamp
>>>>> offsets of different tuples (SA,SP,DA,DP) are wildly different and jump
>>>>> from small to large numbers and back whenever anything in the tuple
>>>>> changes.
>>>> Hi Paul,
>>>>
>>>> this is correct.
>>>>
>>>> Please note that the same happens with the old method if two hosts with
>>>> different uptimes are behind a consumer-grade NAT.
>>>
>>> If the NAT does not rewrite timestamps, then yes, that should be the case.
>>>
>>>>>
>>>>> After testing various hosts that exhibit the above-mentioned issue, we
>>>>> concluded that some implementations drop SYN packets whose timestamps
>>>>> are smaller than the largest timestamp value seen on all recent or
>>>>> current connections from a specific address. This looks like some kind
>>>>> of SYN flood protection.
>>>> This also breaks multiple hosts with different uptimes behind a
>>>> consumer-grade NAT talking to such a server.
>>>>>
>>>>> To ensure that each external host does not see wild jumps in timestamp
>>>>> values, I propose a patch that removes the ports from the equation
>>>>> altogether when calculating the timestamp offset:
>>>>>
>>>>> Index: sys/netinet/tcp_subr.c
>>>>> ===================================================================
>>>>> --- sys/netinet/tcp_subr.c	(revision 348435)
>>>>> +++ sys/netinet/tcp_subr.c	(working copy)
>>>>> @@ -2224,7 +2224,22 @@
>>>>> uint32_t
>>>>> tcp_new_ts_offset(struct in_conninfo *inc)
>>>>> {
>>>>> -	return (tcp_keyed_hash(inc, V_ts_offset_secret));
>>>>> +	/*
>>>>> +	 * Some implementations behave strangely when wildly random
>>>>> +	 * timestamps are allocated for different streams. Only SYN
>>>>> +	 * packets seem to be affected: the observed implementations drop
>>>>> +	 * SYN packets with timestamps smaller than the largest timestamp
>>>>> +	 * value of all recent or current connections from a specific
>>>>> +	 * address. To mitigate this, ensure that each host always
>>>>> +	 * observes increasing timestamps regardless of the stream by
>>>>> +	 * dropping the ports from the equation.
>>>>> +	 */
>>>>> +	struct in_conninfo inc_copy = *inc;
>>>>> +
>>>>> +	inc_copy.inc_fport = 0;
>>>>> +	inc_copy.inc_lport = 0;
>>>>> +
>>>>> +	return (tcp_keyed_hash(&inc_copy, V_ts_offset_secret));
>>>>> }
>>>>>
>>>>> /*
>>>>>
>>>>> In any case, the fix for the uptime leak implemented in r338053 is not
>>>>> going to suffer, because a would-be attacker can already use any fixed
>>>>> values of SP and DP (albeit not 0) to remove them from the equation.
>>>> Can you describe how a peer can compute the uptime from two observed timestamps?
>>>> I don't see how you can do that...
>>>
>>> A would-be attacker could run a script that continuously monitors timestamps,
>>> for example via a periodic TCP connection from a fixed local port (e.g. 12345)
>>> and a fixed local address to the victim's fixed address and port (e.g. 80).
>>> Whenever a large discrepancy is observed, the attacker can assume that a
>>> reboot has happened (because V_ts_offset_secret was regenerated), so the
>>> received timestamp is treated as an approximate point of reboot from which
>>> the uptime can be calculated, until the next reboot, and so on.
>> Ahh, I see. The patch we are talking about is not intended to protect against
>> continuous monitoring, which is something you can always do. You could even
>> watch for service availability and detect reboots. A change of the local key
>> would also look like a reboot, just without a temporary loss of
>> connectivity.
>>
>> Thanks for the clarification.
>>>
>>>>>
>>>>> Here is a list of example hosts with which we were able to reproduce the
>>>>> issue:
>>>>>
>>>>> curl -v http://88.99.60.171:80
>>>>> curl -v http://163.172.71.252:80
>>>>> curl -v http://5.9.242.150:80
>>>>> curl -v https://185.134.205.105:443
>>>>> curl -v https://136.243.1.231:443
>>>>> curl -v https://144.76.196.4:443
>>>>> curl -v http://94.127.191.194:80
>>>>>
>>>>> To reproduce, run curl with the same URL a number of times in a row.
>>>>> You are going to see some of the requests stuck at
>>>>> `*    Trying XXX.XXX.XXX.XXX...`
>>>>>
>>>>> For some reason, the easiest way to reproduce the issue is with nc:
>>>>>
>>>>> $ echo "foooooo" | nc -v 88.99.60.171 80
>>>>>
>>>>> Only a few such calls are required until one of them gets stuck in
>>>>> connect(), issuing SYN packets with exponential backoff.
>>>> Thanks for providing an end-point to test with. I'll take a look.
>>>> Just to be clear: You are running a FreeBSD client against one of the above
>>>> servers and experience the problem with the new timestamp computations.
>>>>
>>>> You are not running arbitrary clients against a FreeBSD server...
>>>
>>> We are talking about FreeBSD being the client. The peers that exhibit this
>>> unwanted behaviour are unknown. A little tinkering showed that some of
>>> them run Debian:
>>>
>>> telnet 88.99.60.171 22
>>> Trying 88.99.60.171...
>>> Connected to 88.99.60.171.
>>> Escape character is '^]'.
>>> SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
>> Also some are hosted by Hetzner, but not all. I'll look into
>> this tomorrow, since I'm on a deadline today (well, it is 2am tomorrow
>> morning, to be precise)...
>
> Thanks a lot, I would appreciate that.

Hi Paul,

I have looked into this.

* The FreeBSD behaviour is the one specified in the last bullet item
  in https://tools.ietf.org/html/rfc7323#section-5.4
  It is also the one which is RECOMMENDED in
  https://tools.ietf.org/html/rfc7323#section-7.1

* My NAT box (a popular one in Germany) does NOT rewrite TCP timestamps.
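For illustration, the per-connection offset can be sketched in userland C. This is a minimal sketch, not FreeBSD's code: struct tuple4, sketch_keyed_hash() and sketch_ts_val() are hypothetical names, only IPv4 is modelled, and FNV-1a stands in for the kernel's keyed hash (it is not cryptographically secure). TS.Val is the local timestamp clock plus an offset derived from a secret key and the 4-tuple, so changing only the local port yields an unrelated offset, while within one connection TS.Val still advances with the clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 4-tuple; FreeBSD's struct in_conninfo differs. */
struct tuple4 {
	uint32_t laddr;	/* local address, host byte order */
	uint32_t faddr;	/* foreign address, host byte order */
	uint16_t lport;	/* local port */
	uint16_t fport;	/* foreign port */
};

#define FNV_OFFSET_BASIS 2166136261u
#define FNV_PRIME 16777619u

/* One FNV-1a step: XOR in a byte, then multiply by the FNV prime. */
static uint32_t
fnv1a_byte(uint32_t h, uint8_t b)
{
	return ((h ^ b) * FNV_PRIME);
}

/* Feed a 32-bit value into the hash, least significant byte first. */
static uint32_t
fnv1a_u32(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		h = fnv1a_byte(h, (uint8_t)(v >> (8 * i)));
	return (h);
}

/*
 * Stand-in for a keyed hash over (secret, laddr, faddr, lport, fport).
 * FNV-1a is NOT cryptographically secure; it only illustrates the idea.
 */
uint32_t
sketch_keyed_hash(const struct tuple4 *t, const uint8_t *key, size_t keylen)
{
	uint32_t h = FNV_OFFSET_BASIS;

	assert(t != NULL && key != NULL);
	for (size_t i = 0; i < keylen; i++)
		h = fnv1a_byte(h, key[i]);
	h = fnv1a_u32(h, t->laddr);
	h = fnv1a_u32(h, t->faddr);
	h = fnv1a_u32(h, t->lport);
	h = fnv1a_u32(h, t->fport);
	return (h);
}

/* TS.Val = timestamp clock + per-connection offset, modulo 2^32. */
uint32_t
sketch_ts_val(uint32_t ts_clock, const struct tuple4 *t, const uint8_t *key,
    size_t keylen)
{
	return (ts_clock + sketch_keyed_hash(t, key, keylen));
}
```

Because each FNV-1a step is a bijection on the 32-bit state, two tuples that differ in exactly one input byte always hash differently here; moving the local port from 12345 to 12346 is therefore guaranteed to change the offset, which is the jumping TS.Val described earlier in the thread.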

This means that the hosts you are referring to have some sort of protection
which makes incorrect assumptions. It will also break multiple hosts behind
a NAT.

I can run
curl -v http://88.99.60.171:80
in a loop without any problems from a FreeBSD head system. I tested 1000
iterations or so. The TS.val is jumping up and down as expected.
I'm wondering why you are observing errors in this case, too.

However, doing something like
echo "foooooo" | nc -v 88.99.60.171 80
triggers the problem.

So I think there is some functionality (in a middlebox or running on the host)
which incorrectly assumes monotonic timestamps across multiple TCP connections
coming from the same IP address, but only in case of errors at the
application layer.
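The suspected heuristic can be modelled in a few lines. This is only a guess based on the behaviour described earlier in the thread (a SYN is dropped if its TS.Val is smaller than the largest TS.Val recently seen from the same address); the real implementation is unknown, the sketch ignores the observation that drops only seem to follow application-layer errors, and the fixed-size table, the wrap-around comparison and all names (struct ts_filter, sketch_filter_syn()) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE 64

/* Per source address: the largest TS.Val seen so far (hypothetical). */
struct ts_entry {
	uint32_t addr;
	uint32_t max_tsval;
	bool used;
};

struct ts_filter {
	struct ts_entry e[SKETCH_TABLE_SIZE];
};

/*
 * Serial-number style comparison: true if a is older than b modulo 2^32.
 * Assumes the usual two's complement conversion to int32_t.
 */
static bool
ts_older(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) < 0);
}

/*
 * Models the suspected (incorrect) heuristic: drop a SYN whose TS.Val is
 * older than the largest TS.Val seen from the same source address.
 * Returns true if the SYN is accepted. If the table is full, the SYN is
 * accepted without being tracked.
 */
bool
sketch_filter_syn(struct ts_filter *f, uint32_t src, uint32_t tsval)
{
	struct ts_entry *free_slot = NULL;

	assert(f != NULL);
	for (size_t i = 0; i < SKETCH_TABLE_SIZE; i++) {
		struct ts_entry *ent = &f->e[i];

		if (ent->used && ent->addr == src) {
			if (ts_older(tsval, ent->max_tsval))
				return (false);	/* TS.Val went "backwards" */
			ent->max_tsval = tsval;
			return (true);
		}
		if (!ent->used && free_slot == NULL)
			free_slot = ent;
	}
	if (free_slot != NULL) {
		free_slot->addr = src;
		free_slot->max_tsval = tsval;
		free_slot->used = true;
	}
	return (true);
}
```

With per-connection offsets, a first SYN carrying TS.Val 3000000000 followed by a second one from the same client carrying 2000000000 leads to the second being dropped. The same happens with the old clock-based method when two NATed hosts with different uptimes share one public address, e.g. TS.Val 900000 followed by 100000.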

Do you have any insight into whether the hosts you listed share something
in common? Some of them are hosted by Hetzner, but not all.

I think that, in general, it is correct to include the port numbers in
the offset computation. We might add a sysctl variable to control their
inclusion. This would allow interworking with broken middleboxes.

Please note that this does not fix the case of multiple clients behind a NAT.

I'm also trying to figure out how and why Linux and Windows handle this.

Best regards
Michael

>
>>
>> Best regards
>> Michael
>>>
>>>
>>>>
>>>> Best regards
>>>> Michael
>>>>
>>>>
>>
>>