From owner-freebsd-net@freebsd.org  Tue Jul  9 15:30:05 2019
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
Subject: Re: Issues with TCP Timestamps allocation
From: Michael Tuexen <tuexen@freebsd.org>
In-Reply-To: <1562676414.933145000.z3zteyqp@frv39.fwdcdn.com>
Date: Tue, 9 Jul 2019 17:29:57 +0200
Cc: freebsd-net@freebsd.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <1E9F3F99-C3E9-44DD-AA70-9B11E19D4769@freebsd.org>
References: <1562579483.67527000.24rw4xi5@frv39.fwdcdn.com>
 <32FD061B-245C-41D2-81DE-1B4756A7173D@freebsd.org>
 <1562591379.369129000.gpmxvurq@frv39.fwdcdn.com>
 <DF65CA7F-B5FC-499D-B053-0531596D230C@freebsd.org>
 <1562599181.734953000.1l9a1d23@frv39.fwdcdn.com>
 <0C475A01-9BCD-4E4A-9731-09AB919CA9BE@freebsd.org>
 <1562676414.933145000.z3zteyqp@frv39.fwdcdn.com>
To: Paul <devgs@ukr.net>
X-Mailer: Apple Mail (2.3445.104.11)


> On 9. Jul 2019, at 14:58, Paul <devgs@ukr.net> wrote:
>
> Hi Michael,
>
> 9 July 2019, 15:34:29, by "Michael Tuexen" <tuexen@freebsd.org>:
>
>>
>>> On 8. Jul 2019, at 17:22, Paul <devgs@ukr.net> wrote:
>>>
>>> 8 July 2019, 17:12:21, by "Michael Tuexen" <tuexen@freebsd.org>:
>>>
>>>>> On 8. Jul 2019, at 15:24, Paul <devgs@ukr.net> wrote:
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> 8 July 2019, 15:53:15, by "Michael Tuexen" <tuexen@freebsd.org>:
>>>>>
>>>>>>> On 8. Jul 2019, at 12:37, Paul <devgs@ukr.net> wrote:
>>>>>>>
>>>>>>> Hi team,
>>>>>>>
>>>>>>> Recently we had an upgrade to 12 Stable. Immediately after, we
>>>>>>> started seeing some strange connection establishment timeouts to
>>>>>>> some fixed number of external (world) hosts. The issue was
>>>>>>> persistent and easy to reproduce. Thanks to the patience and
>>>>>>> dedication of our system engineer we have tracked this issue down
>>>>>>> to a specific commit:
>>>>>>>
>>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=338053
>>>>>>>
>>>>>>> This patch was also back-ported into 11 Stable:
>>>>>>>
>>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=348435
>>>>>>>
>>>>>>> Among other things this patch changes the timestamp allocation
>>>>>>> strategy by introducing deterministic randomness via a hash
>>>>>>> function that takes into account a random key as well as source
>>>>>>> address, source port, dest address and dest port. As a result,
>>>>>>> timestamp offsets of different tuples (SA,SP,DA,DP) will be
>>>>>>> wildly different and will jump from small to large numbers and
>>>>>>> back whenever something in the tuple changes.
>>>>>> Hi Paul,
>>>>>>
>>>>>> this is correct.
>>>>>>
>>>>>> Please note that the same happens with the old method, if two
>>>>>> hosts with different uptimes are behind a consumer grade NAT.
>>>>>
>>>>> If NAT does not replace timestamps then yes, it should be the case.
>>>>>
>>>>>>>
>>>>>>> After performing various tests of hosts that produce the
>>>>>>> above-mentioned issue we came to the conclusion that there are
>>>>>>> some interesting implementations that drop SYN packets with
>>>>>>> timestamps smaller than the largest timestamp value from streams
>>>>>>> of all recent or current connections from a specific address.
>>>>>>> This looks like some kind of SYN flood protection.
>>>>>> This also breaks multiple hosts with different uptimes behind a
>>>>>> consumer level NAT talking to such a server.
>>>>>>>
>>>>>>> To ensure that each external host is not going to see wild jumps
>>>>>>> of timestamp values I propose a patch that removes ports from the
>>>>>>> equation altogether when calculating the timestamp offset:
>>>>>>>
>>>>>>> Index: sys/netinet/tcp_subr.c
>>>>>>> ===================================================================
>>>>>>> --- sys/netinet/tcp_subr.c	(revision 348435)
>>>>>>> +++ sys/netinet/tcp_subr.c	(working copy)
>>>>>>> @@ -2224,7 +2224,22 @@
>>>>>>> uint32_t
>>>>>>> tcp_new_ts_offset(struct in_conninfo *inc)
>>>>>>> {
>>>>>>> -	return (tcp_keyed_hash(inc, V_ts_offset_secret));
>>>>>>> +        /*
>>>>>>> +         * Some implementations show a strange behaviour when wildly random
>>>>>>> +         * timestamps are allocated for different streams. It seems that only the
>>>>>>> +         * SYN packets are affected. Observed implementations drop SYN packets
>>>>>>> +         * with timestamps smaller than the largest timestamp value of all
>>>>>>> +         * recent or current connections from a specific address. To mitigate
>>>>>>> +         * this we are going to ensure that each host will always observe
>>>>>>> +         * timestamps as increasing no matter the stream: by dropping ports
>>>>>>> +         * from the equation.
>>>>>>> +         */
>>>>>>> +        struct in_conninfo inc_copy = *inc;
>>>>>>> +
>>>>>>> +        inc_copy.inc_fport = 0;
>>>>>>> +        inc_copy.inc_lport = 0;
>>>>>>> +
>>>>>>> +	return (tcp_keyed_hash(&inc_copy, V_ts_offset_secret));
>>>>>>> }
>>>>>>>
>>>>>>> /*
>>>>>>>
>>>>>>> In any case, the solution to the uptime leak implemented in
>>>>>>> rev338053 is not going to suffer, because a supposed attacker is
>>>>>>> currently able to use any fixed values of SP and DP, albeit not
>>>>>>> 0, anyway, to take them out of the equation.
>>>>>> Can you describe how a peer can compute the uptime from two
>>>>>> observed timestamps? I don't see how you can do that...
>>>>>
>>>>> A supposed attacker could run a script that continuously monitors
>>>>> timestamps, for example via a periodic TCP connection from a fixed
>>>>> local port (e.g. 12345) and a fixed local address to the victim's
>>>>> fixed address and port (e.g. 80). Whenever a large discrepancy is
>>>>> observed, the attacker can assume that a reboot has happened (due
>>>>> to V_ts_offset_secret re-generation), hence the received timestamp
>>>>> is considered an approximate point of reboot from which the uptime
>>>>> can be calculated, until the next reboot and so on.
>>>> Ahh, I see. The patch we are talking about is not intended to
>>>> protect against continuous monitoring, which is something you can
>>>> always do. You could even watch for service availability and detect
>>>> reboots. A change of the local key would also look similar to a
>>>> reboot without a temporary loss of connectivity.
>>>>
>>>> Thanks for the clarification.
>>>>>
>>>>>>>
>>>>>>> Here is the list of example hosts that we were able to reproduce
>>>>>>> the issue with:
>>>>>>>
>>>>>>> curl -v http://88.99.60.171:80
>>>>>>> curl -v http://163.172.71.252:80
>>>>>>> curl -v http://5.9.242.150:80
>>>>>>> curl -v https://185.134.205.105:443
>>>>>>> curl -v https://136.243.1.231:443
>>>>>>> curl -v https://144.76.196.4:443
>>>>>>> curl -v http://94.127.191.194:80
>>>>>>>
>>>>>>> To reproduce, call curl repeatedly with the same URL some number
>>>>>>> of times. You are going to see some of the requests stuck in
>>>>>>> `*    Trying XXX.XXX.XXX.XXX...`
>>>>>>>
>>>>>>> For some reason, the easiest way to reproduce the issue is with nc:
>>>>>>>
>>>>>>> $ echo "foooooo" | nc -v 88.99.60.171 80
>>>>>>>
>>>>>>> Only a few such calls are required until one of them is stuck on
>>>>>>> connect(), issuing SYN packets with an exponential backoff.
>>>>>> Thanks for providing an end-point to test with. I'll take a look.
>>>>>> Just to be clear: You are running a FreeBSD client against one of
>>>>>> the above servers and experience the problem with the new
>>>>>> timestamp computations.
>>>>>>
>>>>>> You are not running arbitrary clients against a FreeBSD server...
>>>>>
>>>>> We are talking about FreeBSD being the client. Peers that yield
>>>>> this unwanted behaviour are unknown. A little bit of tinkering
>>>>> showed that some of them run Debian:
>>>>>
>>>>> telnet 88.99.60.171 22
>>>>> Trying 88.99.60.171...
>>>>> Connected to 88.99.60.171.
>>>>> Escape character is '^]'.
>>>>> SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
>>>> Also some are hosted by Hetzner, but not all. I will look into
>>>> this tomorrow, since I'm on a deadline today (well it is 2am
>>>> tomorrow morning, to be precise)...
>>>
>>> Thanks a lot, I would appreciate that.
>> Hi Paul,
>>
>> I have looked into this.
>>
>> * The FreeBSD behaviour is the one which is specified in the last
>>   bullet item in https://tools.ietf.org/html/rfc7323#section-5.4
>>   It is also the one which is RECOMMENDED in
>>   https://tools.ietf.org/html/rfc7323#section-7.1
>>
>> * My NAT box (a popular one in Germany) does NOT rewrite TCP
>>   timestamps.
>>
>> This means that the hosts you are referring to have some sort of
>> protection which makes incorrect assumptions. It will also break
>> multiple hosts behind a NAT.
>>
>> I can run
>> curl -v http://88.99.60.171:80
>> in a loop without any problems from a FreeBSD head system. I tested
>> 1000 iterations or so. The TS.val is jumping up and down as expected.
>> I'm wondering why you are observing errors in this case, too.
>>
>> However, doing something like
>> echo "foooooo" | nc -v 88.99.60.171 80
>> triggers the problem.
>>
>> So I think there is some functionality (in a middlebox or running on
>> the host) which incorrectly assumes monotonic timestamps between
>> multiple TCP connections coming from the same IP address, but only in
>> case of errors at the application layer.
>
> Yeah, exactly, some hosts seem to enable this only in case of an error
> in HTTP communication (some smart proxy?). However, there are some that
> behave this way regardless of errors, for example these:
>
> curl -v https://185.134.205.105:443
> curl -v https://136.243.1.231:443
Wireshark sees an Encrypted Alert in both cases. So I guess this is
another indication of "error at the application layer".
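To make the suspected behaviour concrete, here is a minimal sketch of the
kind of check such a box might apply to incoming SYNs. This is only a guess
at the logic and is not taken from any real middlebox or stack: the names
src_state and syn_accepted are made up for illustration, and wraparound of
the 32-bit TSval is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-source-address state; not from any real product. */
struct src_state {
	bool     seen;		/* has any TSval been recorded yet? */
	uint32_t max_tsval;	/* largest TSval seen from this address */
};

/*
 * Guessed filter: accept a SYN only if its TSval is not smaller than the
 * largest TSval seen so far from the same source address. Accepted SYNs
 * raise the high-water mark. Plain comparison, no wraparound handling.
 */
static bool
syn_accepted(struct src_state *st, uint32_t tsval)
{
	assert(st != NULL);
	if (st->seen && tsval < st->max_tsval)
		return (false);	/* looks "old": SYN is dropped */
	st->seen = true;
	st->max_tsval = tsval;
	return (true);
}
```

With per-connection offsets, a new connection from the same address can
easily carry a TSval below the stored maximum, so its SYN would be dropped
and the client would retransmit with backoff, which matches what we see.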
>
>>
>> Do you have any insights whether the hosts you listed share something
>> in common? Some of them are hosted by Hetzner, but not all.
>
> Nope. The whole set of endpoints that we have detected so far is pretty
> diverse, containing a lot of different locations geographically, as
> well as different hosters.
OK. Thanks for the clarification.
>
>>
>> I think in general, it is the correct thing to include the port
>> numbers in the offset computation. We might add a sysctl variable to
>> control the inclusion. This would allow interworking with broken
>> middleboxes.
>
> Yeah, I completely agree that these rare cases should not dictate the
> implementation. But an ability to enable a work-around via sysctl would
> be greatly appreciated. Currently we are unable to roll out the upgrade
> across all servers because of this issue: even though it does not
> happen often, a lot of requests from our users get stuck or fail
> altogether. For example, the host 185.134.205.105 is a kind of social
> network that our proxy servers connect to in order to securely access
> content, such as images, on behalf of our users.
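A minimal sketch of what such a switch could look like, written as a
userland toy rather than kernel code. Everything here is an assumption for
illustration: toy_conninfo stands in for struct in_conninfo (IPv4 only),
toy_mix is a simple FNV-1a style mixer and not the kernel's
tcp_keyed_hash(), and per_conn stands in for whatever sysctl value would
control the inclusion of the ports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct in_conninfo (IPv4 only). */
struct toy_conninfo {
	uint32_t faddr, laddr;	/* foreign / local address */
	uint16_t fport, lport;	/* foreign / local port */
};

/* Toy byte-wise mixer (FNV-1a style); NOT the kernel's keyed hash. */
static uint32_t
toy_mix(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return (h);
}

/*
 * Compute a timestamp offset from the secret and the addresses. The ports
 * are mixed in only when per_conn is non-zero, so with per_conn == 0 all
 * connections between the same two addresses share one offset.
 */
static uint32_t
toy_ts_offset(const struct toy_conninfo *inc, uint32_t secret, int per_conn)
{
	assert(inc != NULL);
	uint32_t h = toy_mix(2166136261u, secret);

	h = toy_mix(h, inc->faddr);
	h = toy_mix(h, inc->laddr);
	if (per_conn) {
		h = toy_mix(h, inc->fport);
		h = toy_mix(h, inc->lport);
	}
	return (h);
}
```

The default would keep the ports in (per_conn != 0); switching them off
trades the per-connection offsets for a per-host-pair offset, which is what
your patch does unconditionally.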
>
>>
>> Please note, this does not fix the case of multiple clients behind a
>> NAT.
>
> Yeah, that's true. Fortunately we don't use NAT.
>
>>
>> I'm also trying to figure out how and why Linux and Windows are
>> handling this.
>
> Thanks for bothering!
Will let you know what I figure out.

Best regards
Michael
>
>>
>> Best regards
>> Michael
>>
>>>
>>>>
>>>> Best regards
>>>> Michael
>>>>>
>>>>>
>>>>>>
>>>>>> Best regards
>>>>>> Michael
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>