From: Michael Tuexen
Date: Tue, 9 Jul 2019 17:29:57 +0200
Subject: Re: Issues with TCP Timestamps allocation
To: Paul
Cc: freebsd-net@freebsd.org
Message-Id: <1E9F3F99-C3E9-44DD-AA70-9B11E19D4769@freebsd.org>
In-Reply-To: <1562676414.933145000.z3zteyqp@frv39.fwdcdn.com>
List-Id: Networking and TCP/IP with FreeBSD

> On 9. Jul 2019, at 14:58, Paul wrote:
>
> Hi Michael,
>
> 9 July 2019, 15:34:29, by "Michael Tuexen":
>
>>> On 8. Jul 2019, at 17:22, Paul wrote:
>>>
>>> 8 July 2019, 17:12:21, by "Michael Tuexen":
>>>
>>>>> On 8. Jul 2019, at 15:24, Paul wrote:
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> 8 July 2019, 15:53:15, by "Michael Tuexen":
>>>>>
>>>>>>> On 8. Jul 2019, at 12:37, Paul wrote:
>>>>>>>
>>>>>>> Hi team,
>>>>>>>
>>>>>>> Recently we had an upgrade to 12 Stable. Immediately after, we started
>>>>>>> seeing some strange connection establishment timeouts to some fixed number
>>>>>>> of external (world) hosts. The issue was persistent and easy to reproduce.
>>>>>>> Thanks to the patience and dedication of our system engineer we have tracked
>>>>>>> this issue down to a specific commit:
>>>>>>>
>>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=338053
>>>>>>>
>>>>>>> This patch was also back-ported into 11 Stable:
>>>>>>>
>>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=348435
>>>>>>>
>>>>>>> Among other things this patch changes the timestamp allocation strategy,
>>>>>>> by introducing a deterministic randomness via a hash function that takes
>>>>>>> into account a random key as well as source address, source port, dest
>>>>>>> address and dest port. As a result, timestamp offsets of different
>>>>>>> tuples (SA,SP,DA,DP) will be wildly different and will jump from small
>>>>>>> to large numbers and back, as long as something in the tuple changes.
>>>>>> Hi Paul,
>>>>>>
>>>>>> this is correct.
>>>>>>
>>>>>> Please note that the same happens with the old method, if two hosts with
>>>>>> different uptimes are behind a consumer grade NAT.
>>>>>
>>>>> If NAT does not replace timestamps then yes, it should be the case.
>>>>>
>>>>>>>
>>>>>>> After performing various tests of hosts that produce the above mentioned
>>>>>>> issue we came to the conclusion that there are some interesting implementations
>>>>>>> that drop SYN packets with timestamps smaller than the largest timestamp
>>>>>>> value from streams of all recent or current connections from a specific
>>>>>>> address. This looks like some kind of SYN flood protection.
>>>>>> This also breaks multiple hosts with different uptimes behind a consumer
>>>>>> level NAT talking to such a server.
>>>>>>>
>>>>>>> To ensure that each external host is not going to see wild jumps of
>>>>>>> timestamp values I propose a patch that removes ports from the equation
>>>>>>> altogether, when calculating the timestamp offset:
>>>>>>>
>>>>>>> Index: sys/netinet/tcp_subr.c
>>>>>>> ===================================================================
>>>>>>> --- sys/netinet/tcp_subr.c (revision 348435)
>>>>>>> +++ sys/netinet/tcp_subr.c (working copy)
>>>>>>> @@ -2224,7 +2224,22 @@
>>>>>>>  uint32_t
>>>>>>>  tcp_new_ts_offset(struct in_conninfo *inc)
>>>>>>>  {
>>>>>>> -	return (tcp_keyed_hash(inc, V_ts_offset_secret));
>>>>>>> +	/*
>>>>>>> +	 * Some implementations show strange behaviour when wildly random
>>>>>>> +	 * timestamps are allocated for different streams. It seems that only the
>>>>>>> +	 * SYN packets are affected. Observed implementations drop SYN packets
>>>>>>> +	 * with timestamps smaller than the largest timestamp value of all
>>>>>>> +	 * recent or current connections from a specific address. To mitigate
>>>>>>> +	 * this we are going to ensure that each host will always observe
>>>>>>> +	 * timestamps as increasing no matter the stream: by dropping ports
>>>>>>> +	 * from the equation.
>>>>>>> +	 */
>>>>>>> +	struct in_conninfo inc_copy = *inc;
>>>>>>> +
>>>>>>> +	inc_copy.inc_fport = 0;
>>>>>>> +	inc_copy.inc_lport = 0;
>>>>>>> +
>>>>>>> +	return (tcp_keyed_hash(&inc_copy, V_ts_offset_secret));
>>>>>>>  }
>>>>>>>
>>>>>>>  /*
>>>>>>>
>>>>>>> In any case, the solution of the uptime leak, implemented in rev338053, is
>>>>>>> not going to suffer, because a supposed attacker is currently able to use
>>>>>>> any fixed values of SP and DP, albeit not 0, anyway, to remove them from
>>>>>>> the equation.
>>>>>> Can you describe how a peer can compute the uptime from two observed timestamps?
>>>>>> I don't see how you can do that...
>>>>>
>>>>> A supposed attacker could run a script that continuously monitors timestamps,
>>>>> for example via a periodic TCP connection from a fixed local port (eg 12345)
>>>>> and a fixed local address to the fixed victim's address and port (eg 80).
>>>>> Whenever a large discrepancy is observed, the attacker can assume that a reboot has
>>>>> happened (due to V_ts_offset_secret re-generation), hence the received
>>>>> timestamp is considered an approximate point of reboot from which the uptime
>>>>> can be calculated, until the next reboot and so on.
>>>> Ahh, I see. The patch we are talking about is not intended to protect against
>>>> continuous monitoring, which is something you can always do. You could even
>>>> watch for service availability and detect reboots. A change of the local key
>>>> would also look similar to a reboot without a temporary loss of connectivity.
>>>>
>>>> Thanks for the clarification.
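
The monitoring idea described in the quoted exchange above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name ts_discontinuity, the 1 ms timestamp tick and the tolerance parameter are hypothetical and do not come from the thread or from any existing tool.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the TSval seen in the current probe
 * does not continue the clock seen in the previous probe (within tol_ms),
 * 0 otherwise. Assumes the peer's timestamp clock ticks once per
 * millisecond and that the same 4-tuple is used for every probe.
 */
static int
ts_discontinuity(uint32_t prev_tsval, uint64_t prev_ms,
    uint32_t cur_tsval, uint64_t cur_ms, uint32_t tol_ms)
{
	uint32_t expected;
	int64_t drift;

	assert(cur_ms >= prev_ms);	/* probes must be in time order */

	/* Where the old clock would be now; wraps modulo 2^32 like TSval. */
	expected = prev_tsval + (uint32_t)(cur_ms - prev_ms);

	/* Signed distance between the observed and the expected value. */
	drift = (int32_t)(cur_tsval - expected);
	if (drift < 0)
		drift = -drift;

	return (drift > (int64_t)tol_ms);
}
```

A jump flagged by such a check only says that the offset changed. As noted above, a change of the local key would look the same as a reboot.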
>>>>>
>>>>>>>
>>>>>>> Here is the list of example hosts that we were able to reproduce the
>>>>>>> issue with:
>>>>>>>
>>>>>>> curl -v http://88.99.60.171:80
>>>>>>> curl -v http://163.172.71.252:80
>>>>>>> curl -v http://5.9.242.150:80
>>>>>>> curl -v https://185.134.205.105:443
>>>>>>> curl -v https://136.243.1.231:443
>>>>>>> curl -v https://144.76.196.4:443
>>>>>>> curl -v http://94.127.191.194:80
>>>>>>>
>>>>>>> To reproduce, call curl repeatedly with the same URL some number of times.
>>>>>>> You are going to see some of the requests stuck in
>>>>>>> `* Trying XXX.XXX.XXX.XXX...`
>>>>>>>
>>>>>>> For some reason, the easiest way to reproduce the issue is with nc:
>>>>>>>
>>>>>>> $ echo "foooooo" | nc -v 88.99.60.171 80
>>>>>>>
>>>>>>> Only a few such calls are required until one of them is stuck on connect():
>>>>>>> issuing SYN packets with an exponential backoff.
>>>>>> Thanks for providing an end-point to test with. I'll take a look.
>>>>>> Just to be clear: You are running a FreeBSD client against one of the above
>>>>>> servers and experience the problem with the new timestamp computations.
>>>>>>
>>>>>> You are not running arbitrary clients against a FreeBSD server...
>>>>>
>>>>> We are talking about FreeBSD being the client. Peers that yield this unwanted
>>>>> behaviour are unknown. A little bit of tinkering showed that some of them run
>>>>> Debian:
>>>>>
>>>>> telnet 88.99.60.171 22
>>>>> Trying 88.99.60.171...
>>>>> Connected to 88.99.60.171.
>>>>> Escape character is '^]'.
>>>>> SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
>>>> Also some are hosted by Hetzner, but not all. I'll look into
>>>> this tomorrow, since I'm on a deadline today (well it is 2am tomorrow
>>>> morning, to be precise)...
>>>
>>> Thanks a lot, I would appreciate that.
>> Hi Paul,
>>
>> I have looked into this.
>>
>> * The FreeBSD behaviour is the one which is specified in the last bullet item
>>   in https://tools.ietf.org/html/rfc7323#section-5.4
>>   It is also the one which is RECOMMENDED in
>>   https://tools.ietf.org/html/rfc7323#section-7.1
>>
>> * My NAT box (a popular one in Germany) does NOT rewrite TCP timestamps.
>>
>> This means that the hosts you are referring to have some sort of protection,
>> which makes incorrect assumptions. It will also break multiple hosts behind
>> a NAT.
>>
>> I can run
>> curl -v http://88.99.60.171:80
>> in a loop without any problems from a FreeBSD head system. I tested 1000
>> iterations or so. The TS.val is jumping up and down as expected.
>> I'm wondering why you are observing errors in this case, too.
>>
>> However, doing something like
>> echo "foooooo" | nc -v 88.99.60.171 80
>> triggers the problem.
>>
>> So I think there is some functionality (in a middlebox or running on the host),
>> which incorrectly assumes monotonic timestamps between multiple TCP connections
>> coming from the same IP address, but only in case of errors at the application layer.
>
> Yeah, exactly, some hosts seem to enable this only in case of an error in HTTP
> communication (some smart proxy?). However, there are some that behave this way
> regardless of errors, for example these:
>
> curl -v https://185.134.205.105:443
> curl -v https://136.243.1.231:443
Wireshark sees an Encrypted Alert in both cases. So I guess this is another
indication of "error at the application layer".
>
>>
>> Do you have any insights whether the hosts you listed share something in
>> common? Some of them are hosted by Hetzner, but not all.
>
> Nope. The whole set of endpoints that we have detected so far is pretty diverse,
> containing a lot of different locations geographically, as well as different
> hosters.
OK. Thanks for the clarification.
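
The behaviour inferred above can be modelled in a few lines of C. This is only a guess at the heuristic, based on the observations in this thread; the actual middlebox or host logic is unknown, and the names src_state and syn_accepted are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-source-address state kept by the filter. */
struct src_state {
	int seen;		/* non-zero once any TSval was recorded */
	uint32_t max_tsval;	/* largest TSval seen from this address */
};

/*
 * Returns 1 if a SYN carrying tsval would be accepted, 0 if it would be
 * dropped. A SYN is dropped when its TSval is older than the largest
 * value seen so far from the same address. The comparison is done
 * modulo 2^32 so that a wrapping timestamp clock is tolerated.
 */
static int
syn_accepted(struct src_state *s, uint32_t tsval)
{
	assert(s != NULL);

	if (s->seen && (int32_t)(tsval - s->max_tsval) < 0)
		return (0);

	s->seen = 1;
	s->max_tsval = tsval;
	return (1);
}
```

Under this model, two connections from one client address whose offsets differ (per-connection offsets, or two hosts behind a NAT that does not rewrite timestamps) can present TSvals such as 5000 followed by 2000, and the second SYN is dropped.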
>
>>
>> I think in general, it is the correct thing to include the port numbers in
>> the offset computation. We might add a sysctl variable to control the inclusion.
>> This would allow interworking with broken middleboxes.
>
> Yeah, I completely agree that these rare cases should not dictate the implementation.
> But an ability to enable a work-around via sysctl would be greatly appreciated.
> Currently we are unable to roll out the upgrade across all servers because of this
> issue: even though it does not happen often, a lot of requests from our users
> get stuck or fail altogether. For example, the host 185.134.205.105 is a kind of
> social network that our proxy servers connect to in order to securely access content,
> such as images, on behalf of our users.
>
>>
>> Please note, this does not fix the case of multiple clients behind a NAT.
>
> Yeah, that's true. Fortunately we don't use NAT.
>
>>
>> I'm also trying to figure out how and why Linux and Windows are handling this.
>
> Thanks for bothering!
Will let you know what I figure out.

Best regards
Michael
>
>>
>> Best regards
>> Michael
>>
>>>>
>>>> Best regards
>>>> Michael
>>>>>
>>>>>>
>>>>>> Best regards
>>>>>> Michael
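
One way the knob discussed above could behave is sketched below in user-space C. Nothing here is FreeBSD kernel code: the flag ts_offset_per_conn, the struct conn_tuple and the toy FNV-1a based hash are placeholders chosen for illustration, and the toy hash is not a substitute for the kernel's tcp_keyed_hash().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the connection 4-tuple (IPv4 only). */
struct conn_tuple {
	uint32_t laddr;
	uint32_t faddr;
	uint16_t lport;
	uint16_t fport;
};

/* Hypothetical knob: 1 = hash addresses and ports, 0 = addresses only. */
static int ts_offset_per_conn = 1;

/* Feed the low nbytes bytes of v into an FNV-1a style running hash. */
static uint64_t
mix_bytes(uint64_t h, uint64_t v, int nbytes)
{
	for (int i = 0; i < nbytes; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 1099511628211ULL;
	}
	return (h);
}

/* Toy keyed hash; illustration only, not cryptographically secure. */
static uint32_t
toy_keyed_hash(const struct conn_tuple *t, uint64_t secret)
{
	uint64_t h = 14695981039346656037ULL;

	h = mix_bytes(h, secret, 8);
	h = mix_bytes(h, t->laddr, 4);
	h = mix_bytes(h, t->faddr, 4);
	h = mix_bytes(h, t->lport, 2);
	h = mix_bytes(h, t->fport, 2);
	return ((uint32_t)(h ^ (h >> 32)));
}

/* Derive the timestamp offset, honouring the knob. */
static uint32_t
new_ts_offset(const struct conn_tuple *t, uint64_t secret)
{
	struct conn_tuple key;

	assert(t != NULL);
	key = *t;
	if (!ts_offset_per_conn) {
		/* Work-around mode: ports do not influence the offset. */
		key.lport = 0;
		key.fport = 0;
	}
	return (toy_keyed_hash(&key, secret));
}
```

With the knob set to 0, every connection between the same pair of addresses gets the same offset, so a peer sees timestamps that follow the clock across connections. As noted above, this does not help multiple clients behind a NAT.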