Date: Tue, 9 Jul 2019 17:29:57 +0200
From: Michael Tuexen <tuexen@freebsd.org>
To: Paul <devgs@ukr.net>
Cc: freebsd-net@freebsd.org
Subject: Re: Issues with TCP Timestamps allocation
Message-ID: <1E9F3F99-C3E9-44DD-AA70-9B11E19D4769@freebsd.org>
In-Reply-To: <1562676414.933145000.z3zteyqp@frv39.fwdcdn.com>
References: <1562579483.67527000.24rw4xi5@frv39.fwdcdn.com>
 <32FD061B-245C-41D2-81DE-1B4756A7173D@freebsd.org>
 <1562591379.369129000.gpmxvurq@frv39.fwdcdn.com>
 <DF65CA7F-B5FC-499D-B053-0531596D230C@freebsd.org>
 <1562599181.734953000.1l9a1d23@frv39.fwdcdn.com>
 <0C475A01-9BCD-4E4A-9731-09AB919CA9BE@freebsd.org>
 <1562676414.933145000.z3zteyqp@frv39.fwdcdn.com>

> On 9. Jul 2019, at 14:58, Paul <devgs@ukr.net> wrote:
>
> Hi Michael,
>
> 9 July 2019, 15:34:29, by "Michael Tuexen" <tuexen@freebsd.org>:
>
>>
>>
>>> On 8. Jul 2019, at 17:22, Paul <devgs@ukr.net> wrote:
>>>
>>>
>>>
>>> 8 July 2019, 17:12:21, by "Michael Tuexen" <tuexen@freebsd.org>:
>>>
>>>>> On 8. Jul 2019, at 15:24, Paul <devgs@ukr.net> wrote:
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> 8 July 2019, 15:53:15, by "Michael Tuexen" <tuexen@freebsd.org>:
>>>>>
>>>>>>> On 8. Jul 2019, at 12:37, Paul <devgs@ukr.net> wrote:
>>>>>>>
>>>>>>> Hi team,
>>>>>>>
>>>>>>> Recently we upgraded to 12-STABLE. Immediately afterwards, we started
>>>>>>> seeing strange connection establishment timeouts to a fixed set
>>>>>>> of external (world) hosts. The issue was persistent and easy to reproduce.
>>>>>>> Thanks to the patience and dedication of our system engineer, we have tracked
>>>>>>> this issue down to a specific commit:
>>>>>>>
>>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=338053
>>>>>>>
>>>>>>> This patch was also back-ported to 11-STABLE:
>>>>>>>
>>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=348435
>>>>>>>
>>>>>>> Among other things, this patch changes the timestamp allocation strategy
>>>>>>> by introducing deterministic randomness via a hash function that takes
>>>>>>> into account a random key as well as the source address, source port,
>>>>>>> destination address and destination port. As a result, the timestamp
>>>>>>> offsets of different tuples (SA,SP,DA,DP) will be wildly different and
>>>>>>> will jump from small to large numbers and back whenever something in the
>>>>>>> tuple changes.
>>>>>> Hi Paul,
>>>>>>
>>>>>> this is correct.
>>>>>>
>>>>>> Please note that the same happens with the old method if two hosts with
>>>>>> different uptimes are behind a consumer-grade NAT.
>>>>>
>>>>> If the NAT does not rewrite timestamps, then yes, that should be the case.
>>>>>
>>>>>>>
>>>>>>> After testing various hosts that exhibit the above-mentioned
>>>>>>> issue, we came to the conclusion that there are some interesting implementations
>>>>>>> that drop SYN packets with timestamps smaller than the largest timestamp
>>>>>>> value from the streams of all recent or current connections from a specific
>>>>>>> address. This looks like some kind of SYN flood protection.
>>>>>> This also breaks multiple hosts with different uptimes behind a consumer-level
>>>>>> NAT talking to such a server.
>>>>>>>
>>>>>>> To ensure that each external host does not see wild jumps in
>>>>>>> timestamp values, I propose a patch that removes the ports from the equation
>>>>>>> altogether when calculating the timestamp offset:
>>>>>>>
>>>>>>> Index: sys/netinet/tcp_subr.c
>>>>>>> ===================================================================
>>>>>>> --- sys/netinet/tcp_subr.c	(revision 348435)
>>>>>>> +++ sys/netinet/tcp_subr.c	(working copy)
>>>>>>> @@ -2224,7 +2224,22 @@
>>>>>>>  uint32_t
>>>>>>>  tcp_new_ts_offset(struct in_conninfo *inc)
>>>>>>>  {
>>>>>>> -	return (tcp_keyed_hash(inc, V_ts_offset_secret));
>>>>>>> +	/*
>>>>>>> +	 * Some implementations show a strange behaviour when wildly random
>>>>>>> +	 * timestamps are allocated for different streams. It seems that only the
>>>>>>> +	 * SYN packets are affected. Observed implementations drop SYN packets
>>>>>>> +	 * with timestamps smaller than the largest timestamp value of all
>>>>>>> +	 * recent or current connections from a specific address. To mitigate
>>>>>>> +	 * this we are going to ensure that each host will always observe
>>>>>>> +	 * timestamps as increasing no matter the stream: by dropping ports
>>>>>>> +	 * from the equation.
>>>>>>> +	 */
>>>>>>> +	struct in_conninfo inc_copy = *inc;
>>>>>>> +
>>>>>>> +	inc_copy.inc_fport = 0;
>>>>>>> +	inc_copy.inc_lport = 0;
>>>>>>> +
>>>>>>> +	return (tcp_keyed_hash(&inc_copy, V_ts_offset_secret));
>>>>>>>  }
>>>>>>>
>>>>>>>  /*
>>>>>>>
>>>>>>> In any case, the fix for the uptime leak implemented in r338053 is
>>>>>>> not going to suffer, because a would-be attacker can already pick
>>>>>>> any fixed values of SP and DP (albeit not 0) and thereby remove them
>>>>>>> from the equation.
>>>>>> Can you describe how a peer can compute the uptime from two observed timestamps?
>>>>>> I don't see how you can do that...
>>>>>
>>>>> A would-be attacker could run a script that continuously monitors timestamps,
>>>>> for example via a periodic TCP connection from a fixed local port (e.g. 12345)
>>>>> and a fixed local address to the victim's fixed address and port (e.g. 80).
>>>>> Whenever a large discrepancy is observed, the attacker can assume that a reboot
>>>>> has happened (due to V_ts_offset_secret being regenerated). The time of that
>>>>> observation then marks the approximate moment of the reboot, from which the
>>>>> uptime can be calculated until the next reboot, and so on.
>>>> Ahh, I see. The patch we are talking about is not intended to protect against
>>>> continuous monitoring, which is something you can always do. You could even
>>>> watch for service availability and detect reboots. A change of the local key
>>>> would also look similar to a reboot without a temporary loss of connectivity.
>>>>
>>>> Thanks for the clarification.
>>>>>
>>>>>>>
>>>>>>> Here is a list of example hosts with which we were able to reproduce the
>>>>>>> issue:
>>>>>>>
>>>>>>> curl -v http://88.99.60.171:80
>>>>>>> curl -v http://163.172.71.252:80
>>>>>>> curl -v http://5.9.242.150:80
>>>>>>> curl -v https://185.134.205.105:443
>>>>>>> curl -v https://136.243.1.231:443
>>>>>>> curl -v https://144.76.196.4:443
>>>>>>> curl -v http://94.127.191.194:80
>>>>>>>
>>>>>>> To reproduce, call curl repeatedly with the same URL a number of times.
>>>>>>> Some of the requests will get stuck in
>>>>>>> `* Trying XXX.XXX.XXX.XXX...`
>>>>>>>
>>>>>>> For some reason, the easiest way to reproduce the issue is with nc:
>>>>>>>
>>>>>>> $ echo "foooooo" | nc -v 88.99.60.171 80
>>>>>>>
>>>>>>> Only a few such calls are required until one of them gets stuck in connect(),
>>>>>>> issuing SYN packets with exponential backoff.
>>>>>> Thanks for providing an end-point to test with. I'll take a look.
>>>>>> Just to be clear: You are running a FreeBSD client against one of the above
>>>>>> servers and experience the problem with the new timestamp computations.
>>>>>>
>>>>>> You are not running arbitrary clients against a FreeBSD server...
>>>>>
>>>>> We are talking about FreeBSD being the client. The peers that exhibit this unwanted
>>>>> behaviour are unknown. A little tinkering showed that some of them run
>>>>> Debian:
>>>>>
>>>>> telnet 88.99.60.171 22
>>>>> Trying 88.99.60.171...
>>>>> Connected to 88.99.60.171.
>>>>> Escape character is '^]'.
>>>>> SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
>>>> Also, some are hosted by Hetzner, but not all. I'll look into
>>>> this tomorrow, since I'm on a deadline today (well, it is 2am tomorrow
>>>> morning, to be precise)...
>>>
>>> Thanks a lot, I would appreciate that.
>> Hi Paul,
>>
>> I have looked into this.
>>
>> * The FreeBSD behaviour is the one specified in the last bullet item
>>   in https://tools.ietf.org/html/rfc7323#section-5.4
>>   It is also the one which is RECOMMENDED in
>>   https://tools.ietf.org/html/rfc7323#section-7.1
>>
>> * My NAT box (a popular one in Germany) does NOT rewrite TCP timestamps.
>>
>> This means that the hosts you are referring to have some sort of protection
>> which makes incorrect assumptions. It will also break multiple hosts behind
>> a NAT.
>>
>> I can run
>> curl -v http://88.99.60.171:80
>> in a loop without any problems from a FreeBSD head system. I tested 1000
>> iterations or so. The TS.val is jumping up and down as expected.
>> I'm wondering why you are observing errors in this case, too.
>>
>> However, doing something like
>> echo "foooooo" | nc -v 88.99.60.171 80
>> triggers the problem.
>>
>> So I think there is some functionality (in a middlebox or running on the host)
>> which incorrectly assumes monotonic timestamps across multiple TCP connections
>> coming from the same IP address, but only in case of errors at the application layer.
>
> Yeah, exactly, some hosts seem to enable this only in case of an error in HTTP
> communication (some smart proxy?). However, there are some that behave this way
> regardless of errors, for example these:
>
> curl -v https://185.134.205.105:443
> curl -v https://136.243.1.231:443

Wireshark sees an Encrypted Alert in both cases. So I guess this is another
indication of "error at the application layer".

>
>>
>> Do you have any insight into whether the hosts you have listed share something in
>> common? Some of them are hosted by Hetzner, but not all.
>
> Nope. The whole set of endpoints that we have detected so far is pretty diverse,
> spanning many different geographic locations as well as different
> hosting providers.

OK. Thanks for the clarification.

>
>>
>> I think in general, it is the correct thing to include the port numbers in
>> the offset computation.
>> We might add a sysctl variable to control the inclusion.
>> This would allow interworking with broken middleboxes.
>
> Yeah, I completely agree that these rare cases should not dictate the implementation.
> But the ability to enable a work-around via sysctl would be greatly appreciated.
> Currently we are unable to roll out the upgrade across all servers because of this
> issue: even though it does not happen very often, a lot of requests from our users
> get stuck or fail altogether. For example, the host 185.134.205.105 is a kind of
> social network that our proxy servers connect to in order to securely access content,
> such as images, on behalf of our users.
>
>>
>> Please note, this does not fix the case of multiple clients behind a NAT.
>
> Yeah, that's true. Fortunately we don't use NAT.
>
>>
>> I'm also trying to figure out how and why Linux and Windows are handling this.
>
> Thanks for taking the trouble!

Will let you know what I figure out.

Best regards
Michael

>
>>
>> Best regards
>> Michael
>>
>>>
>>>>
>>>> Best regards
>>>> Michael
>>>>>
>>>>>
>>>>>>
>>>>>> Best regards
>>>>>> Michael
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1E9F3F99-C3E9-44DD-AA70-9B11E19D4769>
