From owner-freebsd-net@freebsd.org Wed Jul 17 11:55:06 2019
Date: Wed, 17 Jul 2019 14:55:02 +0300
From: Vitalij Satanivskij
Reply-To: satan@ukr.net
To: Michael Tuexen
Cc: Vitalij Satanivskij, freebsd-net@freebsd.org, Paul
Subject: Re: Issues with TCP Timestamps allocation
Message-ID: <20190717115502.GA53155@hell.ukr.net>
In-Reply-To: <48817BF6-AEDD-4D28-95F8-A4D53E4999B1@freebsd.org>

MT> > MT> In the meantime you can deal with the buggy hosts by disabling the timestamps
MT> > MT> or dropping extensions on SYN retransmits.
MT> >
MT> > You mean by some code changes?
MT> No.
MT>
MT> Two options:
MT>
MT> Option 1: Drop the TCP timestamp option on the third retransmission
MT> To enable this, you configure on the client
MT> sudo sysctl -w net.inet.tcp.rexmit_drop_options=1
MT> or put
MT> net.inet.tcp.rexmit_drop_options=1
MT> in /etc/sysctl.conf
MT> and reboot
MT> In case of the broken host, the first SYN retransmission will happen 1 second after the
MT> initial SYN segment, the second retransmission will happen 1.2 seconds after the first. On the
MT> third retransmission, which happens again 1.2 seconds later, the TCP timestamp option is
MT> dropped and the connection setup will succeed. This gives you a total delay of 3.4 seconds
MT> on connection setup instead of the longer timeout.

The first option does not work: we still see the same behaviour.

MT>
MT> Option 2: Disable the TCP timestamps (and window scaling)
MT> To enable this, you configure on the client
MT> sudo sysctl -w net.inet.tcp.rfc1323=0
MT> or put
MT> net.inet.tcp.rfc1323=0
MT> in /etc/sysctl.conf
MT> and reboot.
MT> This disables the timestamp option and window scaling completely. This allows you to
MT> set up the connections without any delay. However, you don't have the benefits of the
MT> extension.
MT>
MT> Neither option requires any code changes.

We tested this option some time ago. Yes, it helps, but the overall TCP networking
performance is then, let's say, too bad :(

MT> Best regards
MT> Michael
MT>
MT> >
MT> > MT> Best regards
MT> > MT> Michael
MT> >
MT> > MT> > Michael Tuexen wrote:
MT> > MT> > MT>
MT> > MT> > MT> > On 9. Jul 2019, at 14:58, Paul wrote:
MT> > MT> > MT> >
MT> > MT> > MT> > Hi Michael,
MT> > MT> > MT> >
MT> > MT> > MT> > 9 July 2019, 15:34:29, by "Michael Tuexen" :
MT> > MT> > MT> >
MT> > MT> > MT> >>
MT> > MT> > MT> >>> On 8. Jul 2019, at 17:22, Paul wrote:
MT> > MT> > MT> >>>
MT> > MT> > MT> >>> 8 July 2019, 17:12:21, by "Michael Tuexen" :
MT> > MT> > MT> >>>
MT> > MT> > MT> >>>>> On 8. Jul 2019, at 15:24, Paul wrote:
MT> > MT> > MT> >>>>>
MT> > MT> > MT> >>>>> Hi Michael,
MT> > MT> > MT> >>>>>
MT> > MT> > MT> >>>>> 8 July 2019, 15:53:15, by "Michael Tuexen" :
MT> > MT> > MT> >>>>>
MT> > MT> > MT> >>>>>>> On 8. Jul 2019, at 12:37, Paul wrote:
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> Hi team,
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> Recently we upgraded to 12 Stable. Immediately afterwards, we started
MT> > MT> > MT> >>>>>>> seeing strange connection establishment timeouts to a fixed set
MT> > MT> > MT> >>>>>>> of external (world) hosts. The issue was persistent and easy to reproduce.
MT> > MT> > MT> >>>>>>> Thanks to the patience and dedication of our system engineer, we have tracked
MT> > MT> > MT> >>>>>>> this issue down to a specific commit:
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=338053
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> This patch was also back-ported to 11 Stable:
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=348435
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> Among other things, this patch changes the timestamp allocation strategy
MT> > MT> > MT> >>>>>>> by introducing deterministic randomness via a hash function that takes
MT> > MT> > MT> >>>>>>> into account a random key as well as the source address, source port,
MT> > MT> > MT> >>>>>>> destination address and destination port. As a result, the timestamp
MT> > MT> > MT> >>>>>>> offsets of different tuples (SA,SP,DA,DP) will be wildly different and will
MT> > MT> > MT> >>>>>>> jump from small to large numbers and back whenever something in the tuple changes.
MT> > MT> > MT> >>>>>> Hi Paul,
MT> > MT> > MT> >>>>>>
MT> > MT> > MT> >>>>>> this is correct.
MT> > MT> > MT> >>>>>>
MT> > MT> > MT> >>>>>> Please note that the same happens with the old method if two hosts with
MT> > MT> > MT> >>>>>> different uptimes are behind a consumer-grade NAT.
MT> > MT> > MT> >>>>>
MT> > MT> > MT> >>>>> If the NAT does not rewrite timestamps, then yes, that should be the case.
MT> > MT> > MT> >>>>>
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> After testing various hosts that exhibit the above-mentioned issue,
MT> > MT> > MT> >>>>>>> we came to the conclusion that some implementations drop SYN packets
MT> > MT> > MT> >>>>>>> whose timestamps are smaller than the largest timestamp value seen in
MT> > MT> > MT> >>>>>>> all recent or current connections from a specific address. This looks
MT> > MT> > MT> >>>>>>> like some kind of SYN flood protection.
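The heuristic Paul describes can be modelled in a few lines of C. This is a hypothetical sketch: the real peer-side code is unknown, and `struct peer_table`, `syn_accept()` and `client_tsval()` are names invented for illustration. The model remembers the largest TSval seen per source address and drops any SYN carrying a smaller one, so with per-connection offsets it rejects any new connection whose offset plus clock happens to fall below a value already seen from the same address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the peer-side heuristic: remember the largest
 * TSval seen from each source address and drop any SYN carrying a smaller
 * one. This is an illustration, not code from any real TCP stack.
 */
#define MAX_PEERS 16

struct peer_entry {
	uint32_t addr;      /* IPv4 source address, host byte order */
	uint32_t max_tsval; /* largest TSval seen from this address */
	bool     used;
};

struct peer_table {
	struct peer_entry e[MAX_PEERS];
};

/* Returns true if the SYN is accepted, false if the heuristic drops it. */
bool
syn_accept(struct peer_table *t, uint32_t src_addr, uint32_t tsval)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < MAX_PEERS; i++) {
		if (t->e[i].used && t->e[i].addr == src_addr) {
			/* Plain unsigned '<', no wraparound handling. */
			if (tsval < t->e[i].max_tsval)
				return (false);
			t->e[i].max_tsval = tsval;
			return (true);
		}
	}
	/* First SYN seen from this address: start tracking it. */
	for (i = 0; i < MAX_PEERS; i++) {
		if (!t->e[i].used) {
			t->e[i].addr = src_addr;
			t->e[i].max_tsval = tsval;
			t->e[i].used = true;
			return (true);
		}
	}
	return (true); /* table full: accept without tracking */
}

/* Client-side TSval: an offset plus the local timestamp clock (mod 2^32). */
uint32_t
client_tsval(uint32_t offset, uint32_t clock_ticks)
{
	return (offset + clock_ticks);
}
```

For example, with offsets 0x90000000 and 0x10000000 for two connections from the same address, a SYN sent at tick 1000 on the first connection is accepted, while a SYN sent later, at tick 2000, on the second connection is dropped because its TSval is smaller. With one shared per-host offset, both SYNs are accepted. The same check also rejects NATed hosts with different uptimes, as Michael points out.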
MT> > MT> > MT> >>>>>> This also breaks multiple hosts with different uptimes behind a consumer-level
MT> > MT> > MT> >>>>>> NAT talking to such a server.
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> To ensure that no external host sees wild jumps in timestamp values,
MT> > MT> > MT> >>>>>>> I propose a patch that removes the ports from the equation altogether
MT> > MT> > MT> >>>>>>> when calculating the timestamp offset:
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> Index: sys/netinet/tcp_subr.c
MT> > MT> > MT> >>>>>>> ===================================================================
MT> > MT> > MT> >>>>>>> --- sys/netinet/tcp_subr.c (revision 348435)
MT> > MT> > MT> >>>>>>> +++ sys/netinet/tcp_subr.c (working copy)
MT> > MT> > MT> >>>>>>> @@ -2224,7 +2224,22 @@
MT> > MT> > MT> >>>>>>>  uint32_t
MT> > MT> > MT> >>>>>>>  tcp_new_ts_offset(struct in_conninfo *inc)
MT> > MT> > MT> >>>>>>>  {
MT> > MT> > MT> >>>>>>> -	return (tcp_keyed_hash(inc, V_ts_offset_secret));
MT> > MT> > MT> >>>>>>> +	/*
MT> > MT> > MT> >>>>>>> +	 * Some implementations show a strange behaviour when wildly random
MT> > MT> > MT> >>>>>>> +	 * timestamps are allocated for different streams. It seems that only
MT> > MT> > MT> >>>>>>> +	 * SYN packets are affected. Observed implementations drop SYN packets
MT> > MT> > MT> >>>>>>> +	 * with timestamps smaller than the largest timestamp value of all
MT> > MT> > MT> >>>>>>> +	 * recent or current connections from a specific address. To mitigate
MT> > MT> > MT> >>>>>>> +	 * this we are going to ensure that each host will always observe
MT> > MT> > MT> >>>>>>> +	 * timestamps as increasing no matter the stream: by dropping ports
MT> > MT> > MT> >>>>>>> +	 * from the equation.
MT> > MT> > MT> >>>>>>> +	 */
MT> > MT> > MT> >>>>>>> +	struct in_conninfo inc_copy = *inc;
MT> > MT> > MT> >>>>>>> +
MT> > MT> > MT> >>>>>>> +	inc_copy.inc_fport = 0;
MT> > MT> > MT> >>>>>>> +	inc_copy.inc_lport = 0;
MT> > MT> > MT> >>>>>>> +
MT> > MT> > MT> >>>>>>> +	return (tcp_keyed_hash(&inc_copy, V_ts_offset_secret));
MT> > MT> > MT> >>>>>>>  }
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>>  /*
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> In any case, the fix for the uptime leak implemented in r338053 is not
MT> > MT> > MT> >>>>>>> weakened by this, because a would-be attacker can already use any fixed
MT> > MT> > MT> >>>>>>> values of SP and DP (albeit not 0) to take them out of the equation.
MT> > MT> > MT> >>>>>> Can you describe how a peer can compute the uptime from two observed timestamps?
MT> > MT> > MT> >>>>>> I don't see how you can do that...
MT> > MT> > MT> >>>>>
MT> > MT> > MT> >>>>> A would-be attacker could run a script that continuously monitors timestamps,
MT> > MT> > MT> >>>>> for example via a periodic TCP connection from a fixed local port (e.g. 12345)
MT> > MT> > MT> >>>>> and a fixed local address to the victim's fixed address and port (e.g. 80).
MT> > MT> > MT> >>>>> Whenever a large discrepancy is observed, the attacker can assume that a reboot
MT> > MT> > MT> >>>>> has happened (because V_ts_offset_secret is regenerated), so the received
MT> > MT> > MT> >>>>> timestamp marks the approximate point of the reboot, from which the uptime
MT> > MT> > MT> >>>>> can be calculated until the next reboot, and so on.
MT> > MT> > MT> >>>> Ahh, I see. The patch we are talking about is not intended to protect against
MT> > MT> > MT> >>>> continuous monitoring, which is something you can always do. You could even
MT> > MT> > MT> >>>> watch for service availability and detect reboots. A change of the local key
MT> > MT> > MT> >>>> would also look like a reboot, just without a temporary loss of connectivity.
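Paul's monitoring argument can be sketched as follows, assuming the observer keeps the 4-tuple fixed so that the per-connection offset stays constant between reboots. Everything here is hypothetical: `struct ts_sample`, `first_discontinuity()`, the assumed 1000 Hz TSval rate (`TS_HZ`) and the jitter allowance (`JUMP_TOLERANCE`) are illustrative choices, not FreeBSD values. A sample whose TSval does not follow from the previous one at the assumed rate is flagged as a suspected reboot or secret regeneration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration only, not FreeBSD constants. */
#define TS_HZ          1000u /* assumed TSval clock rate, ticks per second */
#define JUMP_TOLERANCE 5000u /* assumed slack in ticks for jitter and drift */

/* One observation over a fixed 4-tuple (hypothetical type). */
struct ts_sample {
	uint32_t wall_sec; /* observer's own clock, seconds */
	uint32_t tsval;    /* TSval received from the monitored host */
};

/*
 * Returns the index of the first sample whose TSval does not follow from
 * the previous one at TS_HZ (a suspected reboot or key change), or n if
 * the series is continuous.
 */
size_t
first_discontinuity(const struct ts_sample *s, size_t n)
{
	assert(s != NULL || n == 0);
	for (size_t i = 1; i < n; i++) {
		uint32_t expected = s[i - 1].tsval +
		    (s[i].wall_sec - s[i - 1].wall_sec) * TS_HZ;
		/* Signed distance using 32-bit serial-number arithmetic. */
		int32_t delta = (int32_t)(s[i].tsval - expected);
		int64_t dist = delta < 0 ? -(int64_t)delta : (int64_t)delta;

		if (dist > (int64_t)JUMP_TOLERANCE)
			return (i);
	}
	return (n);
}
```

Once a discontinuity at index k is found, the reboot (or key change) happened between `s[k - 1].wall_sec` and `s[k].wall_sec`, which bounds the uptime from then on. As Michael notes, this works whether or not the ports are hashed, because the observer controls the tuple.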
MT> > MT> > MT> >>>>
MT> > MT> > MT> >>>> Thanks for the clarification.
MT> > MT> > MT> >>>>>
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> Here is the list of example hosts with which we were able to reproduce
MT> > MT> > MT> >>>>>>> the issue:
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> curl -v http://88.99.60.171:80
MT> > MT> > MT> >>>>>>> curl -v http://163.172.71.252:80
MT> > MT> > MT> >>>>>>> curl -v http://5.9.242.150:80
MT> > MT> > MT> >>>>>>> curl -v https://185.134.205.105:443
MT> > MT> > MT> >>>>>>> curl -v https://136.243.1.231:443
MT> > MT> > MT> >>>>>>> curl -v https://144.76.196.4:443
MT> > MT> > MT> >>>>>>> curl -v http://94.127.191.194:80
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> To reproduce, call curl repeatedly with the same URL a number of times.
MT> > MT> > MT> >>>>>>> You are going to see some of the requests stuck in
MT> > MT> > MT> >>>>>>> `* Trying XXX.XXX.XXX.XXX...`
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> For some reason, the easiest way to reproduce the issue is with nc:
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> $ echo "foooooo" | nc -v 88.99.60.171 80
MT> > MT> > MT> >>>>>>>
MT> > MT> > MT> >>>>>>> Only a few such calls are required until one of them gets stuck in connect(),
MT> > MT> > MT> >>>>>>> issuing SYN packets with an exponential backoff.
MT> > MT> > MT> >>>>>> Thanks for providing an end-point to test with. I'll take a look.
MT> > MT> > MT> >>>>>> Just to be clear: You are running a FreeBSD client against one of the above
MT> > MT> > MT> >>>>>> servers and experience the problem with the new timestamp computations.
MT> > MT> > MT> >>>>>>
MT> > MT> > MT> >>>>>> You are not running arbitrary clients against a FreeBSD server...
MT> > MT> > MT> >>>>>
MT> > MT> > MT> >>>>> We are talking about FreeBSD being the client. The peers that exhibit this
MT> > MT> > MT> >>>>> unwanted behaviour are unknown. A little tinkering showed that some of them
MT> > MT> > MT> >>>>> run Debian:
MT> > MT> > MT> >>>>>
MT> > MT> > MT> >>>>> telnet 88.99.60.171 22
MT> > MT> > MT> >>>>> Trying 88.99.60.171...
MT> > MT> > MT> >>>>> Connected to 88.99.60.171.
MT> > MT> > MT> >>>>> Escape character is '^]'.
MT> > MT> > MT> >>>>> SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
MT> > MT> > MT> >>>> Also some are hosted by Hetzner, but not all. I will look into
MT> > MT> > MT> >>>> this tomorrow, since I'm on a deadline today (well, it is 2am tomorrow
MT> > MT> > MT> >>>> morning, to be precise)...
MT> > MT> > MT> >>>
MT> > MT> > MT> >>> Thanks a lot, I would appreciate that.
MT> > MT> > MT> >> Hi Paul,
MT> > MT> > MT> >>
MT> > MT> > MT> >> I have looked into this.
MT> > MT> > MT> >>
MT> > MT> > MT> >> * The FreeBSD behaviour is the one specified in the last bullet item
MT> > MT> > MT> >>   in https://tools.ietf.org/html/rfc7323#section-5.4
MT> > MT> > MT> >>   It is also the one which is RECOMMENDED in
MT> > MT> > MT> >>   https://tools.ietf.org/html/rfc7323#section-7.1
MT> > MT> > MT> >>
MT> > MT> > MT> >> * My NAT box (a popular one in Germany) does NOT rewrite TCP timestamps.
MT> > MT> > MT> >>
MT> > MT> > MT> >> This means that the hosts you are referring to have some sort of protection
MT> > MT> > MT> >> which makes incorrect assumptions. It will also break multiple hosts behind
MT> > MT> > MT> >> a NAT.
MT> > MT> > MT> >>
MT> > MT> > MT> >> I can run
MT> > MT> > MT> >> curl -v http://88.99.60.171:80
MT> > MT> > MT> >> in a loop without any problems from a FreeBSD head system. I tested 1000
MT> > MT> > MT> >> iterations or so. The TS.val is jumping up and down as expected.
MT> > MT> > MT> >> I'm wondering why you are observing errors in this case, too.
MT> > MT> > MT> >>
MT> > MT> > MT> >> However, doing something like
MT> > MT> > MT> >> echo "foooooo" | nc -v 88.99.60.171 80
MT> > MT> > MT> >> triggers the problem.
MT> > MT> > MT> >>
MT> > MT> > MT> >> So I think there is some functionality (in a middlebox or running on the host)
MT> > MT> > MT> >> which incorrectly assumes monotonic timestamps across multiple TCP connections
MT> > MT> > MT> >> coming from the same IP address, but only in case of errors at the application layer.
MT> > MT> > MT> >
MT> > MT> > MT> > Yeah, exactly, some hosts seem to enable this only in case of an error in HTTP
MT> > MT> > MT> > communication (some smart proxy?). However, there are some that behave this way
MT> > MT> > MT> > regardless of errors, for example these:
MT> > MT> > MT> >
MT> > MT> > MT> > curl -v https://185.134.205.105:443
MT> > MT> > MT> > curl -v https://136.243.1.231:443
MT> > MT> > MT> Wireshark sees an Encrypted Alert in both cases. So I guess this is another indication
MT> > MT> > MT> of "error at the application layer".
MT> > MT> > MT> >
MT> > MT> > MT> >> Do you have any insight into whether the hosts you listed share something in
MT> > MT> > MT> >> common? Some of them are hosted by Hetzner, but not all.
MT> > MT> > MT> >
MT> > MT> > MT> > Nope. The whole set of endpoints that we have detected so far is pretty diverse,
MT> > MT> > MT> > spanning many geographic locations as well as different hosting providers.
MT> > MT> > MT> OK. Thanks for the clarification.
MT> > MT> > MT> >
MT> > MT> > MT> >> I think in general, it is the correct thing to include the port numbers in
MT> > MT> > MT> >> the offset computation. We might add a sysctl variable to control the inclusion.
MT> > MT> > MT> >> This would allow interworking with broken middleboxes.
MT> > MT> > MT> >
MT> > MT> > MT> > Yeah, I completely agree that these rare cases should not dictate the implementation.
MT> > MT> > MT> > But the ability to enable a workaround via sysctl would be greatly appreciated.
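Michael's idea of a sysctl that controls whether the ports are hashed could look roughly like this minimal sketch. It assumes a single global flag in place of a real sysctl; `ts_offset_include_ports`, `struct conn4` and `ts_offset()` are hypothetical names. FreeBSD's `tcp_keyed_hash()` is not reproduced here. Keyed FNV-1a is swapped in only to keep the example self-contained. It is not a cryptographic hash and must not be used for real offsets. With the flag cleared, every connection between the same pair of addresses gets the same offset, which is the effect of Paul's patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed sysctl knob. */
bool ts_offset_include_ports = true;

/* Hypothetical 4-tuple; FreeBSD's struct in_conninfo is richer. */
struct conn4 {
	uint32_t laddr, faddr; /* IPv4 addresses, host byte order */
	uint16_t lport, fport; /* ports, host byte order */
};

/* One FNV-1a step: for a fixed input byte this is a bijection on h. */
static uint32_t
fnv1a_byte(uint32_t h, uint8_t b)
{
	return ((h ^ b) * 16777619u);
}

/* Feed a 32-bit value into the hash, most significant byte first. */
static uint32_t
fnv1a_u32(uint32_t h, uint32_t v)
{
	for (int shift = 24; shift >= 0; shift -= 8)
		h = fnv1a_byte(h, (uint8_t)(v >> shift));
	return (h);
}

/* Non-cryptographic stand-in for a keyed hash of the connection tuple. */
uint32_t
ts_offset(const struct conn4 *c, uint32_t secret)
{
	uint32_t h = 2166136261u; /* FNV-1a 32-bit offset basis */

	assert(c != NULL);
	h = fnv1a_u32(h, secret);
	h = fnv1a_u32(h, c->laddr);
	h = fnv1a_u32(h, c->faddr);
	if (ts_offset_include_ports) {
		/* Pack both ports into one word, local port in the high half. */
		h = fnv1a_u32(h, ((uint32_t)c->lport << 16) | c->fport);
	}
	return (h);
}
```

With the flag set, two connections that differ only in the local port (12345 vs. 12346) get different offsets. With it cleared, they get the same one. Either way this only helps a single client, not several clients behind a NAT.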
MT> > MT> > MT> > Currently we are unable to roll out the upgrade across all servers because of this
MT> > MT> > MT> > issue: even though it does not happen often, a lot of requests from our users
MT> > MT> > MT> > get stuck or fail altogether. For example, the host 185.134.205.105 is a kind of
MT> > MT> > MT> > social network that our proxy servers connect to in order to securely access content,
MT> > MT> > MT> > such as images, on behalf of our users.
MT> > MT> > MT> >
MT> > MT> > MT> >> Please note, this does not fix the case of multiple clients behind a NAT.
MT> > MT> > MT> >
MT> > MT> > MT> > Yeah, that's true. Fortunately we don't use NAT.
MT> > MT> > MT> >
MT> > MT> > MT> >> I'm also trying to figure out how and why Linux and Windows are handling this.
MT> > MT> > MT> >
MT> > MT> > MT> > Thanks for taking the time!
MT> > MT> > MT> Will let you know what I figure out.
MT> > MT> > MT>
MT> > MT> > MT> Best regards
MT> > MT> > MT> Michael
MT> > MT> > MT> _______________________________________________
MT> > MT> > MT> freebsd-net@freebsd.org mailing list
MT> > MT> > MT> https://lists.freebsd.org/mailman/listinfo/freebsd-net
MT> > MT> > MT> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
MT>