From owner-freebsd-net@freebsd.org Wed Jul 17 11:03:02 2019
From: Michael Tuexen
To: Vitalij Satanivskij
Cc: freebsd-net@freebsd.org, Paul
Subject: Re: Issues with TCP Timestamps allocation
Date: Wed, 17 Jul 2019 13:02:56 +0200
Message-Id: <48817BF6-AEDD-4D28-95F8-A4D53E4999B1@freebsd.org>
In-Reply-To: <20190717100926.GA24984@hell.ukr.net>
List-Id: Networking and TCP/IP with FreeBSD

> On 17. Jul 2019, at 12:09, Vitalij Satanivskij wrote:
>
> Hello again
>
> Michael Tuexen wrote:
> MT> > On 17. Jul 2019, at 09:42, Vitalij Satanivskij wrote:
> MT> >
> MT> > Hello.
> MT> >
> MT> > Are there any changes regarding this problem?
> MT> >
> MT> > I'm using FreeBSD 12 on my desktop and can confirm the problem occurs with some hosts.
> MT> Can you provide a list of some of these hosts?
> MT> I'll put up a change for review later today.
>
> Here are some hosts.
>
> 5.9.242.150 https://vitagramma.com
> 77.120.8.194 https://volia.com
> 31.41.220.92 https://moemisto.ua
> 185.5.72.33 https://fotostrana.ru
OK, thanks. That might help to figure out what is broken exactly. I'm not yet sure
if it is a broken end point implementation or a middlebox making false assumptions.
>
> The problem can be seen by sending curl requests to the hosts serially (manually, so the
> delay is from a few msec to a few sec)
>
> Or by using a proxy on a machine with parallel/serial requests (e.g. squid or a reverse proxy in nginx)
>
> On systems before https://svnweb.freebsd.org/base?view=revision&revision=338053 such behavior is not seen.
>
> MT>
> MT> In the meantime you can deal with the buggy hosts by disabling the timestamps
> MT> or dropping extensions on SYN retransmits.
>
> You mean by some code changes?
No.
Two options:

Option 1: Drop the TCP timestamp option on the third retransmission

To enable this, you configure on the client
sudo sysctl -w net.inet.tcp.rexmit_drop_options=1
or put
net.inet.tcp.rexmit_drop_options=1
in /etc/sysctl.conf and reboot.

In case of the broken host, the first SYN retransmission will happen 1 second after
the initial SYN segment, the second retransmission will happen 1.2 seconds after the first.
On the third retransmission, which happens again 1.2 seconds later, the TCP timestamp
option is dropped and the connection setup will succeed. This gives you a total delay
of 3.4 seconds (1 + 1.2 + 1.2) on connection setup instead of the longer timeout.

Option 2: Disable the TCP timestamps (and window scaling)

To enable this, you configure on the client
sudo sysctl -w net.inet.tcp.rfc1323=0
or put
net.inet.tcp.rfc1323=0
in /etc/sysctl.conf and reboot.

This disables the timestamp option and window scaling completely. This allows you
to set up the connections without any delay. However, you don't have the benefits
of the extension.

Neither option requires any code changes.

Best regards
Michael
>
>
> MT>
> MT> Best regards
> MT> Michael
> MT> >
> MT> >
> MT> >
> MT> > Michael Tuexen wrote:
> MT> > MT>
> MT> > MT>
> MT> > MT> > On 9. Jul 2019, at 14:58, Paul wrote:
> MT> > MT> >
> MT> > MT> > Hi Michael,
> MT> > MT> >
> MT> > MT> > 9 July 2019, 15:34:29, by "Michael Tuexen":
> MT> > MT> >
> MT> > MT> >>
> MT> > MT> >>
> MT> > MT> >>> On 8. Jul 2019, at 17:22, Paul wrote:
> MT> > MT> >>>
> MT> > MT> >>>
> MT> > MT> >>>
> MT> > MT> >>> 8 July 2019, 17:12:21, by "Michael Tuexen":
> MT> > MT> >>>
> MT> > MT> >>>>> On 8. Jul 2019, at 15:24, Paul wrote:
> MT> > MT> >>>>>
> MT> > MT> >>>>> Hi Michael,
> MT> > MT> >>>>>
> MT> > MT> >>>>> 8 July 2019, 15:53:15, by "Michael Tuexen":
> MT> > MT> >>>>>
> MT> > MT> >>>>>>> On 8. Jul 2019, at 12:37, Paul wrote:
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> Hi team,
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> Recently we had an upgrade to 12 Stable. Immediately after, we have started
> MT> > MT> >>>>>>> seeing some strange connection establishment timeouts to some fixed number
> MT> > MT> >>>>>>> of external (world) hosts. The issue was persistent and easy to reproduce.
> MT> > MT> >>>>>>> Thanks to the patience and dedication of our system engineer we have tracked
> MT> > MT> >>>>>>> this issue down to a specific commit:
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=338053
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> This patch was also back-ported into 11 Stable:
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=348435
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> Among other things this patch changes the timestamp allocation strategy,
> MT> > MT> >>>>>>> by introducing a deterministic randomness via a hash function that takes
> MT> > MT> >>>>>>> into account a random key as well as source address, source port, dest
> MT> > MT> >>>>>>> address and dest port. As the result, timestamp offsets of different
> MT> > MT> >>>>>>> tuples (SA,SP,DA,DP) will be wildly different and will jump from small
> MT> > MT> >>>>>>> to large numbers and back, as long as something in the tuple changes.
> MT> > MT> >>>>>> Hi Paul,
> MT> > MT> >>>>>>
> MT> > MT> >>>>>> this is correct.
> MT> > MT> >>>>>>
> MT> > MT> >>>>>> Please note that the same happens with the old method, if two hosts with
> MT> > MT> >>>>>> different uptimes are behind a consumer grade NAT.
> MT> > MT> >>>>>
> MT> > MT> >>>>> If NAT does not replace timestamps then yes, it should be the case.
> MT> > MT> >>>>>
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> After performing various tests of hosts that produce the above mentioned
> MT> > MT> >>>>>>> issue we came to the conclusion that there are some interesting implementations
> MT> > MT> >>>>>>> that drop SYN packets with timestamps smaller than the largest timestamp
> MT> > MT> >>>>>>> value from streams of all recent or current connections from a specific
> MT> > MT> >>>>>>> address. This looks like some kind of SYN flood protection.
> MT> > MT> >>>>>> This also breaks multiple hosts with different uptimes behind a consumer
> MT> > MT> >>>>>> level NAT talking to such a server.
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> To ensure that each external host is not going to see wild jumps of
> MT> > MT> >>>>>>> timestamp values I propose a patch that removes ports from the equation
> MT> > MT> >>>>>>> altogether, when calculating the timestamp offset:
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> Index: sys/netinet/tcp_subr.c
> MT> > MT> >>>>>>> ===================================================================
> MT> > MT> >>>>>>> --- sys/netinet/tcp_subr.c (revision 348435)
> MT> > MT> >>>>>>> +++ sys/netinet/tcp_subr.c (working copy)
> MT> > MT> >>>>>>> @@ -2224,7 +2224,22 @@
> MT> > MT> >>>>>>>  uint32_t
> MT> > MT> >>>>>>>  tcp_new_ts_offset(struct in_conninfo *inc)
> MT> > MT> >>>>>>>  {
> MT> > MT> >>>>>>> -	return (tcp_keyed_hash(inc, V_ts_offset_secret));
> MT> > MT> >>>>>>> +	/*
> MT> > MT> >>>>>>> +	 * Some implementations show a strange behaviour when wildly random
> MT> > MT> >>>>>>> +	 * timestamps are allocated for different streams. It seems that only the
> MT> > MT> >>>>>>> +	 * SYN packets are affected. Observed implementations drop SYN packets
> MT> > MT> >>>>>>> +	 * with timestamps smaller than the largest timestamp value of all
> MT> > MT> >>>>>>> +	 * recent or current connections from a specific address. To mitigate
> MT> > MT> >>>>>>> +	 * this we are going to ensure that each host will always observe
> MT> > MT> >>>>>>> +	 * timestamps as increasing no matter the stream: by dropping ports
> MT> > MT> >>>>>>> +	 * from the equation.
> MT> > MT> >>>>>>> +	 */
> MT> > MT> >>>>>>> +	struct in_conninfo inc_copy = *inc;
> MT> > MT> >>>>>>> +
> MT> > MT> >>>>>>> +	inc_copy.inc_fport = 0;
> MT> > MT> >>>>>>> +	inc_copy.inc_lport = 0;
> MT> > MT> >>>>>>> +
> MT> > MT> >>>>>>> +	return (tcp_keyed_hash(&inc_copy, V_ts_offset_secret));
> MT> > MT> >>>>>>>  }
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>>  /*
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> In any case, the solution of the uptime leak, implemented in rev338053 is
> MT> > MT> >>>>>>> not going to suffer, because a supposed attacker is currently able to use
> MT> > MT> >>>>>>> any fixed values of SP and DP, albeit not 0, anyway, to remove them out
> MT> > MT> >>>>>>> of the equation.
> MT> > MT> >>>>>> Can you describe how a peer can compute the uptime from two observed timestamps?
> MT> > MT> >>>>>> I don't see how you can do that...
> MT> > MT> >>>>>
> MT> > MT> >>>>> A supposed attacker could run a script that continuously monitors timestamps,
> MT> > MT> >>>>> for example via a periodic TCP connection from a fixed local port (e.g. 12345)
> MT> > MT> >>>>> and a fixed local address to the fixed victim's address and port (e.g. 80).
> MT> > MT> >>>>> Whenever a large discrepancy is observed, the attacker can assume that a reboot has
> MT> > MT> >>>>> happened (due to V_ts_offset_secret re-generation), hence the received
> MT> > MT> >>>>> timestamp is considered an approximate point of reboot from which the uptime
> MT> > MT> >>>>> can be calculated, until the next reboot and so on.
> MT> > MT> >>>> Ahh, I see. The patch we are talking about is not intended to protect against
> MT> > MT> >>>> continuous monitoring, which is something you can always do. You could even
> MT> > MT> >>>> watch for service availability and detect reboots. A change of the local key
> MT> > MT> >>>> would also look similar to a reboot without a temporary loss of connectivity.
> MT> > MT> >>>>
> MT> > MT> >>>> Thanks for the clarification.
> MT> > MT> >>>>>
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> Here is the list of example hosts that we were able to reproduce the
> MT> > MT> >>>>>>> issue with:
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> curl -v http://88.99.60.171:80
> MT> > MT> >>>>>>> curl -v http://163.172.71.252:80
> MT> > MT> >>>>>>> curl -v http://5.9.242.150:80
> MT> > MT> >>>>>>> curl -v https://185.134.205.105:443
> MT> > MT> >>>>>>> curl -v https://136.243.1.231:443
> MT> > MT> >>>>>>> curl -v https://144.76.196.4:443
> MT> > MT> >>>>>>> curl -v http://94.127.191.194:80
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> To reproduce, call curl repeatedly with the same URL some number of times.
> MT> > MT> >>>>>>> You are going to see some of the requests stuck in
> MT> > MT> >>>>>>> `* Trying XXX.XXX.XXX.XXX...`
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> For some reason, the easiest way to reproduce the issue is with nc:
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> $ echo "foooooo" | nc -v 88.99.60.171 80
> MT> > MT> >>>>>>>
> MT> > MT> >>>>>>> Only a few such calls are required until one of them is stuck on connect():
> MT> > MT> >>>>>>> issuing SYN packets with an exponential backoff.
> MT> > MT> >>>>>> Thanks for providing an end-point to test with. I'll take a look.
> MT> > MT> >>>>>> Just to be clear: You are running a FreeBSD client against one of the above
> MT> > MT> >>>>>> servers and experience the problem with the new timestamp computations.
> MT> > MT> >>>>>>
> MT> > MT> >>>>>> You are not running arbitrary clients against a FreeBSD server...
> MT> > MT> >>>>>
> MT> > MT> >>>>> We are talking about FreeBSD being the client. Peers that yield this unwanted
> MT> > MT> >>>>> behaviour are unknown. A little bit of tinkering showed that some of them run
> MT> > MT> >>>>> Debian:
> MT> > MT> >>>>>
> MT> > MT> >>>>> telnet 88.99.60.171 22
> MT> > MT> >>>>> Trying 88.99.60.171...
> MT> > MT> >>>>> Connected to 88.99.60.171.
> MT> > MT> >>>>> Escape character is '^]'.
> MT> > MT> >>>>> SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
> MT> > MT> >>>> Also some are hosted by Hetzner, but not all. I will look into
> MT> > MT> >>>> this tomorrow, since I'm on a deadline today (well it is 2am tomorrow
> MT> > MT> >>>> morning, to be precise)...
> MT> > MT> >>>
> MT> > MT> >>> Thanks a lot, I would appreciate that.
> MT> > MT> >> Hi Paul,
> MT> > MT> >>
> MT> > MT> >> I have looked into this.
> MT> > MT> >>
> MT> > MT> >> * The FreeBSD behaviour is the one which is specified in the last bullet item
> MT> > MT> >> in https://tools.ietf.org/html/rfc7323#section-5.4
> MT> > MT> >> It is also the one which is RECOMMENDED in
> MT> > MT> >> https://tools.ietf.org/html/rfc7323#section-7.1
> MT> > MT> >>
> MT> > MT> >> * My NAT box (a popular one in Germany) does NOT rewrite TCP timestamps.
> MT> > MT> >>
> MT> > MT> >> This means that the hosts you are referring to have some sort of protection,
> MT> > MT> >> which makes incorrect assumptions. It will also break multiple hosts behind
> MT> > MT> >> a NAT.
> MT> > MT> >>
> MT> > MT> >> I can run
> MT> > MT> >> curl -v http://88.99.60.171:80
> MT> > MT> >> in a loop without any problems from a FreeBSD head system. I tested 1000
> MT> > MT> >> iterations or so. The TS.val is jumping up and down as expected.
> MT> > MT> >> I'm wondering why you are observing errors in this case, too.
> MT> > MT> >>
> MT> > MT> >> However, doing something like
> MT> > MT> >> echo "foooooo" | nc -v 88.99.60.171 80
> MT> > MT> >> triggers the problem.
> MT> > MT> >>
> MT> > MT> >> So I think there is some functionality (in a middlebox or running on the host),
> MT> > MT> >> which incorrectly assumes monotonic timestamps between multiple TCP connections
> MT> > MT> >> coming from the same IP address, but only in case of errors at the application layer.
> MT> > MT> >
> MT> > MT> > Yeah, exactly, some hosts seem to enable this only in case of an error in HTTP
> MT> > MT> > communication (some smart proxy?). However, there are some that behave this way
> MT> > MT> > regardless of errors, for example these:
> MT> > MT> >
> MT> > MT> > curl -v https://185.134.205.105:443
> MT> > MT> > curl -v https://136.243.1.231:443
> MT> > MT> Wireshark sees an Encrypted Alert in both cases. So I guess this is another indication
> MT> > MT> of "error at the application layer".
> MT> > MT> >
> MT> > MT> >>
> MT> > MT> >> Do you have any insights whether the hosts you listed share something in
> MT> > MT> >> common? Some of them are hosted by Hetzner, but not all.
> MT> > MT> >
> MT> > MT> > Nope. The whole set of endpoints that we have detected so far is pretty diverse,
> MT> > MT> > containing a lot of different locations geographically, as well as different
> MT> > MT> > hosters.
> MT> > MT> OK. Thanks for the clarification.
> MT> > MT> >
> MT> > MT> >>
> MT> > MT> >> I think in general, it is the correct thing to include the port numbers in
> MT> > MT> >> the offset computation. We might add a sysctl variable to control the inclusion.
> MT> > MT> >> This would allow interworking with broken middleboxes.
> MT> > MT> >
> MT> > MT> > Yeah, I completely agree that these rare cases should not dictate the implementation.
> MT> > MT> > But an ability to enable a work-around via sysctl would be greatly appreciated.
> MT> > MT> > Currently we are unable to roll out the upgrade across all servers because of this
> MT> > MT> > issue: even though it does not happen often, a lot of requests from our users
> MT> > MT> > get stuck or fail altogether. For example, the host 185.134.205.105 is a kind of
> MT> > MT> > social network that our proxy servers connect to in order to securely access content,
> MT> > MT> > such as images, on behalf of our users.
> MT> > MT> >
> MT> > MT> >>
> MT> > MT> >> Please note, this does not fix the case of multiple clients behind a NAT.
> MT> > MT> >
> MT> > MT> > Yeah, that's true. Fortunately we don't use NAT.
> MT> > MT> >
> MT> > MT> >>
> MT> > MT> >> I'm also trying to figure out how and why Linux and Windows are handling this.
> MT> > MT> >
> MT> > MT> > Thanks for bothering!
> MT> > MT> Will let you know what I figure out.
> MT> > MT>
> MT> > MT> Best regards
> MT> > MT> Michael
> MT> > MT> >
> MT> > MT> >>
> MT> > MT> >> Best regards
> MT> > MT> >> Michael
> MT> > MT> >>
> MT> > MT> >>>
> MT> > MT> >>>>
> MT> > MT> >>>> Best regards
> MT> > MT> >>>> Michael
> MT> > MT> >>>>>
> MT> > MT> >>>>>>
> MT> > MT> >>>>>> Best regards
> MT> > MT> >>>>>> Michael
> MT> > MT>
> MT> > MT> _______________________________________________
> MT> > MT> freebsd-net@freebsd.org mailing list
> MT> > MT> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> MT> > MT> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
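
For readers who want to see the failure mode discussed in this thread in isolation,
here is a minimal userland sketch in Python. It is a model, not the FreeBSD code:
the keyed hash is BLAKE2 from hashlib standing in for the kernel's tcp_keyed_hash(),
and MonotonicFilter is a hypothetical guess at what the broken hosts or middleboxes
do (per source address, drop a SYN whose TSval looks older than the last accepted
one, using a wrap-safe comparison). The addresses, ports, secret and timing are
made-up example values.

```python
import hashlib

MOD = 2 ** 32  # TCP timestamps are 32-bit values


def ts_offset(secret, src, sport, dst, dport, use_ports=True):
    """Keyed hash of the connection identifiers, truncated to 32 bits.

    With use_ports=False the ports are zeroed before hashing, which is
    the idea of the patch proposed in the thread.
    """
    if not use_ports:
        sport = dport = 0
    data = f"{src}|{sport}|{dst}|{dport}".encode()
    digest = hashlib.blake2b(data, key=secret, digest_size=4).digest()
    return int.from_bytes(digest, "big")


class MonotonicFilter:
    """Hypothetical filter that expects TSval to never go backwards per source address."""

    def __init__(self):
        self.last = {}  # source address -> last accepted TSval

    def accept(self, src, tsval):
        prev = self.last.get(src)
        # Wrap-safe "older than" test: a forward distance of 2^31 or more
        # is treated as a step backwards.
        if prev is not None and (tsval - prev) % MOD >= MOD // 2:
            return False
        self.last[src] = tsval
        return True


def count_dropped_syns(use_ports, connections=50, secret=b"example-secret"):
    """Open `connections` connections from fresh source ports and count dropped SYNs."""
    box = MonotonicFilter()
    dropped = 0
    for i in range(connections):
        clock_ms = 1000 + 100 * i   # client clock, one connection every 100 ms
        sport = 49152 + i           # new ephemeral port for every connection
        offset = ts_offset(secret, "192.0.2.1", sport, "198.51.100.7", 80, use_ports)
        tsval = (clock_ms + offset) % MOD
        if not box.accept("192.0.2.1", tsval):
            dropped += 1
    return dropped


if __name__ == "__main__":
    print("per-tuple offsets, dropped SYNs:   ", count_dropped_syns(use_ports=True))
    print("address-only offsets, dropped SYNs:", count_dropped_syns(use_ports=False))
```

With address-only offsets every connection to the same peer shares one offset, so
the TSval simply follows the clock and the filter never drops a SYN. With per-tuple
offsets each new source port yields an unrelated offset, so in this model roughly
every second SYN looks "old" to such a filter. As noted in the thread, the
address-only variant still does not help several clients behind one NAT.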