From owner-freebsd-net@freebsd.org Tue Jul 9 12:34:31 2019
From: Michael Tuexen
To: Paul
Cc: freebsd-net@freebsd.org
Subject: Re: Issues with TCP Timestamps allocation
Date: Tue, 9 Jul 2019 14:34:24 +0200
Message-Id: <0C475A01-9BCD-4E4A-9731-09AB919CA9BE@freebsd.org>
In-Reply-To: <1562599181.734953000.1l9a1d23@frv39.fwdcdn.com>
References: <1562579483.67527000.24rw4xi5@frv39.fwdcdn.com> <32FD061B-245C-41D2-81DE-1B4756A7173D@freebsd.org> <1562591379.369129000.gpmxvurq@frv39.fwdcdn.com> <1562599181.734953000.1l9a1d23@frv39.fwdcdn.com>
List-Id: Networking and TCP/IP with FreeBSD

> On 8. Jul 2019, at 17:22, Paul wrote:
>
> 8 July 2019, 17:12:21, by "Michael Tuexen" :
>
>>> On 8. Jul 2019, at 15:24, Paul wrote:
>>>
>>> Hi Michael,
>>>
>>> 8 July 2019, 15:53:15, by "Michael Tuexen" :
>>>
>>>>> On 8. Jul 2019, at 12:37, Paul wrote:
>>>>>
>>>>> Hi team,
>>>>>
>>>>> Recently we had an upgrade to 12 Stable. Immediately after, we started
>>>>> seeing some strange connection establishment timeouts to a fixed number
>>>>> of external (world) hosts. The issue was persistent and easy to reproduce.
>>>>> Thanks to the patience and dedication of our system engineer we have tracked
>>>>> this issue down to a specific commit:
>>>>>
>>>>> https://svnweb.freebsd.org/base?view=revision&revision=338053
>>>>>
>>>>> This patch was also back-ported into 11 Stable:
>>>>>
>>>>> https://svnweb.freebsd.org/base?view=revision&revision=348435
>>>>>
>>>>> Among other things this patch changes the timestamp allocation strategy,
>>>>> by introducing a deterministic randomness via a hash function that takes
>>>>> into account a random key as well as source address, source port, dest
>>>>> address and dest port. As a result, timestamp offsets of different
>>>>> tuples (SA,SP,DA,DP) will be wildly different and will jump from small
>>>>> to large numbers and back, as long as something in the tuple changes.
>>>> Hi Paul,
>>>>
>>>> this is correct.
>>>>
>>>> Please note that the same happens with the old method, if two hosts with
>>>> different uptimes are behind a consumer grade NAT.
>>>
>>> If NAT does not replace timestamps then yes, it should be the case.
>>>
>>>>>
>>>>> After performing various tests of hosts that produce the above mentioned
>>>>> issue we came to the conclusion that there are some interesting implementations
>>>>> that drop SYN packets with timestamps smaller than the largest timestamp
>>>>> value from streams of all recent or current connections from a specific
>>>>> address. This looks like some kind of SYN flood protection.
>>>> This also breaks multiple hosts with different uptimes behind a consumer
>>>> level NAT talking to such a server.
>>>>>
>>>>> To ensure that each external host is not going to see wild jumps of
>>>>> timestamp values I propose a patch that removes ports from the equation
>>>>> altogether, when calculating the timestamp offset:
>>>>>
>>>>> Index: sys/netinet/tcp_subr.c
>>>>> ===================================================================
>>>>> --- sys/netinet/tcp_subr.c (revision 348435)
>>>>> +++ sys/netinet/tcp_subr.c (working copy)
>>>>> @@ -2224,7 +2224,22 @@
>>>>>  uint32_t
>>>>>  tcp_new_ts_offset(struct in_conninfo *inc)
>>>>>  {
>>>>> -	return (tcp_keyed_hash(inc, V_ts_offset_secret));
>>>>> +	/*
>>>>> +	 * Some implementations show a strange behaviour when wildly random
>>>>> +	 * timestamps are allocated for different streams. It seems that only the
>>>>> +	 * SYN packets are affected. Observed implementations drop SYN packets
>>>>> +	 * with timestamps smaller than the largest timestamp value of all
>>>>> +	 * recent or current connections from a specific address. To mitigate
>>>>> +	 * this we are going to ensure that each host will always observe
>>>>> +	 * timestamps as increasing no matter the stream: by dropping ports
>>>>> +	 * from the equation.
>>>>> +	 */
>>>>> +	struct in_conninfo inc_copy = *inc;
>>>>> +
>>>>> +	inc_copy.inc_fport = 0;
>>>>> +	inc_copy.inc_lport = 0;
>>>>> +
>>>>> +	return (tcp_keyed_hash(&inc_copy, V_ts_offset_secret));
>>>>>  }
>>>>>
>>>>>  /*
>>>>>
>>>>> In any case, the solution of the uptime leak, implemented in rev338053, is
>>>>> not going to suffer, because a supposed attacker is currently able to use
>>>>> any fixed values of SP and DP, albeit not 0, anyway, to remove them from
>>>>> the equation.
>>>> Can you describe how a peer can compute the uptime from two observed timestamps?
>>>> I don't see how you can do that...
>>>
>>> A supposed attacker could run a script that continuously monitors timestamps,
>>> for example via a periodic TCP connection from a fixed local port (eg 12345)
>>> and a fixed local address to the fixed victim's address and port (eg 80).
>>> Whenever a large discrepancy is observed, the attacker can assume that a reboot has
>>> happened (due to V_ts_offset_secret re-generation), hence the received
>>> timestamp is considered an approximate point of reboot from which the uptime
>>> can be calculated, until the next reboot and so on.
>> Ahh, I see. The patch we are talking about is not intended to protect against
>> continuous monitoring, which is something you can always do. You could even
>> watch for service availability and detect reboots. A change of the local key
>> would also look similar to a reboot without a temporary loss of connectivity.
>>
>> Thanks for the clarification.
>>>
>>>>>
>>>>> Here is the list of example hosts that we were able to reproduce the
>>>>> issue with:
>>>>>
>>>>> curl -v http://88.99.60.171:80
>>>>> curl -v http://163.172.71.252:80
>>>>> curl -v http://5.9.242.150:80
>>>>> curl -v https://185.134.205.105:443
>>>>> curl -v https://136.243.1.231:443
>>>>> curl -v https://144.76.196.4:443
>>>>> curl -v http://94.127.191.194:80
>>>>>
>>>>> To reproduce, call curl repeatedly with the same URL some number of times.
>>>>> You are going to see some of the requests stuck in
>>>>> `* Trying XXX.XXX.XXX.XXX...`
>>>>>
>>>>> For some reason, the easiest way to reproduce the issue is with nc:
>>>>>
>>>>> $ echo "foooooo" | nc -v 88.99.60.171 80
>>>>>
>>>>> Only a few such calls are required until one of them is stuck on connect():
>>>>> issuing SYN packets with an exponential backoff.
>>>> Thanks for providing an end-point to test with. I'll take a look.
>>>> Just to be clear: You are running a FreeBSD client against one of the above
>>>> servers and experience the problem with the new timestamp computations.
>>>>
>>>> You are not running arbitrary clients against a FreeBSD server...
>>>
>>> We are talking about FreeBSD being the client. Peers that yield this unwanted
>>> behaviour are unknown. A little bit of tinkering showed that some of them run
>>> Debian:
>>>
>>> telnet 88.99.60.171 22
>>> Trying 88.99.60.171...
>>> Connected to 88.99.60.171.
>>> Escape character is '^]'.
>>> SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
>> Also some are hosted by Hetzner, but not all. I will look into
>> this tomorrow, since I'm on a deadline today (well it is 2am tomorrow
>> morning, to be precise)...
>
> Thanks a lot, I would appreciate that.

Hi Paul,

I have looked into this.

* The FreeBSD behaviour is the one which is specified in the last bullet item in
  https://tools.ietf.org/html/rfc7323#section-5.4
  It is also the one which is RECOMMENDED in
  https://tools.ietf.org/html/rfc7323#section-7.1
* My NAT box (a popular one in Germany) does NOT rewrite TCP timestamps.

This means that the hosts you are referring to have some sort of protection,
which makes incorrect assumptions. It will also break multiple hosts behind
a NAT.

I can run
curl -v http://88.99.60.171:80
in a loop without any problems from a FreeBSD head system. I tested 1000 iterations
or so. The TS.val is jumping up and down as expected.

I'm wondering why you are observing errors in this case, too.

However, doing something like
echo "foooooo" | nc -v 88.99.60.171 80
triggers the problem.

So I think there is some functionality (in a middlebox or running on the host),
which incorrectly assumes monotonic timestamps between multiple TCP connections
coming from the same IP address, but only in case of errors at the application layer.

Do you have any insights whether the hosts you listed share something in common?
Some of them are hosted by Hetzner, but not all.

I think in general, it is the correct thing to include the port numbers in the
offset computation. We might add a sysctl variable to control the inclusion.
This would allow interworking with broken middleboxes. Please note, this does not
fix the case of multiple clients behind a NAT.

I'm also trying to figure out how and why Linux and Windows are handling this.

Best regards
Michael
>
>>
>> Best regards
>> Michael
>>>
>>>> Best regards
>>>> Michael
>>>>
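
For illustration only, the idea of making the port inclusion switchable
could be sketched in userspace C roughly as follows. This is a minimal
sketch under stated assumptions, not the kernel code: struct conn_tuple,
ts_offset_per_conn, mix() and new_ts_offset() are hypothetical names, the
FNV-1a mixing is only a stand-in for tcp_keyed_hash(), and the thread does
not name any actual sysctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the connection 4-tuple (IPv4 only). */
struct conn_tuple {
	uint32_t faddr;	/* foreign (peer) address */
	uint32_t laddr;	/* local address */
	uint16_t fport;	/* foreign (peer) port */
	uint16_t lport;	/* local port */
};

/*
 * Hypothetical knob standing in for the proposed sysctl:
 * true  = ports are part of the offset computation,
 * false = ports are zeroed, as in the patch quoted in this thread.
 */
static bool ts_offset_per_conn = true;

/* FNV-1a mixing step. Assumption: NOT the hash used by the kernel. */
static uint32_t
mix(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return (h);
}

/* Derive a timestamp offset from a secret and the (possibly reduced) tuple. */
static uint32_t
new_ts_offset(const struct conn_tuple *t, uint32_t secret)
{
	assert(t != NULL);

	struct conn_tuple key = *t;

	if (!ts_offset_per_conn) {
		/* Drop the ports so all connections between two hosts share one offset. */
		key.fport = 0;
		key.lport = 0;
	}

	uint32_t h = 2166136261u;	/* FNV-1a offset basis */

	h = mix(h, &secret, sizeof(secret));
	h = mix(h, &key.faddr, sizeof(key.faddr));
	h = mix(h, &key.laddr, sizeof(key.laddr));
	h = mix(h, &key.fport, sizeof(key.fport));
	h = mix(h, &key.lport, sizeof(key.lport));
	return (h);
}
```

With the knob off, two connections that differ only in the local port get
the same offset, so a peer sees timestamps from one clock per address pair.
With the knob on, each connection gets its own offset. Neither setting
helps multiple clients behind a NAT, since their local addresses and
uptimes differ.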