From owner-freebsd-net@freebsd.org  Tue Jul  9 12:59:03 2019
Date: Tue, 09 Jul 2019 15:58:54 +0300
From: Paul <devgs@ukr.net>
Subject: Re[2]: Issues with TCP Timestamps allocation
To: Michael Tuexen <tuexen@freebsd.org>
Cc: freebsd-net@freebsd.org
In-Reply-To: <0C475A01-9BCD-4E4A-9731-09AB919CA9BE@freebsd.org>
References: <1562579483.67527000.24rw4xi5@frv39.fwdcdn.com>
 <32FD061B-245C-41D2-81DE-1B4756A7173D@freebsd.org>
 <1562591379.369129000.gpmxvurq@frv39.fwdcdn.com>
 <DF65CA7F-B5FC-499D-B053-0531596D230C@freebsd.org>
 <1562599181.734953000.1l9a1d23@frv39.fwdcdn.com>
 <0C475A01-9BCD-4E4A-9731-09AB919CA9BE@freebsd.org>
X-Reply-Action: reply
Message-Id: <1562676414.933145000.z3zteyqp@frv39.fwdcdn.com>
X-Mailer: mail.ukr.net 5.0
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: binary
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Jul 2019 12:59:03 -0000

Hi Michael,

9 July 2019, 15:34:29, by "Michael Tuexen" <tuexen@freebsd.org>:

> 
> 
> > On 8. Jul 2019, at 17:22, Paul <devgs@ukr.net> wrote:
> > 
> > 
> > 
> > 8 July 2019, 17:12:21, by "Michael Tuexen" <tuexen@freebsd.org>:
> > 
> >>> On 8. Jul 2019, at 15:24, Paul <devgs@ukr.net> wrote:
> >>> 
> >>> Hi Michael,
> >>> 
> >>> 8 July 2019, 15:53:15, by "Michael Tuexen" <tuexen@freebsd.org>:
> >>> 
> >>>>> On 8. Jul 2019, at 12:37, Paul <devgs@ukr.net> wrote:
> >>>>> 
> >>>>> Hi team,
> >>>>> 
> >>>>> Recently we upgraded to 12 Stable. Immediately afterwards, we started
> >>>>> seeing strange connection establishment timeouts to a fixed set
> >>>>> of external (world) hosts. The issue was persistent and easy to reproduce.
> >>>>> Thanks to the patience and dedication of our system engineer, we have tracked
> >>>>> this issue down to a specific commit:
> >>>>> 
> >>>>> https://svnweb.freebsd.org/base?view=revision&revision=338053
> >>>>> 
> >>>>> This patch was also back-ported into 11 Stable:
> >>>>> 
> >>>>> https://svnweb.freebsd.org/base?view=revision&revision=348435
> >>>>> 
> >>>>> Among other things this patch changes the timestamp allocation strategy,
> >>>>> by introducing a deterministic randomness via a hash function that takes
> >>>>> into account a random key as well as source address, source port, dest
> >>>>> address and dest port. As a result, timestamp offsets of different
> >>>>> tuples (SA,SP,DA,DP) will be wildly different and will jump from small
> >>>>> to large numbers and back whenever something in the tuple changes.
> >>>> Hi Paul,
> >>>> 
> >>>> this is correct.
> >>>> 
> >>>> Please note that the same happens with the old method, if two hosts with
> >>>> different uptimes are behind a consumer grade NAT.
> >>> 
> >>> If NAT does not replace timestamps then yes, it should be the case.
> >>> 
> >>>>> 
> >>>>> After performing various tests against hosts that produce the above mentioned
> >>>>> issue, we came to the conclusion that there are some interesting implementations
> >>>>> that drop SYN packets with timestamps smaller than the largest timestamp
> >>>>> value from streams of all recent or current connections from a specific
> >>>>> address. This looks like some kind of SYN flood protection.
> >>>> This also breaks multiple hosts with different uptimes behind a consumer
> >>>> level NAT talking to such a server.
> >>>>> 
> >>>>> To ensure that each external host is not going to see wild jumps of
> >>>>> timestamp values, I propose a patch that removes ports from the equation
> >>>>> altogether when calculating the timestamp offset:
> >>>>> 
> >>>>> Index: sys/netinet/tcp_subr.c
> >>>>> ===================================================================
> >>>>> --- sys/netinet/tcp_subr.c	(revision 348435)
> >>>>> +++ sys/netinet/tcp_subr.c	(working copy)
> >>>>> @@ -2224,7 +2224,22 @@
> >>>>> uint32_t
> >>>>> tcp_new_ts_offset(struct in_conninfo *inc)
> >>>>> {
> >>>>> -	return (tcp_keyed_hash(inc, V_ts_offset_secret));
> >>>>> +        /*
> >>>>> +         * Some implementations behave strangely when wildly different
> >>>>> +         * timestamps are used for different streams. It seems that only the
> >>>>> +         * SYN packets are affected. Observed implementations drop SYN packets
> >>>>> +         * with timestamps smaller than the largest timestamp value of all
> >>>>> +         * recent or current connections from a specific address. To mitigate
> >>>>> +         * this we are going to ensure that each host will always observe
> >>>>> +         * timestamps as increasing no matter the stream: by dropping ports
> >>>>> +         * from the equation.
> >>>>> +         */
> >>>>> +        struct in_conninfo inc_copy = *inc;
> >>>>> +
> >>>>> +        inc_copy.inc_fport = 0;
> >>>>> +        inc_copy.inc_lport = 0;
> >>>>> +
> >>>>> +	return (tcp_keyed_hash(&inc_copy, V_ts_offset_secret));
> >>>>> }
> >>>>> 
> >>>>> /*
> >>>>> 
> >>>>> In any case, the fix for the uptime leak implemented in rev338053 is
> >>>>> not going to suffer, because a supposed attacker is already able to use
> >>>>> any fixed values of SP and DP, albeit not 0, to take them out
> >>>>> of the equation.
> >>>> Can you describe how a peer can compute the uptime from two observed timestamps?
> >>>> I don't see how you can do that...
> >>> 
> >>> A supposed attacker could run a script that continuously monitors timestamps,
> >>> for example via a periodic TCP connection from a fixed local port (eg 12345)
> >>> and a fixed local address to the fixed victim's address and port (eg 80).
> >>> Whenever a large discrepancy is observed, the attacker can assume that a reboot
> >>> has happened (due to V_ts_offset_secret re-generation), hence the received
> >>> timestamp is considered an approximate point of reboot from which the uptime
> >>> can be calculated, until the next reboot and so on.
> >> Ahh, I see. The patch we are talking about is not intended to protect against
> >> continuous monitoring, which is something you can always do. You could even
> >> watch for service availability and detect reboots. A change of the local key
> >> would also look similar to a reboot without a temporary loss of connectivity.
> >> 
> >> Thanks for the clarification.
> >>> 
> >>>>> 
> >>>>> Here is a list of example hosts that we were able to reproduce the
> >>>>> issue with:
> >>>>> 
> >>>>> curl -v http://88.99.60.171:80
> >>>>> curl -v http://163.172.71.252:80
> >>>>> curl -v http://5.9.242.150:80
> >>>>> curl -v https://185.134.205.105:443
> >>>>> curl -v https://136.243.1.231:443
> >>>>> curl -v https://144.76.196.4:443
> >>>>> curl -v http://94.127.191.194:80
> >>>>> 
> >>>>> To reproduce, call curl repeatedly with the same URL a number of times.
> >>>>> You are going to see some of the requests stuck in
> >>>>> `*    Trying XXX.XXX.XXX.XXX...`
> >>>>> 
> >>>>> For some reason, the easiest way to reproduce the issue is with nc:
> >>>>> 
> >>>>> $ echo "foooooo" | nc -v 88.99.60.171 80
> >>>>> 
> >>>>> Only a few such calls are required until one of them is stuck on connect():
> >>>>> issuing SYN packets with an exponential backoff.
> >>>> Thanks for providing an end-point to test with. I'll take a look.
> >>>> Just to be clear: You are running a FreeBSD client against one of the above
> >>>> servers and experience the problem with the new timestamp computations.
> >>>> 
> >>>> You are not running arbitrary clients against a FreeBSD server...
> >>> 
> >>> We are talking about FreeBSD being the client. The peers that yield this unwanted
> >>> behaviour are unknown. A little bit of tinkering showed that some of them run
> >>> Debian:
> >>> 
> >>> telnet 88.99.60.171 22
> >>> Trying 88.99.60.171...
> >>> Connected to 88.99.60.171.
> >>> Escape character is '^]'.
> >>> SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
> >> Also some are hosted by Hetzner, but not all. I will look into
> >> this tomorrow, since I'm on a deadline today (well it is 2am tomorrow
> >> morning, to be precise)...
> > 
> > Thanks a lot, I would appreciate that.
> Hi Paul,
> 
> I have looked into this.
> 
> * The FreeBSD behaviour is the one which is specified in the last bullet item
>   in https://tools.ietf.org/html/rfc7323#section-5.4
>   It is also the one, which is RECOMMENDED in
>   https://tools.ietf.org/html/rfc7323#section-7.1 
> 
> * My NAT box (a popular one in Germany) does NOT rewrite TCP timestamps.
> 
> This means that the hosts you are referring to have some sort of protection,
> which makes incorrect assumptions. It will also break multiple hosts behind
> a NAT.
> 
> I can run
> curl -v http://88.99.60.171:80
> in a loop without any problems from a FreeBSD head system. I tested 1000
> iterations or so. The TS.val is jumping up and down as expected.
> I'm wondering why you are observing errors in this case, too.
> 
> However, doing something like
> echo "foooooo" | nc -v 88.99.60.171 80
> triggers the problem.
> 
> So I think there is some functionality (in a middlebox or running on the host),
> which incorrectly assumes monotonic timestamps between multiple TCP connections
> coming from the same IP address, but only in case of errors at the application layer.

Yeah, exactly, some hosts seem to enable this only in case of an error in HTTP
communication (some smart proxy?). However, there are some that behave this way
regardless of errors, for example these:

curl -v https://185.134.205.105:443
curl -v https://136.243.1.231:443
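
To make the suspected behaviour concrete, here is a minimal user-space sketch of the kind of filter we think we are hitting. It is only a guess based on our observations, not code from any real middlebox: the names (syn_filter, syn_filter_accept, MAX_PEERS) are hypothetical, the table is a fixed-size array for brevity, and timestamp wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PEERS 16	/* hypothetical table size, for illustration only */

/* Hypothetical per-source state: largest TS.val seen from one address. */
struct peer_state {
	uint32_t addr;		/* source IPv4 address */
	uint32_t max_tsval;	/* largest timestamp value seen so far */
	bool	 in_use;
};

struct syn_filter {
	struct peer_state peers[MAX_PEERS];
};

/*
 * Assumed filter logic: a SYN is dropped (false) when its TS.val is
 * smaller than the largest TS.val already seen from the same source
 * address, regardless of ports. Otherwise it is accepted (true) and
 * the per-address maximum is updated. Wraparound is not handled.
 */
bool
syn_filter_accept(struct syn_filter *f, uint32_t addr, uint32_t tsval)
{
	struct peer_state *free_slot = NULL;

	assert(f != NULL);
	for (size_t i = 0; i < MAX_PEERS; i++) {
		struct peer_state *p = &f->peers[i];

		if (!p->in_use) {
			if (free_slot == NULL)
				free_slot = p;
			continue;
		}
		if (p->addr != addr)
			continue;
		if (tsval < p->max_tsval)
			return (false);	/* looks "old": SYN is dropped */
		p->max_tsval = tsval;
		return (true);
	}
	/* First SYN from this address: remember it if there is room. */
	if (free_slot != NULL) {
		free_slot->addr = addr;
		free_slot->max_tsval = tsval;
		free_slot->in_use = true;
	}
	return (true);
}
```

With per-connection offsets, two consecutive connections from the same address can carry TS.val 5000 and then 1200, and under this assumed logic the second SYN would be dropped, which matches the connect() stalls we see.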

> 
> Do you have any insights whether the hosts you listed share something in
> common? Some of them are hosted by Hetzner, but not all.

Nope. The whole set of endpoints that we have detected so far is pretty diverse,
covering a lot of different geographic locations as well as different
hosters.

> 
> I think in general, it is the correct thing to include the port numbers in
> the offset computation. We might add a sysctl variable to control the inclusion.
> This would allow interworking with broken middleboxes.

Yeah, I completely agree that these rare cases should not dictate the implementation.
But an ability to enable a work-around via sysctl would be greatly appreciated.
Currently we are unable to roll out the upgrade across all servers because of this
issue: even though it does not happen often, a lot of requests from our users
get stuck or fail altogether. For example, the host 185.134.205.105 is a kind of
social network that our proxy servers connect to in order to securely access content,
such as images, on behalf of our users.
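
For illustration, this is roughly how I picture such a switch, as a user-space sketch rather than a kernel patch. Everything here is an assumption made for the example: struct conn_tuple stands in for struct in_conninfo, the global ts_offset_per_conn stands in for a sysctl variable that does not exist yet, and FNV-1a is used only as a simple stand-in for the keyed hash in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection tuple; a stand-in for struct in_conninfo. */
struct conn_tuple {
	uint32_t laddr, faddr;	/* local and foreign address */
	uint16_t lport, fport;	/* local and foreign port */
};

/* Hypothetical knob standing in for a sysctl variable. */
bool ts_offset_per_conn = true;

/* 32-bit FNV-1a over a byte range; NOT the hash the kernel uses. */
static uint32_t
fnv1a(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return (h);
}

/*
 * Derive a timestamp offset from the secret and the tuple. When the
 * knob is off, the ports are zeroed first, so every connection between
 * the same pair of addresses gets the same offset.
 */
uint32_t
new_ts_offset(const struct conn_tuple *t, uint32_t secret)
{
	struct conn_tuple c;
	uint32_t h = 2166136261u;	/* FNV-1a offset basis */

	assert(t != NULL);
	c = *t;
	if (!ts_offset_per_conn) {
		c.lport = 0;
		c.fport = 0;
	}
	/* Hash the fields one by one to avoid hashing struct padding. */
	h = fnv1a(h, &secret, sizeof(secret));
	h = fnv1a(h, &c.laddr, sizeof(c.laddr));
	h = fnv1a(h, &c.faddr, sizeof(c.faddr));
	h = fnv1a(h, &c.lport, sizeof(c.lport));
	h = fnv1a(h, &c.fport, sizeof(c.fport));
	return (h);
}
```

With the knob on, two connections that differ only in the local port get different offsets, as they do today. With it off, they share one offset, so a given peer sees timestamps that only move forward with the clock.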

> 
> Please note, this does not fix the case of multiple clients behind a NAT.

Yeah, that's true. Fortunately we don't use NAT.

> 
> I'm also trying to figure out how and why Linux and Windows are handling this.

Thanks for looking into this!

> 
> Best regards
> Michael
> 
> > 
> >> 
> >> Best regards
> >> Michael 
> >>> 
> >>> 
> >>>> 
> >>>> Best regards
> >>>> Michael
> >>>> 
> >>>> 
> >> 
> >> 
> 
>