From owner-freebsd-net@FreeBSD.ORG Tue Aug 12 12:03:17 2014
Date: Tue, 12 Aug 2014 05:03:15 -0700
In-Reply-To: <53E9FF32.3010802@cloudius-systems.com>
References: <53E8B424.2000904@cloudius-systems.com> <20140811170606.GV83475@funkthat.com> <53E9FF32.3010802@cloudius-systems.com>
Subject: Re: TCP Rx window auto sizing relies on TCP timestamp option?
From: Adrian Chadd
To: Vlad Zolotarov
Cc: FreeBSD Net, Osv Dev
List-Id: Networking and TCP/IP with FreeBSD

The TL;DR is - yes, I bet it'd be nice to have. :)

-a

On 12 August 2014 04:49, Vlad Zolotarov wrote:
>
> On Aug 11, 2014 8:06 PM, "John-Mark Gurney" wrote:
>>
>> Vlad Zolotarov wrote this message on Mon, Aug 11, 2014 at 15:16 +0300:
>> > Hi, I have a rather strange question about the TCP Rx window auto sizing
>> > implementation in the FreeBSD networking stack.
>> > When I looked at the FreeBSD code (hash
>> > 9abce0e567c9a5a0520cdd94d5c633c7baf9a184) I noticed that
>> > the above-mentioned feature will not be "enabled" if there isn't a TCP
>> > timestamp option present in the current TCP session:
>> >
>> > See sys/netinet/tcp_input.c: line 1813 in tcp_do_segment() function:
>> >
>> >         if (V_tcp_do_autorcvbuf &&
>> >             to.to_tsecr &&       <-------- this is what I'm talking about
>> >             (so->so_rcv.sb_flags & SB_AUTOSIZE))
>> >
>> > So, if I read the code correctly, if there isn't a TS option (negotiated
>> > and thus present in every received packet) the receive socket buffer
>> > won't grow, thus preventing the growth of the Rx window.
>> > If that's the case, this is very strange, since the TS option is not
>> > guaranteed and, even more, in many cases it won't be present.
>> > For example in Linux this feature is disabled by default (controlled by
>> > /proc/sys/net/ipv4/tcp_timestamps).
>> > This is how I actually noticed the problem in the first place: I ran an iperf
>> > test where Linux was the initiator and transmitter (iperf -c), the FreeBSD
>> > box was the receiver (iperf -s), and I noticed that the Rx window wasn't
>> > opening up because the Linux box hadn't negotiated the TS option in the SYN.
>> > As a result, the throughput numbers were significantly lower compared to
>> > a Linux-to-Linux setup (Linux uses a Dynamic Right-Sizing (DRS) algorithm,
>> > http://public.lanl.gov/radiant/pubs.html#DRS, which doesn't rely on TS).
>> >
>> > Could anybody comment on this, please?
>> > Did I miss anything?
>> > Is it true that FreeBSD assumes that the TS option is always present and, if
>> > not, how can I cause the Rx window to open up when the TS option hasn't been
>> > negotiated?
>>
>> This means the receive buffer won't grow beyond the default of 64k...
>> But, as the comment says:
>>          * On the receive side the socket buffer memory is only rarely
>>          * used to any significant extent.  This allows us to be much
>>
>> The receive buffer will only get used if the application takes too long
>> to read its buffer, or it isn't currently waiting...  If that's the
>> case, then the application should be fixed to be able to process the
>> data as quickly as it comes in...
>
> You are right about the Rx buffer, and as a result the Rx window will not grow
> beyond this value either.
>
> See the following lines:
>
> tcp_output.c: tcp_output():
>
> line 509:
>
>         recwin = sbspace(&so->so_rcv);
>
> line 1034:
>
>         /*
>          * According to RFC1323 the window field in a SYN (i.e., a <SYN>
>          * or <SYN,ACK>) segment itself is never scaled.  The <SYN,ACK>
>          * case is handled in syncache.
>          */
>         if (flags & TH_SYN)
>                 th->th_win = htons((u_short)
>                                 (min(sbspace(&so->so_rcv), TCP_MAXWIN)));
>         else
>                 th->th_win = htons((u_short)(recwin >> tp->rcv_scale));
>
> As a result the Tx window of a transmitter will not grow beyond 64K either,
> and this is a single full LSO/LRO frame.
> So this will limit a transmitter to a single 64K LSO frame per RTT,
> since the receiver will only "see" the new bytes after they are
> delivered by the HW, which happens after all 64KB (a full LRO aggregation) are
> received, and only then will it send an ACK.
>
> Now let's consider you have a 0.2ms RTT like I have on my setup with 40Gbps
> ConnectX 3 NICs connected back to back.
> So, in this case the best throughput you'll ever get with the 64K window will
> be 8*64K/0.2ms ~ 2.5Gbps, which is about 1/16 of the line rate, and you need at
> least a 64K*16 ~ 1MB window to reach the line rate. And the higher the RTT, the
> larger the window we'll need. And this is in case the application frees the
> socket buffer immediately once data arrives, which may never be the case of course.
>
> I suppose use cases like the above were exactly the motivation for the Window
> Scaling option in RFC 1323.
>
>>
>> So, I don't see much of an issue w/ the code you pointed out, yes,
>> the receive buffer won't grow,
>
>> but there are options that you can set
>> (sysctl net.inet.tcp.recvspace) and SO_RCVBUF in the application that
>> will address it otherwise...
>
> Exactly! If there is no TS - it won't, and FreeBSD will not be able to
> utilize the network link.
> Frankly, I don't understand your advice - you suggest that each and every
> application go and manually configure a receive socket buffer size? Or
> increase the initial socket buffer globally, which is even worse?! And which
> value should we choose? As you can see above, the proper value depends on the
> RTT, and the RTT may change while the application runs due to a routing change.
> I doubt your suggestion is feasible.
>
> So, my first question stands - doesn't the FreeBSD community think that it would
> be beneficial for FreeBSD to use a DRS (or similar?) algorithm when there
> is no TS negotiated?
>
> thanks,
> vlad
>
>>
>> Obviously setting the default too large will just waste memory...
>>
>> --
>> John-Mark Gurney                              Voice: +1 415 225 5579
>>
>> "All that I will do, has been done, All that I have, has not."
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
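
For reference, the window arithmetic in the quoted message can be checked with
a small C sketch. The helper names below (max_throughput_bps,
required_window_bytes, min_window_scale) are hypothetical and are not taken
from the FreeBSD sources; the sketch assumes "64K" means 65536 bytes and that
at most one full window is in flight per RTT.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helpers only. These names are assumptions made for this
 * sketch and do not exist in the FreeBSD TCP stack.
 */

/* Upper bound on throughput (bits/s) when at most window_bytes can be
 * in flight during one round trip of rtt_seconds. */
static inline double
max_throughput_bps(uint64_t window_bytes, double rtt_seconds)
{
	assert(rtt_seconds > 0.0);
	return (8.0 * (double)window_bytes / rtt_seconds);
}

/* Bandwidth-delay product: window size (bytes, rounded to nearest)
 * needed to keep a link of link_bps busy over rtt_seconds. */
static inline uint64_t
required_window_bytes(double link_bps, double rtt_seconds)
{
	assert(link_bps >= 0.0 && rtt_seconds >= 0.0);
	return ((uint64_t)(link_bps * rtt_seconds / 8.0 + 0.5));
}

/* Smallest window scale shift (capped at 14, the RFC 1323 maximum) for
 * which a 16-bit window field can advertise at least window_bytes. */
static inline int
min_window_scale(uint64_t window_bytes)
{
	int shift = 0;

	while (shift < 14 && ((uint64_t)65535 << shift) < window_bytes)
		shift++;
	return (shift);
}
```

With a 65536-byte window and a 0.2ms RTT, max_throughput_bps() gives roughly
2.6 Gbit/s, in line with the ~2.5Gbps estimate above, and
required_window_bytes(40e9, 0.0002) gives about 1MB, which needs a window
scale shift of at least 4. An application that knows its RTT could pass such a
value to setsockopt() with SO_RCVBUF, but as the thread notes, the RTT is
generally not known in advance.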