From: Andre Oppermann
Date: Wed, 30 Jan 2013 00:07:22 +0100
To: John Baldwin
Cc: Sepherosa Ziehau, freebsd-net@freebsd.org, Bjoern Zeeb
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Message-ID: <5108562A.1040603@freebsd.org>

On 29.01.2013 19:50, John Baldwin wrote:
> On Thursday, January 24, 2013 11:14:40 am John Baldwin wrote:
>>>> Agreed, a per-socket option could be more useful than global sysctls
>>>> in certain situations. However, in addition to the per-socket option,
>>>> could global sysctl nodes to disable idle_restart/idle_cwv help too?
>>>
>>> No.
>>> This is far too dangerous once it makes it into some tuning guide.
>>> The threat of congestion breakdown is real. The Internet, or any
>>> packet network, can only survive in the long term if almost all
>>> follow the rules and self-constrain to remain fair to the others.
>>> What would happen if nobody respected the traffic lights anymore?
>>
>> The problem with this argument is that Linux has already had this as
>> a tunable option for years and the Internet hasn't melted as a result.
>>
>>> Since this seems to be a burning issue, I'll come up with a patch in
>>> the next few days to add a decaying restartCWND that will be fair and
>>> allow a very quick ramp-up if no loss occurs.
>>
>> I think this could be useful. OTOH, I still think the TCP_IGNOREIDLE
>> option is useful both with and without a decaying restartCWND?
>
> *ping*
>
> Andre, do you object to adding the new socket option?

Yes, unfortunately I do object. This option, combined with the inflated
CWND at the end of a burst, effectively removes much, if not all, of the
congestion control mechanisms originally put in place to allow multiple
[TCP] streams to co-exist on the same pipe. Having no decay or timeout
makes it even worse: the burst is sent after an arbitrary amount of idle
time, by which point network conditions and the congestion situation
have certainly changed.

The primary principle of TCP is to be cooperative with competing streams
and to share bandwidth fairly on a given link. Whenever the ACK clock has
come to a halt for some time, we must re-probe the link (slow start from
a restartCWND) to compensate for our lack of knowledge of the current
link and congestion situation. Doing that with a decay function and a
floor equal to the IW (10 segments nowadays) gives a rapid ramp-up,
especially at LAN RTTs, while avoiding a blind burst and the subsequent
loss cycle.
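The decay function is not spelled out in this message. One possible shape, assuming an RFC 2861-style rule that halves cwnd once for every retransmission timeout (RTO) of idle time and stops at an IW10 floor (RFC 6928), is sketched below. The function names and the millisecond idle/RTO inputs are hypothetical and are not part of the FreeBSD cc(4) framework.

```c
#include <assert.h>
#include <stdint.h>

/* IW10 per RFC 6928: min(10 * MSS, max(2 * MSS, 14600)) bytes. */
static uint32_t
iw10_bytes(uint32_t maxseg)
{
	uint32_t lo = 2 * maxseg > 14600 ? 2 * maxseg : 14600;

	return (10 * maxseg < lo ? 10 * maxseg : lo);
}

/*
 * Hypothetical decaying restart window, modelled on the RFC 2861
 * pseudocode: halve cwnd once for every full RTO the connection sat
 * idle, but never drop below the restart window min(IW, cwnd).
 */
static uint32_t
decayed_restart_cwnd(uint32_t cwnd, uint32_t maxseg, uint32_t idle_ms,
    uint32_t rto_ms)
{
	uint32_t rw = iw10_bytes(maxseg);

	assert(rto_ms > 0);
	if (cwnd < rw)
		rw = cwnd;		/* never inflate cwnd on restart */
	while (idle_ms >= rto_ms && cwnd > rw) {
		idle_ms -= rto_ms;
		cwnd = cwnd / 2 > rw ? cwnd / 2 : rw;
	}
	return (cwnd);
}
```

With a 1460-byte MSS, a 146000-byte cwnd and a 200 ms RTO, 600 ms of idle time leaves 18250 bytes, while 1000 ms or more reaches the 14600-byte IW floor. The loop is bounded because cwnd halves on every pass, so even very long idle periods cost at most a few dozen iterations.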
If you absolutely know that you're the only one on that network and you
want pure wire speed, then a TCP cc_null module doing away with all
congestion control may be the right answer. The infrastructure is in
place and a congestion control module can be selected per socket. Plus,
it can be loaded as a module and thus doesn't have to be part of the
base system.

I'm currently finishing up and re-emerging from the startup and
auto-scaling rabbit hole and will post patches for review shortly.
After that I'll look into the restartCWND issue. A first quick patch
(untested) to update the restartCWND to the IW is below.

-- 
Andre

$ svn diff netinet/cc/cc_newreno.c
Index: netinet/cc/cc_newreno.c
===================================================================
--- netinet/cc/cc_newreno.c	(revision 246082)
+++ netinet/cc/cc_newreno.c	(working copy)
@@ -166,12 +166,21 @@
 	 *
 	 * See RFC5681 Section 4.1. "Restarting Idle Connections".
 	 */
-	if (V_tcp_do_rfc3390)
+	if (V_tcp_do_initcwnd10)
+		rw = min(10 * CCV(ccv, t_maxseg),
+		    max(2 * CCV(ccv, t_maxseg), 14600));
+	else if (V_tcp_do_rfc3390)
 		rw = min(4 * CCV(ccv, t_maxseg),
 		    max(2 * CCV(ccv, t_maxseg), 4380));
-	else
-		rw = CCV(ccv, t_maxseg) * 2;
-
+	else {
+		/* Per RFC5681 Section 3.1 */
+		if (CCV(ccv, t_maxseg) > 2190)
+			rw = 2 * CCV(ccv, t_maxseg);
+		else if (CCV(ccv, t_maxseg) > 1095)
+			rw = 3 * CCV(ccv, t_maxseg);
+		else
+			rw = 4 * CCV(ccv, t_maxseg);
+	}
 	CCV(ccv, snd_cwnd) = min(rw, CCV(ccv, snd_cwnd));
 }
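To check the arithmetic of the patch outside the kernel, its three branches can be lifted into plain userland C. This is a sketch only: the enum stands in for the V_tcp_do_initcwnd10 and V_tcp_do_rfc3390 knobs, and the names iw_policy, initial_window and restart_cwnd are hypothetical rather than anything in cc_newreno.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the V_tcp_do_initcwnd10 / V_tcp_do_rfc3390 knobs. */
enum iw_policy { IW_RFC5681, IW_RFC3390, IW_INITCWND10 };

static uint32_t
umin(uint32_t a, uint32_t b)
{
	return (a < b ? a : b);
}

static uint32_t
umax(uint32_t a, uint32_t b)
{
	return (a > b ? a : b);
}

/* Initial window in bytes, mirroring the branches of the patch. */
static uint32_t
initial_window(uint32_t maxseg, enum iw_policy policy)
{
	assert(maxseg > 0);
	switch (policy) {
	case IW_INITCWND10:	/* RFC 6928 */
		return (umin(10 * maxseg, umax(2 * maxseg, 14600)));
	case IW_RFC3390:	/* RFC 3390 */
		return (umin(4 * maxseg, umax(2 * maxseg, 4380)));
	default:		/* RFC 5681 Section 3.1 */
		if (maxseg > 2190)
			return (2 * maxseg);
		else if (maxseg > 1095)
			return (3 * maxseg);
		return (4 * maxseg);
	}
}

/* RFC 5681 Section 4.1: after idle, cwnd = min(RW, cwnd) with RW = IW. */
static uint32_t
restart_cwnd(uint32_t cwnd, uint32_t maxseg, enum iw_policy policy)
{
	return (umin(initial_window(maxseg, policy), cwnd));
}
```

With a 1460-byte MSS this gives 14600 bytes under IW10 and 4380 bytes under both RFC 3390 and RFC 5681; with a 9000-byte jumbo MSS all three policies give 18000 bytes. Because the result passes through min(IW, cwnd), a connection whose cwnd is already below the IW is never inflated on restart.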