From: Alfred Perlstein
Date: Thu, 24 Jan 2013 16:10:51 -0500
To: John Baldwin
Cc: Sepherosa Ziehau, freebsd-net@freebsd.org, Bjoern Zeeb
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option

On 1/24/13 11:14 AM, John Baldwin wrote:
> On Thursday, January 24, 2013 3:03:31 am Andre Oppermann wrote:
>> On 24.01.2013 03:31, Sepherosa Ziehau wrote:
>>> On Thu, Jan 24, 2013 at 12:15 AM, John Baldwin wrote:
>>>> On Wednesday, January 23, 2013 1:33:27 am Sepherosa Ziehau wrote:
>>>>> On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin wrote:
>>>>>> As I mentioned in an earlier thread, I recently had to debug an issue we were
>>>>>> seeing across a link with a high bandwidth-delay
>>>>>> product (both high bandwidth
>>>>>> and high RTT). Our specific use case was to use a TCP connection to reliably
>>>>>> forward a latency-sensitive datagram stream across a WAN connection. We would
>>>>>> often see spikes in the latency of individual datagrams. I eventually tracked
>>>>>> this down to the connection entering slow start when it would transmit data
>>>>>> after being idle. The data stream was quite bursty and would often attempt to
>>>>>> transmit a burst of data after being idle for far longer than a retransmit
>>>>>> timeout.
>>>>>>
>>>>>> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
>>>>>> the slow start window size up via a sysctl. On 8.x this no longer worked.
>>>>>> The solution I came up with was to add a new socket option to disable idle
>>>>>> handling completely. That is, when an idle connection restarts with this new
>>>>>> option enabled, it keeps its current congestion window and doesn't enter slow
>>>>>> start.
>>>>>>
>>>>>> There are only a few cases where such an option is useful, but if anyone else
>>>>>> thinks this might be useful I'd be happy to add the option to FreeBSD.
>>>>> I think what you need is RFC 2861; however, you probably should
>>>>> ignore the "application-limited period" part of RFC 2861.
>>>> Hummm. It appears btw, that Linux uses RFC 2861, but has a global knob to
>>>> disable it due to applications having problems. When it is disabled,
>>>> it doesn't decay the congestion window at all during idle handling. That is,
>>>> it appears to act the same as if TCP_IGNOREIDLE were enabled.
>>>>
>>>> From http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html:
>>>>
>>>> tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18)
>>>>     If enabled, provide RFC 2861 behavior and time out the congestion
>>>>     window after an idle period. An idle period is defined as the current
>>>>     RTO (retransmission timeout).
>>>>     If disabled, the congestion window will
>>>>     not be timed out after an idle period.
>>>>
>>>> Also, in this thread on tcp-m it appears no one on that list realizes that
>>>> there are any implementations which follow the "SHOULD" in RFC 2581 for idle
>>>> handling (which is what we do currently):
>>> Nah, I don't think the idle detection in FreeBSD follows
>>> RFC2581/RFC5681 4.1 (the paragraph before the "SHOULD"). IMHO, that's
>>> probably why the author in the following email requestioned about the
>>> implementation of "SHOULD" in RFC2581/RFC5681.
>>>
>>>> http://www.ietf.org/mail-archive/web/tcpm/current/msg02864.html
>>>>
>>>> So if we were to implement RFC 2861, the new socket option would be equivalent
>>>> to setting Linux's 'tcp_slow_start_after_idle' to false, but on a per-socket
>>>> basis rather than globally.
>>> Agree, a per-socket option could be more useful than global sysctls under
>>> certain situations. However, in addition to the per-socket option,
>>> could global sysctl nodes to disable idle_restart/idle_cwv help too?
>> No. This is far too dangerous once it makes it into some tuning guide.
>> The threat of congestion breakdown is real. The Internet, or any packet
>> network, can only survive in the long term if almost all follow the rules
>> and self-constrain to remain fair to the others. What would happen if
>> nobody would respect the traffic lights anymore?
> The problem with this argument is Linux has already had this as a tunable
> option for years and the Internet hasn't melted as a result.
>
>> Besides that bursting into unknown network conditions is very likely to
>> result in burst losses as well. TCP isn't good at recovering from it.
>> In the end you most likely come out ahead if you decay the restartCWND.
>>
>> We have two cases primarily: a) long distance, medium to high RTT, and
>> wildly varying bandwidth (a.k.a. the Internet); b) short distance, low
>> RTT and mostly plenty of bandwidth (a.k.a. Datacenter).
>> The former
>> absolutely definitely requires a decayed restartCWND. The latter less
>> so but even there bursting at 10Gig TSO assisted wirespeed isn't going
>> to end too happy more often than not.
> You forgot my case: c) dedicated long distance links with high bandwidth.
>
>> Since this seems to be a burning issue I'll come up with a patch in the
>> next days to add a decaying restartCWND that'll be fair and allow a very
>> quick ramp up if no loss occurs.
> I think this could be useful. OTOH, I still think the TCP_IGNOREIDLE option
> is useful both with and without a decaying restartCWND?
>
Linux seems to be doing just fine with it for what seems to be a long while.

Can we get this committed?

-Alfred
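
For reference, the three idle-restart behaviours debated in this thread can be compared with a toy model. This is only a sketch under stated assumptions, not FreeBSD or Linux code: the function names, the segment-based units, and the choice to floor the decayed window at the restart window are all illustrative. The first function follows the RFC 5681 section 4.1 rule (collapse to the restart window after more than one RTO of idle), the second is a simplified reading of the RFC 2861 idea (halve once per RTO of idle), and the third keeps the window as the proposed TCP_IGNOREIDLE option would.

```python
# Toy model of how a sender might treat its congestion window (cwnd, in
# segments) when it resumes sending after an idle period.
# All names and numbers are illustrative assumptions, not kernel code.

def restart_collapse(cwnd, idle, rto, restart_window):
    """RFC 5681 4.1 style: drop to the restart window once idle exceeds one RTO."""
    if idle > rto:
        return min(cwnd, restart_window)
    return cwnd

def restart_decay(cwnd, idle, rto, restart_window):
    """RFC 2861 style (simplified): halve cwnd once per full RTO of idle time.

    Flooring at the restart window is an assumption made for this sketch.
    """
    floor = min(cwnd, restart_window)
    for _ in range(int(idle // rto)):
        cwnd = max(cwnd // 2, floor)
    return cwnd

def restart_ignore_idle(cwnd, idle, rto, restart_window):
    """TCP_IGNOREIDLE style: keep the congestion window exactly as it was."""
    return cwnd

if __name__ == "__main__":
    # Hypothetical connection: 64-segment window, 1 s RTO, 4-segment restart window.
    for idle in (0.5, 3.5, 10.0):
        print(idle,
              restart_collapse(64, idle, 1.0, 4),
              restart_decay(64, idle, 1.0, 4),
              restart_ignore_idle(64, idle, 1.0, 4))
```

With these made-up numbers, a 3.5 s pause leaves the collapsing policy at 4 segments, the decaying policy at 8, and the ignore-idle policy at the full 64, which is the trade-off between burst risk and restart latency that the thread is arguing about.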