From: Andre Oppermann <andre@freebsd.org>
To: John Baldwin
Cc: Sepherosa Ziehau, freebsd-net@freebsd.org, Bjoern Zeeb
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Date: Wed, 30 Jan 2013 18:26:17 +0100
Message-ID: <510957B9.8070203@freebsd.org>
In-Reply-To: <201301301158.33838.jhb@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

On 30.01.2013 17:58, John Baldwin wrote:
> On Tuesday, January 29, 2013 6:07:22 pm Andre Oppermann wrote:
>> On 29.01.2013 19:50, John Baldwin wrote:
>>> On Thursday, January 24, 2013 11:14:40 am John Baldwin wrote:
>>>>>> Agreed, a per-socket option could be more useful than global sysctls
>>>>>> in certain situations. However, in addition to the per-socket option,
>>>>>> could global sysctl nodes to disable idle_restart/idle_cwv help too?
>>>>>
>>>>> No. This is far too dangerous once it makes it into some tuning guide.
>>>>> The threat of congestion breakdown is real. The Internet, or any packet
>>>>> network, can only survive in the long term if almost everyone follows
>>>>> the rules and self-constrains to remain fair to the others. What would
>>>>> happen if nobody respected traffic lights anymore?
>>>>
>>>> The problem with this argument is that Linux has already had this as a
>>>> tunable option for years and the Internet hasn't melted as a result.
>>>>
>>>>> Since this seems to be a burning issue I'll come up with a patch in
>>>>> the next few days to add a decaying restartCWND that'll be fair and
>>>>> allow a very quick ramp-up if no loss occurs.
>>>>
>>>> I think this could be useful. OTOH, I still think the TCP_IGNOREIDLE
>>>> option is useful both with and without a decaying restartCWND.
>>>
>>> *ping*
>>>
>>> Andre, do you object to adding the new socket option?
>>
>> Yes, unfortunately I do object. This option, combined with the inflated
>> CWND at the end of a burst, effectively removes much, if not all, of the
>> congestion control mechanisms originally put in place to allow multiple
>> TCP streams to co-exist on the same pipe. Not having any decay or timeout
>> makes it even worse by doing this burst after an arbitrary amount of time
>> when network conditions and the congestion situation have certainly
>> changed.
>
> You have completely ignored the fact that Linux has had this as a global
> option for years and the Internet has not melted.

Sure. A friend of mine does free climbing and he hasn't crashed yet. He
also runs all his filesystems async with the disk write cache enabled,
keeps no backups, and hasn't lost a file yet. ;-)

> A socket option is far more fine-grained than their tunable (and requires
> code changes, not something a random sysadmin can just toggle as
> "tuning").

Agreed that a socket option is much more difficult to use.

>> The primary principle of TCP is to be cooperative with competing streams
>> and to fairly share bandwidth on a given link. Whenever the ACK clock has
>> come to a halt for some time, we must re-probe the link (slow-start from
>> a restartCWND) to compensate for our lack of knowledge of the current
>> link and congestion situation. Doing that with a decay function and a
>> floor equal to the IW (10 segments nowadays) gives a rapid ramp-up,
>> especially at LAN RTTs, while avoiding a blind burst and the subsequent
>> loss cycle.
>
> I understand all that, but it isn't applicable to my use case. I'm not
> sharing the bandwidth with anyone but other connections of my own (and
> they are all lower priority than this one). Also, I have idle periods of
> hundreds of milliseconds (larger than an RTT on this cross-continental
> link that also has high bandwidth), so it seems that even a decayed
> restartCWND will be useless to me, as it will have decayed down to
> nothing before I finally restart after long idle periods.

OK.

>> If you absolutely know that you're the only one on that network and you
>> want pure wirespeed, then a TCP cc_null module doing away with all
>> congestion control may be the right answer. The infrastructure is in
>> place and it can be selected per socket. Plus it can be loaded as a
>> module and thus doesn't have to be part of the base system.
>
> No, I do not think that doing away with all congestion control will work
> for my case. Even though we have a dedicated line, that doesn't mean
> congestion is impossible, and I still want the "normal" feedback to apply
> in the non-restart cases. BTW, I looked at using alternate congestion
> control algorithms (cc_cubic and some of the others) first, before
> resorting to adding this option, and they either did not fix the issue or
> were buggy.

You can simply create your own congestion control algorithm with only the
restart window changed. See the (pseudo) code below. BTW, I just noticed
that the other cc algos do not reset the idle window.

-- 
Andre

/* Boilerplate from netinet/cc/cc_newreno.c goes here. */

static void	jhb_after_idle(struct cc_var *ccv);

struct cc_algo jhb_cc_algo = {
	/* The name must fit in TCP_CA_NAME_MAX (16) bytes. */
	.name = "jhb_restart",
	.ack_received = newreno_ack_received,
	.after_idle = jhb_after_idle,
	.cong_signal = newreno_cong_signal,
	.post_recovery = newreno_post_recovery,
};

/* Do nothing after an idle period, i.e. keep the full prior CWND. */
static void
jhb_after_idle(struct cc_var *ccv)
{

	return;
}
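
[Editor's sketch, for comparison.] The decaying restartCWND Andre
describes above would start the restart window from the prior cwnd and
decay it toward an IW floor as the idle time grows. The function name
decayed_restart_cwnd() and the halving-once-per-idle-RTT schedule below
are illustrative assumptions, similar in spirit to the ssthresh decay of
RFC 2861; they are not taken from Andre's eventual patch.

#include <stdint.h>

#define	MAX_IDLE_RTTS	16	/* bound the shift; cwnd is long gone by then */

static uint32_t
decayed_restart_cwnd(uint32_t prior_cwnd, uint32_t iw, uint32_t idle_time,
    uint32_t srtt)
{
	uint32_t idle_rtts;

	if (srtt == 0)			/* no RTT sample yet: be conservative */
		return (iw);
	idle_rtts = idle_time / srtt;	/* whole RTTs spent idle */
	if (idle_rtts > MAX_IDLE_RTTS)
		idle_rtts = MAX_IDLE_RTTS;
	prior_cwnd >>= idle_rtts;	/* halve once per idle RTT */
	return (prior_cwnd > iw ? prior_cwnd : iw);	/* floor at the IW */
}

Under this schedule a connection idle for only a LAN RTT or two restarts
near its prior cwnd, while John's idle periods of hundreds of milliseconds
over a cross-continental RTT would usually decay all the way to the IW
floor, which matches the objection he raises above.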