Date:      Wed, 30 Jan 2013 00:07:22 +0100
From:      Andre Oppermann <andre@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Sepherosa Ziehau <sepherosa@gmail.com>, freebsd-net@freebsd.org, Bjoern Zeeb <bz@freebsd.org>
Subject:   Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Message-ID:  <5108562A.1040603@freebsd.org>
In-Reply-To: <201301291350.39931.jhb@freebsd.org>
References:  <201301221511.02496.jhb@freebsd.org> <5100EAD3.2090006@networx.ch> <201301241114.40734.jhb@freebsd.org> <201301291350.39931.jhb@freebsd.org>

On 29.01.2013 19:50, John Baldwin wrote:
> On Thursday, January 24, 2013 11:14:40 am John Baldwin wrote:
>>>> Agreed, a per-socket option could be more useful than global sysctls
>>>> in certain situations.  However, in addition to the per-socket option,
>>>> could global sysctl nodes to disable idle_restart/idle_cwv help too?
>>>
>>> No.  This is far too dangerous once it makes it into some tuning guide.
>>> The threat of congestion breakdown is real.  The Internet, or any packet
>>> network, can only survive in the long term if almost all follow the rules
>>> and self-constrain to remain fair to the others.  What would happen if
>>> nobody would respect the traffic lights anymore?
>>
>> The problem with this argument is Linux has already had this as a tunable
>> option for years and the Internet hasn't melted as a result.
>>
>>> Since this seems to be a burning issue I'll come up with a patch in the
>>> next days to add a decaying restartCWND that'll be fair and allow a very
>>> quick ramp up if no loss occurs.
>>
>> I think this could be useful.  OTOH, I still think the TCP_IGNOREIDLE option
>> is useful both with and without a decaying restartCWND?
>
> *ping*
>
> Andre, do you object to adding the new socket option?

Yes, unfortunately I do object.  This option, combined with the inflated
CWND at the end of a burst, effectively removes much, if not all, of the
congestion control mechanisms originally put in place to allow multiple
[TCP] streams to co-exist on the same pipe.  Not having any decay or timeout
makes it even worse, because the burst can then happen after an arbitrary
amount of time, when network conditions and the congestion situation have
certainly changed.

The primary principle of TCP is to be cooperative with competing streams and
to share bandwidth fairly on a given link.  Whenever the ACK clock has come
to a halt for some time we must re-probe the link (slow start from a
restartCWND) to compensate for our lack of knowledge of the current link and
congestion situation.  Doing that with a decay function and a floor equal to
the IW (10 segments nowadays) gives a rapid ramp-up, especially on LAN RTTs,
while avoiding a blind burst and subsequent loss cycle.
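
As an illustration only, a decaying restart window with a floor at the IW
could be sketched as below.  The halving-per-RTO rule is an assumption
borrowed from RFC 2861 style congestion window validation, not the patch
announced in this message, and the function and parameter names are
hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: halve cwnd once for every full RTO the connection
 * has been idle, but never go below the initial window (iw) and never
 * return more than the cwnd we started with.
 */
static uint32_t
decayed_restart_cwnd(uint32_t cwnd, uint32_t iw, uint32_t idle_ticks,
    uint32_t rto_ticks)
{
	/* Lower bound: the IW, or the current cwnd if that is smaller. */
	uint32_t lower = (iw < cwnd) ? iw : cwnd;
	uint32_t n;

	if (rto_ticks == 0)
		return (cwnd);	/* No usable RTO estimate: leave cwnd alone. */

	/* One halving per elapsed RTO, stopping once the floor is reached. */
	for (n = idle_ticks / rto_ticks; n > 0 && cwnd > lower; n--)
		cwnd /= 2;

	return ((cwnd > lower) ? cwnd : lower);
}
```

With a 1460 byte MSS and an IW of 14600 bytes, a cwnd of 116800 bytes would
fall to the floor after three idle RTOs under this assumed rule, and a short
idle period would leave most of the window intact.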

If you absolutely know that you're the only one on that network and you want
pure wirespeed then a TCP cc_null module doing away with all congestion control
may be the right answer.  The infrastructure is in place and it can be selected
per socket.  Plus it can be loaded as a module and thus doesn't have to be part
of the base system.

I'm currently re-emerging from the startup and auto-scaling rabbit hole and
finishing up that work; I will post patches for review shortly.

After that I'll look into the restartCWND issue.  A first quick patch
(untested) to update the restartCWND to the IW is below.

--
Andre

$ svn diff netinet/cc/cc_newreno.c
Index: netinet/cc/cc_newreno.c
===================================================================
--- netinet/cc/cc_newreno.c     (revision 246082)
+++ netinet/cc/cc_newreno.c     (working copy)
@@ -166,12 +166,21 @@
          *
          * See RFC5681 Section 4.1. "Restarting Idle Connections".
          */
-       if (V_tcp_do_rfc3390)
+       if (V_tcp_do_initcwnd10)
+               rw = min(10 * CCV(ccv, t_maxseg),
+                   max(2 * CCV(ccv, t_maxseg), 14600));
+       else if (V_tcp_do_rfc3390)
                 rw = min(4 * CCV(ccv, t_maxseg),
                     max(2 * CCV(ccv, t_maxseg), 4380));
-       else
-               rw = CCV(ccv, t_maxseg) * 2;
-
+       else {
+               /* Per RFC5681 Section 3.1 */
+               if (CCV(ccv, t_maxseg) > 2190)
+                       rw = 2 * CCV(ccv, t_maxseg);
+               else if (CCV(ccv, t_maxseg) > 1095)
+                       rw = 3 * CCV(ccv, t_maxseg);
+               else
+                       rw = 4 * CCV(ccv, t_maxseg);
+       }
         CCV(ccv, snd_cwnd) = min(rw, CCV(ccv, snd_cwnd));
  }
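
To see what the patched logic yields for common segment sizes, the restart
window selection can be lifted into a standalone function.  This is a sketch
for experimentation outside the kernel: the function names and the plain
integer flags standing in for V_tcp_do_initcwnd10 and V_tcp_do_rfc3390 are
assumptions, while the constants and branches mirror the diff above.

```c
#include <stdint.h>

static uint32_t
u32_min(uint32_t a, uint32_t b)
{
	return ((a < b) ? a : b);
}

static uint32_t
u32_max(uint32_t a, uint32_t b)
{
	return ((a > b) ? a : b);
}

/* Restart window in bytes for a given MSS, following the diff's branches. */
static uint32_t
restart_window(uint32_t maxseg, int do_initcwnd10, int do_rfc3390)
{
	if (do_initcwnd10)
		return (u32_min(10 * maxseg, u32_max(2 * maxseg, 14600)));
	if (do_rfc3390)
		return (u32_min(4 * maxseg, u32_max(2 * maxseg, 4380)));

	/* RFC 5681 Section 3.1 initial window. */
	if (maxseg > 2190)
		return (2 * maxseg);
	if (maxseg > 1095)
		return (3 * maxseg);
	return (4 * maxseg);
}

/* After idle, cwnd is clamped to the restart window and never raised. */
static uint32_t
cwnd_after_idle(uint32_t snd_cwnd, uint32_t rw)
{
	return (u32_min(rw, snd_cwnd));
}
```

For a 1460 byte MSS this gives 14600 bytes with the 10 segment IW and 4380
bytes in both of the other branches; for a 536 byte MSS it gives 5360, 2144
and 2144 bytes respectively.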
