From owner-freebsd-net@FreeBSD.ORG  Wed Feb  6 11:27:06 2013
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Mime-Version: 1.0 (Apple Message framework v1283)
Content-Type: text/plain; charset=us-ascii
From: Randall Stewart <rrs@lakerest.net>
In-Reply-To: <50FF06AD.402@networx.ch>
Date: Wed, 6 Feb 2013 06:27:04 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <061B4EA5-6A93-48A0-A269-C2C3A3C7E77C@lakerest.net>
References: <201301221511.02496.jhb@freebsd.org> <50FEF81C.1070002@mu.org>
 <50FF06AD.402@networx.ch>
To: John Baldwin <jhb@freebsd.org>
X-Mailer: Apple Mail (2.1283)
Cc: Alfred Perlstein <bright@mu.org>, net@freebsd.org

John:

A burst at line rate will *often* cause drops, because router queues
have a finite size. Such a burst (especially on a network with a high
bandwidth-delay product) also causes your RTT to increase even if
there is no drop, which is going to hurt you as well.

A SHOULD in an RFC says you really really really really need to do it
unless there is something that makes you willing to override it. It gives
you only slight wiggle room.

In this I agree with Andre: we should not be *not* doing it. Otherwise
folks will be turning this on, and it is plain wrong. It may be fine
for your network, but I would not want to see it in FreeBSD.
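
For reference, the behavior under discussion is the restart after idle from
RFC 5681, section 4.1: if the connection has not sent data for longer than
the retransmission timeout, cwnd SHOULD be cut to no more than the restart
window, RW = min(IW, cwnd). Below is a minimal sketch of that rule. The
function and parameter names are hypothetical and this is not the FreeBSD
stack code.

```c
#include <stdint.h>

/*
 * Hypothetical helper illustrating the RFC 5681 section 4.1 idle restart.
 * All sizes are in bytes, all times in milliseconds.
 */
static uint32_t
cwnd_after_idle(uint32_t cwnd, uint32_t init_cwnd,
    uint32_t idle_ms, uint32_t rto_ms)
{
	/* Not idle for more than one RTO: keep the current window. */
	if (idle_ms <= rto_ms)
		return (cwnd);

	/* Idle too long: restart window is min(IW, cwnd). */
	return ((cwnd < init_cwnd) ? cwnd : init_cwnd);
}
```

The proposed socket option would in effect always take the first branch,
however long the connection has been idle.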

In my testing here at home I have put max-burst back into our stack. This
uses Mark Allman's version (not Kacheong Poon's), where you clamp the cwnd at
no more than 4 packets larger than your flight. All of my testing,
high-bw-delay or LAN, has shown this to improve TCP performance. This
is because it helps you avoid bursting out so many packets that you overflow
a queue.
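
The clamp described above can be sketched roughly as follows. This is only
an illustration of the idea (cwnd limited to the data in flight plus 4
segments), not the code from our stack; the function name, the parameters,
and the byte-based accounting are assumptions.

```c
#include <stdint.h>

/* Burst allowance in segments, per the "4 packets" rule described above. */
#define MAXBURST_SEGS 4u

/*
 * Hypothetical helper: limit cwnd so that at most MAXBURST_SEGS new
 * segments can be sent beyond what is already in flight.
 * cwnd, flight and mss are all in bytes.
 */
static uint32_t
maxburst_clamp(uint32_t cwnd, uint32_t flight, uint32_t mss)
{
	uint32_t limit = flight + MAXBURST_SEGS * mss;

	return ((cwnd > limit) ? limit : cwnd);
}
```

With a large cwnd and little in flight (the situation right after an idle
period), the sender can emit only a few segments back to back and then has
to wait for ACKs to open the window further.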

On your long-delay, high-bw link, if you do burst out too many (and you never
know how many that is, since you cannot predict how full all those
MPLS queues are or how big they are), you will hurt yourself even worse.
Note that in Cisco routers the default queue size is generally somewhere
between 100 and 300 packets, depending on the router.
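
To make the queue arithmetic concrete, here is a deliberately simplified
worst-case model: a tail-drop queue that does not drain at all while the
burst arrives. The model, the function name, and the parameters are
assumptions for illustration; a real queue drains at the outgoing line rate
and may use a different drop policy.

```c
#include <stdint.h>

/*
 * Hypothetical worst-case model: packets dropped when burst_pkts arrive
 * back to back at a tail-drop queue holding queue_depth packets out of a
 * maximum of queue_limit, with no draining during the burst.
 */
static uint32_t
tail_drop_count(uint32_t burst_pkts, uint32_t queue_limit,
    uint32_t queue_depth)
{
	uint32_t room;

	/* Free slots left in the queue when the burst arrives. */
	room = (queue_depth < queue_limit) ? queue_limit - queue_depth : 0;

	return ((burst_pkts > room) ? burst_pkts - room : 0);
}
```

For example, a 400-packet burst hitting a 300-packet queue that already
holds 50 packets loses 150 packets in this model. The sender has no way to
know queue_depth or queue_limit for any hop, which is the point made above.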

Bottom line: IMO this is a bad idea.

If you want to really improve that link, let me get with you offline and we can
see about getting you a couple of our boxes again :-D.

R
On Jan 22, 2013, at 4:37 PM, Andre Oppermann wrote:

> On 22.01.2013 21:35, Alfred Perlstein wrote:
>> On 1/22/13 12:11 PM, John Baldwin wrote:
>>> As I mentioned in an earlier thread, I recently had to debug an issue we were
>>> seeing across a link with a high bandwidth-delay product (both high bandwidth
>>> and high RTT).  Our specific use case was to use a TCP connection to reliably
>>> forward a latency-sensitive datagram stream across a WAN connection.  We would
>>> often see spikes in the latency of individual datagrams.  I eventually tracked
>>> this down to the connection entering slow start when it would transmit data
>>> after being idle.  The data stream was quite bursty and would often attempt to
>>> transmit a burst of data after being idle for far longer than a retransmit
>>> timeout.
>>>
>>> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
>>> the slow start window size up via a sysctl.  On 8.x this no longer worked.
>>> The solution I came up with was to add a new socket option to disable idle
>>> handling completely.  That is, when an idle connection restarts with this new
>>> option enabled, it keeps its current congestion window and doesn't enter slow
>>> start.
>>>
>>> There are only a few cases where such an option is useful, but if anyone else
>>> thinks this might be useful I'd be happy to add the option to FreeBSD.
>>
>> This looks good, but it almost sounds like a bug for TCP to be doing this anyhow.
>
> It's not a bug.  It's by design.  It's required by the RFC.
>
>> Why would one want this behavior?
>
> Network conditions change all the time.  Traffic and congestion comes and goes.
> Connections can go idle for milliseconds to minutes to hours.  Whenever "enough"
> time has passed network capacity probing has to start anew.
>
>> Wouldn't it make sense to keep the window large until there was a problem rather than
>> unconditionally chop it down?  I almost think TCP is afraid that you might wind up swapping out a
>> 10gig interface for a modem?  I'm just not getting it.  (probably simple oversight on my part).
>
> The very real fear is congestion meltdown.  That is the reason we ended up with
> TCP's AIMD mechanism in the first place.  If everybody were to blast into the
> network anyone will suffer.  The bufferbloat issue identified recently makes things
> even worse.
>
>> What do you think about also making this a sysctl for global on/off by default?
>
> Please don't.  The correct fix is either a) to use the initial window as the restart
> window (up to 10 MSS nowadays); b) to use a decay mechanism based on the time since
> the last network condition probe.  Even the latter must decay to initCWND within at
> most 1MSL.
>
> --
> Andre

------------------------------
Randall Stewart
803-317-4952 (cell)