From: Sepherosa Ziehau
To: John Baldwin
Cc: "freebsd-net@freebsd.org", Bjoern Zeeb
Date: Thu, 24 Jan 2013 10:31:30 +0800
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
In-Reply-To: <201301231115.06393.jhb@freebsd.org>
References: <201301221511.02496.jhb@freebsd.org> <201301231115.06393.jhb@freebsd.org>

On Thu, Jan 24, 2013 at 12:15 AM, John Baldwin wrote:
> On Wednesday, January 23, 2013 1:33:27 am Sepherosa Ziehau wrote:
>> On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin wrote:
>> > As I mentioned in an earlier thread, I recently had to debug an issue we were
>> > seeing across a link with a high bandwidth-delay product (both high bandwidth
>> > and high RTT). Our specific use case was to use a TCP connection to reliably
>> > forward a latency-sensitive datagram stream across a WAN connection. We would
>> > often see spikes in the latency of individual datagrams. I eventually tracked
>> > this down to the connection entering slow start when it would transmit data
>> > after being idle. The data stream was quite bursty and would often attempt to
>> > transmit a burst of data after being idle for far longer than a retransmit
>> > timeout.
>> >
>> > In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
>> > the slow start window size up via a sysctl. On 8.x this no longer worked.
>> > The solution I came up with was to add a new socket option to disable idle
>> > handling completely. That is, when an idle connection restarts with this new
>> > option enabled, it keeps its current congestion window and doesn't enter slow
>> > start.
>> >
>> > There are only a few cases where such an option is useful, but if anyone else
>> > thinks this might be useful I'd be happy to add the option to FreeBSD.
>>
>> I think what you need is RFC 2861; however, you should probably
>> ignore the "application-limited period" part of RFC 2861.
>
> Hummm. It appears btw, that Linux uses RFC 2861, but has a global knob to
> disable it due to applications having problems. When it is disabled,
> it doesn't decay the congestion window at all during idle handling. That is,
> it appears to act the same as if TCP_IGNOREIDLE were enabled.
>
> From http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html:
>
>   tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18)
>       If enabled, provide RFC 2861 behavior and time out the congestion
>       window after an idle period. An idle period is defined as the current
>       RTO (retransmission timeout). If disabled, the congestion window will
>       not be timed out after an idle period.
>
> Also, in this thread on tcpm it appears no one on that list realizes that
> there are any implementations which follow the "SHOULD" in RFC 2581 for idle
> handling (which is what we do currently):

Nah, I don't think the idle detection in FreeBSD follows RFC 2581/RFC 5681
section 4.1 (the paragraph before the "SHOULD"). IMHO, that's probably why
the author of the following email asked about implementations of the
"SHOULD" in RFC 2581/RFC 5681.

>
> http://www.ietf.org/mail-archive/web/tcpm/current/msg02864.html
>
> So if we were to implement RFC 2861, the new socket option would be equivalent
> to setting Linux's 'tcp_slow_start_after_idle' to false, but on a per-socket
> basis rather than globally.

Agreed, a per-socket option could be more useful than global sysctls in
certain situations. However, in addition to the per-socket option, would
global sysctl nodes for disabling idle_restart/idle_cwv help too?

Best Regards,
sephe

-- 
Tomorrow Will Never Die
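
For readers comparing the three idle behaviours discussed in this thread, here is a rough Python sketch. It is only a toy model and not FreeBSD or Linux code: the function name `cwnd_after_idle`, the policy labels, the use of whole segments as the unit, the illustrative initial window of 4 segments, and the one-segment floor are all assumptions made for this example. The "decay" case is a simplified reading of RFC 2861 that leaves out ssthresh and the "application-limited period" part.

```python
def cwnd_after_idle(cwnd, idle, rto, policy, initial_window=4, floor=1):
    """Return the congestion window (in segments) used when sending resumes.

    Hypothetical policy labels for this sketch:
      "restart" - collapse to the restart window min(IW, cwnd), as described
                  in RFC 5681 section 4.1
      "decay"   - halve cwnd once per full RTO of idle time (simplified
                  RFC 2861, without ssthresh or application-limited handling)
      "ignore"  - keep cwnd unchanged, as proposed for TCP_IGNOREIDLE
    """
    # A connection idle for less than one RTO is not treated as idle here.
    if idle < rto or policy == "ignore":
        return cwnd
    if policy == "restart":
        return min(initial_window, cwnd)
    if policy == "decay":
        # One halving per whole RTO spent idle, never below the floor.
        for _ in range(int(idle // rto)):
            cwnd = max(cwnd // 2, floor)
        return cwnd
    raise ValueError("unknown policy: %r" % (policy,))


if __name__ == "__main__":
    # A window of 64 segments after an idle period of three RTOs.
    for policy in ("restart", "decay", "ignore"):
        print(policy, cwnd_after_idle(64, idle=3.0, rto=1.0, policy=policy))
```

With a large window and a bursty sender, the "restart" and "decay" policies both shrink the window before the next burst, which matches the latency spikes described at the start of the thread; "ignore" leaves the window alone.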