Date: Wed, 23 Jan 2013 14:33:27 +0800
From: Sepherosa Ziehau
To: John Baldwin
Cc: "freebsd-net@freebsd.org"
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
In-Reply-To: <201301221511.02496.jhb@freebsd.org>

On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin wrote:
> As I
> mentioned in an earlier thread, I recently had to debug an issue we were
> seeing across a link with a high bandwidth-delay product (both high bandwidth
> and high RTT). Our specific use case was to use a TCP connection to reliably
> forward a latency-sensitive datagram stream across a WAN connection. We would
> often see spikes in the latency of individual datagrams. I eventually tracked
> this down to the connection entering slow start when it would transmit data
> after being idle. The data stream was quite bursty and would often attempt to
> transmit a burst of data after being idle for far longer than a retransmit
> timeout.
>
> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
> the slow start window size up via a sysctl. On 8.x this no longer worked.
> The solution I came up with was to add a new socket option to disable idle
> handling completely. That is, when an idle connection restarts with this new
> option enabled, it keeps its current congestion window and doesn't enter slow
> start.
>
> There are only a few cases where such an option is useful, but if anyone else
> thinks this might be useful I'd be happy to add the option to FreeBSD.

I think what you need is RFC 2861 (TCP Congestion Window Validation); however,
you should probably ignore its "application-limited period" part.

Best Regards,
sephe

>
> Index: share/man/man4/tcp.4
> ===================================================================
> --- share/man/man4/tcp.4 (revision 245742)
> +++ share/man/man4/tcp.4 (working copy)
> @@ -205,6 +205,18 @@
>  in the
>  .Sx MIB Variables
>  section further down.
> +.It Dv TCP_IGNOREIDLE
> +If a TCP connection is idle for more than one retransmit timeout,
> +it enters slow start when new data is available to transmit.
> +This avoids flooding the network with a full window of traffic at line rate.
> +It also allows the connection to adjust to changes in network conditions
> +that occurred while the connection was idle. A connection that sends
> +bursts of data separated by large idle periods can be permanently stuck in
> +slow start as a result.
> +The boolean option
> +.Dv TCP_IGNOREIDLE
> +disables the idle connection handling, allowing connections to maintain the
> +existing congestion window when restarting after an idle period.
>  .It Dv TCP_NODELAY
>  Under most circumstances,
>  .Tn TCP
> Index: sys/netinet/tcp_var.h
> ===================================================================
> --- sys/netinet/tcp_var.h (revision 245742)
> +++ sys/netinet/tcp_var.h (working copy)
> @@ -230,6 +230,7 @@
>  #define	TF_NEEDFIN	0x000800	/* send FIN (implicit state) */
>  #define	TF_NOPUSH	0x001000	/* don't push */
>  #define	TF_PREVVALID	0x002000	/* saved values for bad rxmit valid */
> +#define	TF_IGNOREIDLE	0x004000	/* connection is never idle */
>  #define	TF_MORETOCOME	0x010000	/* More data to be appended to sock */
>  #define	TF_LQ_OVERFLOW	0x020000	/* listen queue overflow */
>  #define	TF_LASTIDLE	0x040000	/* connection was previously idle */
> Index: sys/netinet/tcp_output.c
> ===================================================================
> --- sys/netinet/tcp_output.c (revision 245742)
> +++ sys/netinet/tcp_output.c (working copy)
> @@ -206,7 +206,8 @@
>  	 * to send, then transmit; otherwise, investigate further.
>  	 */
>  	idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
> -	if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
> +	if (!(tp->t_flags & TF_IGNOREIDLE) &&
> +	    idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
>  		cc_after_idle(tp);
>  	tp->t_flags &= ~TF_LASTIDLE;
>  	if (idle) {
> Index: sys/netinet/tcp.h
> ===================================================================
> --- sys/netinet/tcp.h (revision 245823)
> +++ sys/netinet/tcp.h (working copy)
> @@ -156,6 +156,7 @@
>  #define	TCP_NODELAY	1	/* don't delay send to coalesce packets */
>  #if __BSD_VISIBLE
>  #define	TCP_MAXSEG	2	/* set maximum segment size */
> +#define	TCP_IGNOREIDLE	3	/* disable idle connection handling */
>  #define	TCP_NOPUSH	4	/* don't push last block of write */
>  #define	TCP_NOOPT	8	/* don't use TCP options */
>  #define	TCP_MD5SIG	16	/* use MD5 digests (RFC2385) */
> Index: sys/netinet/tcp_usrreq.c
> ===================================================================
> --- sys/netinet/tcp_usrreq.c (revision 245742)
> +++ sys/netinet/tcp_usrreq.c (working copy)
> @@ -1354,6 +1354,7 @@
>
>  		case TCP_NODELAY:
>  		case TCP_NOOPT:
> +		case TCP_IGNOREIDLE:
>  			INP_WUNLOCK(inp);
>  			error = sooptcopyin(sopt, &optval, sizeof optval,
>  			    sizeof optval);
> @@ -1368,6 +1369,9 @@
>  			case TCP_NOOPT:
>  				opt = TF_NOOPT;
>  				break;
> +			case TCP_IGNOREIDLE:
> +				opt = TF_IGNOREIDLE;
> +				break;
>  			default:
>  				opt = 0; /* dead code to fool gcc */
>  				break;
> @@ -1578,6 +1582,11 @@
>  			INP_WUNLOCK(inp);
>  			error = sooptcopyout(sopt, buf, TCP_CA_NAME_MAX);
>  			break;
> +		case TCP_IGNOREIDLE:
> +			optval = tp->t_flags & TF_IGNOREIDLE;
> +			INP_WUNLOCK(inp);
> +			error = sooptcopyout(sopt, &optval, sizeof optval);
> +			break;
>  		default:
>  			INP_WUNLOCK(inp);
>  			error = ENOPROTOOPT;
>
> --
> John Baldwin
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

--
Tomorrow Will Never Die
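The proposed man page text describes TCP_IGNOREIDLE as a boolean socket option. Assuming a kernel with this patch applied, a bursty sender like the datagram forwarder in the quoted message could toggle it with setsockopt(2) roughly as in the sketch below. Assumptions: TCP_IGNOREIDLE is defined locally as 3 (the value from the patch's tcp.h hunk) because unpatched headers lack it; the call is only attempted on FreeBSD, since option number 3 at IPPROTO_TCP means something else on other systems (TCP_CORK on Linux); and set_ignoreidle() and get_ignoreidle() are hypothetical helper names, not part of any existing API.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Assumption: option number proposed in the patch's tcp.h hunk.
 * Unpatched headers do not define it.
 */
#ifndef TCP_IGNOREIDLE
#define TCP_IGNOREIDLE 3
#endif

/* Hypothetical helper: enable (on = 1) or disable (on = 0) the option. */
static int set_ignoreidle(int fd, int on)
{
	assert(on == 0 || on == 1);	/* boolean socket option */
#ifdef __FreeBSD__
	return (setsockopt(fd, IPPROTO_TCP, TCP_IGNOREIDLE, &on, sizeof(on)));
#else
	/* Elsewhere 3 names a different option (TCP_CORK on Linux). */
	(void)fd;
	(void)on;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}

/* Hypothetical helper: 1 if enabled, 0 if disabled, -1 on error. */
static int get_ignoreidle(int fd)
{
#ifdef __FreeBSD__
	int optval = 0;
	socklen_t len = sizeof(optval);

	if (getsockopt(fd, IPPROTO_TCP, TCP_IGNOREIDLE, &optval, &len) == -1)
		return (-1);
	/* The patch copies out the raw TF_IGNOREIDLE bit (0x4000), not 1. */
	return (optval != 0);
#else
	(void)fd;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

A caller would typically run set_ignoreidle(fd, 1) once on the connected socket before the bursty traffic starts; on an unpatched kernel the call should simply fail. The getter compares against zero because the patch's getsockopt path returns tp->t_flags & TF_IGNOREIDLE, which is 0x4000 rather than 1 when the option is set.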
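The tcp_output.c hunk only prepends one test to the existing idle check. As a sketch, that check can be restated as a standalone predicate so its inputs are explicit: the connection counts as idle if TF_LASTIDLE is set or nothing is outstanding (snd_max == snd_una), and cc_after_idle() only runs once at least one retransmit timeout (t_rxtcur) has elapsed since t_rcvtime. The struct tcpcb_model type and should_restart_after_idle() are hypothetical stand-ins for struct tcpcb and the inline condition; the kernel's global ticks is passed as a parameter, and only the two flag values are copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values copied from the patched sys/netinet/tcp_var.h. */
#define TF_IGNOREIDLE	0x004000
#define TF_LASTIDLE	0x040000

/* Hypothetical subset of struct tcpcb used by the idle check. */
struct tcpcb_model {
	uint32_t t_flags;
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_max;	/* highest sequence number sent */
	int	 t_rcvtime;	/* ticks when the last segment arrived */
	int	 t_rxtcur;	/* current retransmit timeout, in ticks */
};

/*
 * Mirrors the patched condition in tcp_output(): returns true when
 * cc_after_idle() would be called, i.e. when the congestion control
 * module is told the connection went idle (typically shrinking cwnd).
 */
static bool should_restart_after_idle(const struct tcpcb_model *tp, int ticks)
{
	assert(tp != NULL);
	bool idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);

	return (!(tp->t_flags & TF_IGNOREIDLE) &&
	    idle && ticks - tp->t_rcvtime >= tp->t_rxtcur);
}
```

With the flag set, the function returns false no matter how long the connection has been idle, which is exactly the "never idle" behaviour the new TF_IGNOREIDLE comment describes.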
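For comparison, the RFC 2861 approach suggested in the reply does not keep the full window after an idle period; it decays it. The following is a simplified sketch of only its idle-period rule, under a simplified reading of the RFC: ssthresh is first raised to at least three quarters of the old cwnd, and cwnd is then halved once for every full RTO the sender was idle. As suggested above, the application-limited rule is omitted. The names cwv_state and cwv_after_idle(), the byte units, and the one-MSS floor are assumptions of this sketch, not FreeBSD code or text quoted from the RFC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical congestion state, in bytes. */
struct cwv_state {
	uint32_t cwnd;
	uint32_t ssthresh;
	uint32_t mss;
};

/*
 * Simplified RFC 2861-style decay after an idle period (the
 * application-limited rule is deliberately left out).
 * idle and rto use the same time unit, e.g. ticks.
 */
static void cwv_after_idle(struct cwv_state *s, int idle, int rto)
{
	assert(rto > 0);
	if (idle < rto)
		return;		/* idle for less than one RTO: keep cwnd */

	/* Remember most of the old window so slow start can regain it. */
	uint32_t three_quarters = (uint32_t)((uint64_t)s->cwnd * 3 / 4);
	if (s->ssthresh < three_quarters)
		s->ssthresh = three_quarters;

	/* Halve cwnd once per full RTO of idleness, never below one MSS. */
	for (int i = 0; i < idle / rto && s->cwnd > s->mss; i++) {
		s->cwnd /= 2;
		if (s->cwnd < s->mss)
			s->cwnd = s->mss;
	}
}
```

With cwnd = 64000, ssthresh = 10000, MSS = 1000 and an idle period of 2.5 RTOs, the sketch leaves ssthresh at 48000 and cwnd at 16000: a bursty sender resumes with a quarter of its window instead of the restart window, and slow start can climb back to 48000 quickly.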