Date: Tue, 22 Jan 2013 12:35:40 -0800
From: Alfred Perlstein <bright@mu.org>
To: John Baldwin <jhb@freebsd.org>
Cc: net@freebsd.org
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Message-ID: <50FEF81C.1070002@mu.org>
In-Reply-To: <201301221511.02496.jhb@freebsd.org>
References: <201301221511.02496.jhb@freebsd.org>
On 1/22/13 12:11 PM, John Baldwin wrote:
> As I mentioned in an earlier thread, I recently had to debug an issue we were
> seeing across a link with a high bandwidth-delay product (both high bandwidth
> and high RTT). Our specific use case was to use a TCP connection to reliably
> forward a latency-sensitive datagram stream across a WAN connection. We would
> often see spikes in the latency of individual datagrams. I eventually tracked
> this down to the connection entering slow start when it would transmit data
> after being idle. The data stream was quite bursty and would often attempt to
> transmit a burst of data after being idle for far longer than a retransmit
> timeout.
>
> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
> the slow start window size up via a sysctl. On 8.x this no longer worked.
> The solution I came up with was to add a new socket option to disable idle
> handling completely. That is, when an idle connection restarts with this new
> option enabled, it keeps its current congestion window and doesn't enter slow
> start.
>
> There are only a few cases where such an option is useful, but if anyone else
> thinks this might be useful I'd be happy to add the option to FreeBSD.

This looks good, but it almost sounds like a bug for TCP to be doing this
anyhow.

Why would one want this behavior?

Wouldn't it make sense to keep the window large until there was a problem
rather than unconditionally chop it down? I almost think TCP is afraid that
you might wind up swapping out a 10gig interface for a modem? I'm just not
getting it. (Probably a simple oversight on my part.)

What do you think about also adding a sysctl that turns this on or off for
all connections by default?

-Alfred

>
> Index: share/man/man4/tcp.4
> ===================================================================
> --- share/man/man4/tcp.4 (revision 245742)
> +++ share/man/man4/tcp.4 (working copy)
> @@ -205,6 +205,18 @@
>  in the
>  .Sx MIB Variables
>  section further down.
> +.It Dv TCP_IGNOREIDLE
> +If a TCP connection is idle for more than one retransmit timeout,
> +it enters slow start when new data is available to transmit.
> +This avoids flooding the network with a full window of traffic at line rate.
> +It also allows the connection to adjust to changes to network conditions
> +that occurred while the connection was idle. A connection that sends
> +bursts of data separated by large idle periods can be permanently stuck in
> +slow start as a result.
> +The boolean option
> +.Dv TCP_IGNOREIDLE
> +disables the idle connection handling, allowing connections to maintain the
> +existing congestion window when restarting after an idle period.
>  .It Dv TCP_NODELAY
>  Under most circumstances,
>  .Tn TCP
> Index: sys/netinet/tcp_var.h
> ===================================================================
> --- sys/netinet/tcp_var.h (revision 245742)
> +++ sys/netinet/tcp_var.h (working copy)
> @@ -230,6 +230,7 @@
>  #define TF_NEEDFIN 0x000800 /* send FIN (implicit state) */
>  #define TF_NOPUSH 0x001000 /* don't push */
>  #define TF_PREVVALID 0x002000 /* saved values for bad rxmit valid */
> +#define TF_IGNOREIDLE 0x004000 /* connection is never idle */
>  #define TF_MORETOCOME 0x010000 /* More data to be appended to sock */
>  #define TF_LQ_OVERFLOW 0x020000 /* listen queue overflow */
>  #define TF_LASTIDLE 0x040000 /* connection was previously idle */
> Index: sys/netinet/tcp_output.c
> ===================================================================
> --- sys/netinet/tcp_output.c (revision 245742)
> +++ sys/netinet/tcp_output.c (working copy)
> @@ -206,7 +206,8 @@
>  	 * to send, then transmit; otherwise, investigate further.
>  	 */
>  	idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
> -	if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
> +	if (!(tp->t_flags & TF_IGNOREIDLE) &&
> +	    idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
>  		cc_after_idle(tp);
>  	tp->t_flags &= ~TF_LASTIDLE;
>  	if (idle) {
> Index: sys/netinet/tcp.h
> ===================================================================
> --- sys/netinet/tcp.h (revision 245823)
> +++ sys/netinet/tcp.h (working copy)
> @@ -156,6 +156,7 @@
>  #define TCP_NODELAY 1 /* don't delay send to coalesce packets */
>  #if __BSD_VISIBLE
>  #define TCP_MAXSEG 2 /* set maximum segment size */
> +#define TCP_IGNOREIDLE 3 /* disable idle connection handling */
>  #define TCP_NOPUSH 4 /* don't push last block of write */
>  #define TCP_NOOPT 8 /* don't use TCP options */
>  #define TCP_MD5SIG 16 /* use MD5 digests (RFC2385) */
> Index: sys/netinet/tcp_usrreq.c
> ===================================================================
> --- sys/netinet/tcp_usrreq.c (revision 245742)
> +++ sys/netinet/tcp_usrreq.c (working copy)
> @@ -1354,6 +1354,7 @@
>  
>  		case TCP_NODELAY:
>  		case TCP_NOOPT:
> +		case TCP_IGNOREIDLE:
>  			INP_WUNLOCK(inp);
>  			error = sooptcopyin(sopt, &optval, sizeof optval,
>  			    sizeof optval);
> @@ -1368,6 +1369,9 @@
>  			case TCP_NOOPT:
>  				opt = TF_NOOPT;
>  				break;
> +			case TCP_IGNOREIDLE:
> +				opt = TF_IGNOREIDLE;
> +				break;
>  			default:
>  				opt = 0; /* dead code to fool gcc */
>  				break;
> @@ -1578,6 +1582,11 @@
>  			INP_WUNLOCK(inp);
>  			error = sooptcopyout(sopt, buf, TCP_CA_NAME_MAX);
>  			break;
> +		case TCP_IGNOREIDLE:
> +			optval = tp->t_flags & TF_IGNOREIDLE;
> +			INP_WUNLOCK(inp);
> +			error = sooptcopyout(sopt, &optval, sizeof optval);
> +			break;
>  		default:
>  			INP_WUNLOCK(inp);
>  			error = ENOPROTOOPT;
>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?50FEF81C.1070002>