Date: Thu, 14 Feb 2013 01:26:38 +1100
From: Lawrence Stewart <lstewart@freebsd.org>
To: Andre Oppermann <andre@freebsd.org>
Cc: John Baldwin <jhb@freebsd.org>, net@freebsd.org
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Message-ID: <511BA29E.5050501@freebsd.org>
In-Reply-To: <511B6A87.5060000@freebsd.org>
References: <201301221511.02496.jhb@freebsd.org> <511B4DEF.8000500@freebsd.org> <511B6A87.5060000@freebsd.org>

On 02/13/13 21:27, Andre Oppermann wrote:
> On 13.02.2013 09:25, Lawrence Stewart wrote:
>> FYI I've read the whole thread as of this reply and plan to follow up to
>> a few of the other posts separately, but first for my initial thoughts...
>>
>> On 01/23/13 07:11, John Baldwin wrote:
>>> As I mentioned in an earlier thread, I recently had to debug an issue
>>> we were seeing across a link with a high bandwidth-delay product (both
>>> high bandwidth and high RTT). Our specific use case was to use a TCP
>>> connection to reliably forward a latency-sensitive datagram stream
>>> across a WAN connection. We would often see spikes in the latency of
>>> individual datagrams. I eventually tracked this down to the connection
>>> entering slow start when it would transmit data after being idle. The
>>> data stream was quite bursty and would often attempt to transmit a
>>> burst of data after being idle for far longer than a retransmit
>>> timeout.
>>
>> Got it.
>>
>>> In 7.x we had worked around this in the past by disabling RFC 3390 and
>>> jacking the slow start window size up via a sysctl. On 8.x this no
>>> longer worked.
>>
>> I can't think of, nor have I read any convincing argument why we
>> shouldn't support your use case out of the box. You're not the only user
>> of FreeBSD over dedicated lines who knows what you're doing. We should
>> provide some way to support this use case.
>>
>> We're therefore left with the question of how to implement this.
>>
>> As noted in the "Some questions about the new TCP congestion control
>> code" thread [1], it was always my intention to axe the ss_flightsize
>> variables and replace them with a better mechanism. Andre swung the axe
>> before I did and 10.x is looming so it's a good time to discuss all of
>> this.
>>
>>> The solution I came up with was to add a new socket option to disable
>>> idle handling completely. That is, when an idle connection restarts
>>> with this new option enabled, it keeps its current congestion window
>>> and doesn't enter slow start.
>>
>> rwatson@ mentioned an idea in private discussion which I've also thought
>> about over the years. The real goal here should be to subsume your use
>> case (and others) into a much richer framework for hinting desired
>> behaviour/tradeoff preferences (some aspects of which relate to parts of
>> my PhD work, which will hopefully be coming to a kernel near you in
>> 2013 ;).
>>
>> My main concern with your patch is that I'm a bit uneasy about
>> enshrining a socket option in a public API and documentation that is so
>> specific. I suspect apps probably want to set higher level goals like
>> "low latency *at any cost*" and have the stack opaquely interpret that
>> as "this guy is willing to blow his foot off, so let's disable idle
>> window reset, tweak X, disable Y and hand the man his loaded shotgun".
>> TCP_IGNOREIDLE as currently proposed misses this bigger picture, though
>> doesn't preclude it either.
>>
>> I would also echo Kevin/Grenville's thoughts about keying the socket
>> option's activation off a tunable (sysctl or kernel option is up for
>> discussion, though I'd be leaning towards sysctl) that is disabled by
>> default i.e. only skip after idle window reset if the app sets the
>> option *and* the sysadmin has pulled the "I like me some bursty network"
>> lever.
>>
>>> There are only a few cases where such an option is useful, but if
>>> anyone else thinks this might be useful I'd be happy to add the option
>>> to FreeBSD.
>>
>> The idea is useful. I'd just like to discuss the implementation
>> specifics a little further before recommending whether the patch should
>> go in as is to provide a stop gap, or we rework the patch to be a little
>> less specific in readiness for the future work I have in mind.
>
> Again I'd like to point out that this sort of modification should
> be implemented as a congestion control module. All the hook points
> are already there and can readily be used instead of adding more special
> cases to the generic part of TCP. The CC algorithm can be selected per
> socket. For such a special CC module it'd get a nice fat warning that
> it is not suitable for Internet use.

As a local hack, sure, a CC module would do the job assuming you were
happy to use a single algorithm as the base. John's patch transcends the
algorithm in use on a particular connection, so it has wider
applicability than a CC module.

I would also strongly oppose the inclusion of such a module in FreeBSD
proper - it's the wrong way to implement the functionality.

The patch as posted is technically appropriate, though I'm interested in
discussing whether the public API should be tweaked to capture higher
level goals instead e.g. "low delay at all costs" or "maximum
throughput". We could initially map "low delay at all costs" to a TCP
stack meaning of "disable idle window reset" and expand the meaning
later (e.g. relaxing the silly window checks as briefly discussed in the
other thread).

> Additionally I speculate that for the use-case of John he may also be
> willing to forgo congestion avoidance and always operate in (ill-named)
> "slow start" mode. With a special CC module this can easily be tweaked.

John already has the functionality he needs in his local tree - this
discussion is no longer about John per se, but rather about other people
who may want the functionality John has implemented. We need to figure
out how to provide the functionality in FreeBSD proper, and a CC module
is not the answer.

Cheers,
Lawrence
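
For readers who want the gating idea from this thread in concrete form
(skip the after-idle window reset only if the application sets the option
*and* the sysadmin has enabled a system-wide tunable), here is a minimal
sketch in C. It is a standalone model, not FreeBSD kernel code and not
John's patch: struct conn, its fields, sysadmin_allows_ignore_idle and
cwnd_after_idle() are all hypothetical names invented for illustration,
and the fallback of min(cwnd, restart window) is an assumption about what
the normal idle reset does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; NOT the FreeBSD tcpcb. */
struct conn {
	uint32_t cwnd;         /* current congestion window, in bytes */
	uint32_t restart_cwnd; /* assumed window used after an idle reset */
	uint32_t idle_ticks;   /* time since the connection last sent data */
	uint32_t rto_ticks;    /* current retransmit timeout */
	bool     ignore_idle;  /* app asked for TCP_IGNOREIDLE-like behaviour */
};

/* Hypothetical system-wide tunable, disabled by default. */
static bool sysadmin_allows_ignore_idle = false;

/*
 * Return the congestion window to use when the connection is about to
 * transmit again.
 */
static uint32_t
cwnd_after_idle(const struct conn *c)
{
	assert(c != NULL);

	/* Not idle for a full retransmit timeout: nothing to reset. */
	if (c->idle_ticks < c->rto_ticks)
		return (c->cwnd);

	/* Both the application and the sysadmin must opt in. */
	if (c->ignore_idle && sysadmin_allows_ignore_idle)
		return (c->cwnd);

	/* Default: fall back to the smaller restart window. */
	return (c->cwnd < c->restart_cwnd ? c->cwnd : c->restart_cwnd);
}
```

With the tunable left at its default, a connection that sets ignore_idle
still has its window reset after a long idle period; only when both
switches are on does it keep its previous congestion window.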