From: Lawrence Stewart
Date: Thu, 14 Feb 2013 01:01:20 +1100
To: Kevin Oberman
Cc: Alfred Perlstein, Randall Stewart, John Baldwin, net@freebsd.org
Subject: Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
Message-ID: <511B9CB0.4090906@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

On 02/10/13 16:05, Kevin Oberman wrote:
> On Sat, Feb 9, 2013 at 6:41 AM, Alfred Perlstein wrote:
>> On 2/7/13 12:04 PM, George Neville-Neil wrote:
>>>
>>> On Feb 6, 2013, at 12:28 , Alfred Perlstein wrote:
>>>
>>>> On 2/6/13 4:46 AM, John Baldwin wrote:
>>>>>
>>>>> On Wednesday, February 06, 2013 6:27:04 am Randall Stewart wrote:
>>>>>>
>>>>>> John:
>>>>>>
>>>>>> A burst at line rate will *often* cause drops. This is because
>>>>>> router queues are at a finite size. Also such a burst (especially
>>>>>> on a long delay bandwidth network) causes your RTT to increase even
>>>>>> if there is no drop which is going to hurt you as well.
>>>>>>
>>>>>> A SHOULD in an RFC says you really really really really need to do it
>>>>>> unless there is something that makes you willing to override it. It is
>>>>>> slight wiggle room.
>>>>>>
>>>>>> In this I agree with Andre, we should not be *not* doing it. Otherwise
>>>>>> folks will be turning this on and it is plain wrong. It may be fine
>>>>>> for your network but I would not want to see it in FreeBSD.
>>>>>>
>>>>>> In my testing here at home I have put back into our stack max-burst.
>>>>>> This uses Mark Allman's version (not Kacheong Poon's) where you clamp
>>>>>> the cwnd at no more than 4 packets larger than your flight. All of my
>>>>>> testing high-bw-delay or lan has shown this to improve TCP performance.
>>>>>> This is because it helps you avoid bursting out so many packets that
>>>>>> you overflow a queue.
>>>>>>
>>>>>> In your long-delay bw link if you do burst out too many (and you never
>>>>>> know how many that is since you can not predict how full all those
>>>>>> MPLS queues are or how big they are) you will really hurt yourself even
>>>>>> worse.
>>>>>> Note that generally in Cisco routers the default queue size is
>>>>>> somewhere between 100-300 packets depending on the router.
>>>>>
>>>>> Due to the way our application works this never happens, but I am fine
>>>>> with just keeping this patch private. If there are other shops that need
>>>>> this they can always dig the patch up from the archives.
>>>>>
>>>> This is yet another time when I'm sad about how things happen in FreeBSD.
>>>>
>>>> A developer comes forward with a non-default option that's very useful for
>>>> some specific workloads, specifically one that contributes much time and $$$
>>>> to the project and the community rejects the patches even though it's been
>>>> successful in other OSes.
>>>>
>>>> It makes zero sense.
>>>>
>>>> John, can you repost the patch? Maybe there is a way to refactor this
>>>> somehow so it's like accept filters where we can plug in a hook for TCP?
>>>>
>>>> I am very disappointed, but not surprised.
>>>>
>>> I take away the complete opposite feeling. This is how we work through
>>> these issues.
>>> It's clear from the discussion that this need not be a default in the
>>> system, and is a special case. We had a reasoned discussion of what would
>>> be best to do and at least two experts in TCP weighed in on the effect
>>> this change might have.
>>>
>>> Not everything proposed by a developer need go into the tree, in
>>> particular since these discussions are archived we can always revisit
>>> this later.
>>>
>>> This is exactly how collaborative development should look, whether or not
>>> the patch is integrated now, next week, next year, or ever.
>>
>>
>> I agree that discussion is great, we have all learned quite a bit from it,
>> about TCP and the dangers of adjusting buffering without considerable
>> thought. I would not be involved in FreeBSD had this type of discussion and
>> information not been discussed on the lists so readily.
>>
>> However, the end result must be far different than what has occurred so far.
>>
>> If the code was deemed unacceptable for general inclusion, then we must find
>> a way to provide a light framework to accomplish the needs of the community
>> member.
>>
>> Take for instance someone who is starting a company that needs this
>> facility. Which OS will they choose? One who has integrated a useful
>> feature? Or one who has rejected it and left that code in the mailing list
>> archives?
>>
>> As much as expert opinion is valuable, it must include understanding and
>> need of handling special cases and the ability to facilitate those special
>> cases for our users and developers.
>
> This is a subject rather near to my heart, having fought battles with
> congestion back in the dark days of Windows when it essentially
> defaulted to TCP_IGNOREIDLE. It was a huge pain, but it was the only
> way Windows did TCP in the early days. It simply did not implement
> slow-start. This was really evil, but in the days when lots of links
> were 56K and T-1 was mostly used for network core links, the Internet,
> small as it was back then, did not melt, though it glowed a
> frightening shade of red fairly often. Today too many systems running
> like this would melt things very quickly.
>
> OTOH, I can certainly see cases, like John's, where it would be very
> beneficial. And, yes, Linux has it. (I don't see this as relevant in
> any way except as proof that not enough people have turned it on to
> cause serious problems... yet!) It seems a shame to make everyone who
> really has a need develop their own patches or dig through old mail to
> find John's.
>
> What I would like to see is a way to have it available, but make it
> unlikely to be enabled except in a way that would put up flashing red
> warnings and sound sirens to warn people that it is very dangerous and
> can be a way to blow off a few of one's own toes.
>
> One idea that popped into my head (and may be completely ridiculous)
> is to make its availability dependent on a kernel option and have a
> warning in NOTES about it contravening normal and accepted practice
> and that it can cause serious problems both for yourself and for
> others using the network.

Agreed. A sysctl as suggested by Grenville might be sufficient though.
Requiring a full kernel recompile seems a bit draconian.

> I might also note that almost all higher performance (1G and faster)
> networks already have a form of this...TSO. In case you hadn't
> noticed, TSO will take a large buffer and transmit it as multiple
> segments which are transmitted back to back with NO delay or awareness
> of congestion. I can confirm that even this limited case can and does
> sometimes result in packet loss when router queues are inadequate to
> handle the load.

You nailed it - took the words right off my fingertips. Sure, a flow's
cwnd can exceed the TSO max chunk size by an order of magnitude, but the
fact remains that we live in a bursty world already. As much as I
dislike TSO in its current incarnation, it exists for good reason. We
need to provide useful tools, thorough documentation and set sensible
defaults.

Cheers,
Lawrence
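
For readers following the thread, the max-burst clamp Randall describes
in the quoted text (cwnd held to no more than 4 packets above the data in
flight) might look roughly like the sketch below. This is an illustration
only, not code from the FreeBSD stack or from John's patch. The function
name, the MAX_BURST_SEGS constant and the byte-based units are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: segments allowed beyond what is in flight. */
#define MAX_BURST_SEGS 4u

/*
 * Hypothetical helper, not a real kernel function.
 * Returns cwnd clamped so that it is at most MAX_BURST_SEGS full-sized
 * segments larger than the data currently in flight.
 * cwnd, flight and mss are all in bytes.
 */
static uint32_t
clamp_cwnd_max_burst(uint32_t cwnd, uint32_t flight, uint32_t mss)
{
	assert(mss > 0);

	/* Widen to 64 bits so flight + burst allowance cannot wrap. */
	uint64_t limit = (uint64_t)flight + (uint64_t)MAX_BURST_SEGS * mss;

	/* If cwnd exceeds limit, limit is below UINT32_MAX, so the cast is safe. */
	return (cwnd > limit) ? (uint32_t)limit : cwnd;
}
```

Under these assumptions, a connection returning from idle with nothing in
flight, a 100000-byte cwnd and a 1460-byte MSS would be clamped to 5840
bytes (4 segments), while a cwnd already below the limit is left alone.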