Date: Mon, 11 Apr 2011 13:41:51 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: arch@freebsd.org
Subject: Re: Updating our TCP and socket sysctl values...
Message-ID: <201104112041.p3BKfp8n070251@apollo.backplane.com>
References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> <20110319160400.000043f5@unknown> <72B8E80C-E4C7-4763-A7B5-7A4441188C00@neville-neil.com> <20110320171122.00004613@unknown>
This is a little late, but I should note that FreeBSD had the inflight limiting code (which I wrote long ago), which could be turned on with a simple sysctl. It was replaced in September 2010 with the 'pluggable congestion control algorithm' framework, which I am unfamiliar with, but which I assume has some equivalent replacement for the functionality. Also, I believe FreeBSD turns on autosndbuf and autorcvbuf by default now, which means the buffers ARE being made larger than the defaults (assuming that the TCP window shift is not disabled, otherwise you are limited to 65535 bytes no matter how big your buffers are). A rough sketch of the relevant sysctls follows further down.

In any case, the inflight/congestion-control algorithms essentially remove the buffer bloat issue for TCP transmit buffers. It doesn't matter HOW big your transmit buffer and the receive side's receive buffer are. People running servers need to turn on the inflight or equivalent limiter for several reasons:

* BW x DELAY products are all over the map these days, creating surges if packet backlog is not controlled. Reasons for this vary, but a good chunk can be blamed on ISPs and network providers who implement burst/backoff schemes (COMCAST is a good example, where your downlink bandwidth is cut in half after around 10 seconds at full bore).

* Default TCP buffer sizes are too small.

* Auto-sized TCP buffers are often too large (from a buffer bloat perspective).

* Edge routers and other routers in the infrastructure have huge amounts of buffer memory these days, so drops don't really start to happen until AFTER the network has become almost unusable.

* Turning it on significantly reduces the packet backlog at choke points in the network (usually the edge router), and significantly improves the ability of fair-share and QOS algorithms on the border router to manage traffic. That is, nearly all border routers are going to have some form of QOS, but there is a world of difference between implementing those algorithms for 500 simultaneous connections with 3 packets of backlog per connection versus having to do it with 50 packets of backlog per connection.

* You don't want to run RED on a border router; RED is designed for the middle of large switching networks. At the edges there are lots of other choices that do not require dropping packets randomly. Plus TCP SACK (which most sites now implement) tends to defeat RED these days, so when RED is used at all the word 'random' becomes equivalent to 'frustration'.

* Even if your SERVER has tons of bandwidth, probably a good number of the poor CLIENTS on the other end of the connection do not. If you don't control the backlog, guess where all that backlog ends up? That's right, it ends up on the client-side border routers... for example, it ends up at the DSLAM if the client is a DSL user. These edge routers are the very last place where you ever want packet backlog to accumulate. They don't handle it well.

So, basically that means some sort of transmit-side congestion control needs to be turned on... and frankly it should be turned on by default. It just isn't optional any more.

I've been screwing around with this stuff for a long time. I have colocated boxes with tons of bw, and servers running out of the house that don't (though ganging the uplink for U-Verse and COMCAST together is actually quite nice). I've played with the issue from both sides of the coin.
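For reference, the knobs in question look roughly like this on a recent FreeBSD box. Exact names and defaults vary between releases (the inflight sysctls are gone once the pluggable CC framework is in), so take this as illustrative rather than gospel:

    # Old-style inflight limiter, on releases that still have it:
    sysctl net.inet.tcp.inflight.enable=1

    # Pluggable congestion control (9.x and later): see what is built in,
    # optionally load another algorithm module, and select it:
    sysctl net.inet.tcp.cc.available
    kldload cc_htcp
    sysctl net.inet.tcp.cc.algorithm=htcp

    # Automatic send/receive buffer sizing, plus the RFC 1323 window
    # scaling that large buffers depend on:
    sysctl net.inet.tcp.sendbuf_auto=1
    sysctl net.inet.tcp.recvbuf_auto=1
    sysctl net.inet.tcp.rfc1323=1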
Clients are a lot happier when servers don't build up 50+ packets of backlog from a single connection (let alone several), and routers can manage QOS better when the nearby servers don't blast THEM full of packets.

As an example of this, pulling video over TCP on my downlink from several concurrent sources over the native COMCAST or U-Verse links tended to create problems when those video streams were from well-endowed servers. Eventually the only real solution was to run a VPN to a fast colocated box and run FAIRQ/PF on both sides to control the bandwidth in both directions, simply because there are too many video servers out there which do essentially no bandwidth management at all. Examining the queue showed that multiple (video) servers out on the internet would happily build up 50+ packets per connection. The telco and cable providers that most clients are connected to just can't handle it without something blowing up.
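For anyone wanting to try the same trick, the shaping side looks roughly like the following in pf.conf, with ALTQ and ALTQ_FAIRQ compiled into the kernel. The interface name and bandwidth figure here are made up for illustration; I'm not claiming this is a drop-in ruleset:

    # Shape traffic going out the VPN tunnel toward the colo box.  Setting
    # the bandwidth a bit below the real uplink speed keeps the queue on
    # this machine, where FAIRQ can keep per-connection backlog fair,
    # instead of letting it pile up in the provider's gear.
    ext_if = "tun0"        # hypothetical tunnel interface
    altq on $ext_if fairq bandwidth 4.5Mb queue { q_def }
    queue q_def fairq(default)
    pass out on $ext_if keep state queue q_def

The colocated end runs the mirror image on its side of the tunnel, which is what actually controls the download direction, since ALTQ only shapes what a box transmits.

-Matt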