Date: Sun, 15 Jul 2001 10:05:16 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Leo Bicknell
Cc: Julian Elischer, Leo Bicknell, Drew Eckhardt, hackers@FreeBSD.ORG
Subject: Re: Network performance tuning.

:Packet loss is not always a bad thing.  Let me use an admittedly
:extreme example:
:
:Consider a backup server across country from four machines it's
:trying to back up nightly.  So we have high (let's say 70ms) RTTs,
:and let's say for the sake of argument the limiting factor is a
:DS-3 in the middle, 45 Mbits/sec.
:
:Each connection can get 16384 * 1000 / 70 = 234057 bytes/sec, or
:about 1.87 Mbits/sec.  Multiply by the 4 machines, and we get
:network utilization of 7.48 Mbits/sec, about 16% of the DS-3.
:
:Now, we implement some sort of code that can increase the amount
:of socket buffering space.  As a result, the window can grow (per
:connection) large enough to fill a DS-3, so the 4 hosts must fight
:for the bandwidth available.
:
:I don't have any great math for how we get here, but TCP in normal
:situations rarely produces more than 5% packet loss (10% absolute
:max), since it backs off when congestion occurs.  I'll go with 5%
:as an upper bound.  With that packet loss, TCP now gets the DS-3
:much closer to full, let's say 90%, or 40.5 Mbits/sec (it should
:be higher than 90%, but again, I'm worst-casing).  In the aggregate
:that will be spread across the 4 connections evenly, or 10.12
:Mbits/sec per connection.
:
:The question to be asked is: which is better, 1.87 Mbits/sec with
:no packet loss, or 10.12 Mbits/sec with 5% packet loss?  Clearly
:the latter gives better performance, even with packet loss.
:
:Clearly, knowing the end-to-end link bandwidth and 'just' filling it
:would be better, but packet loss, at least in the context of TCP
:flow control, is not all bad.  Something else to remember is that not
:everyone plays fair, so if we stay at 80% of the available bandwidth
:and everyone else pushes to packet loss, we will in general be pushed
:out.
:
:--
:Leo Bicknell - bicknell@ufp.org

Well, 4 connections isn't enough to generate packet loss.  All that
happens is that the routers in between start buffering the packets.  If
you had a *huge* TCP window size then the routers in between could run
out of packet space and packet loss would start to occur.  Routers tend
to have a lot of buffer space, though.  The real killer is runaway
latencies rather than packet loss.
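As a quick sanity check, the figures in the quoted example are just
window-size-over-RTT arithmetic.  A tiny standalone program (not kernel
code; the constants are taken straight from the example above) reproduces
them:

    #include <stdio.h>

    int
    main(void)
    {
            double window = 16384.0;    /* default socket buffer, bytes */
            double rtt = 0.070;         /* round trip time, seconds */
            double ds3 = 45.0e6;        /* DS-3 capacity, bits/sec */
            int conns = 4;              /* machines being backed up */

            double bytes_sec = window / rtt;    /* per-connection ceiling */
            double bits_sec = bytes_sec * 8.0;
            double aggregate = bits_sec * conns;

            printf("per connection: %.0f bytes/sec (%.2f Mbits/sec)\n",
                bytes_sec, bits_sec / 1e6);
            printf("aggregate: %.2f Mbits/sec (%.1f%% of a DS-3)\n",
                aggregate / 1e6, 100.0 * aggregate / ds3);
            return (0);
    }

That per-connection ceiling is why the default 16K buffers only reach
roughly 16% of the DS-3 no matter how clean the path is.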
On the other hand, something like the experimental bandwidth-delay
product code I posted would do very well running 4 connections over such a
link, because it would detect the point where the routers start buffering
the data (by noticing the increased latency) and back off before packet
loss occurred.  It doesn't care how many connections are running in
parallel.  The downside is that the algorithm becomes less stable as you
increase the number of connections going between the same two end points.
Stability in the face of lots of parallel connections is something that
needs to be tested.

Also, the algorithm is less helpful when it has to figure out the optimal
transmit buffer size for every new connection (consider a web server).  I
am considering ripping out the ssthresh junk from the stack, which hardly
works at all, and using the route table's ssthresh field to set the
initial buffer size for the algorithm.

						-Matt
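To give a flavor of the heuristic being described: grow the effective send
buffer while the measured RTT stays near its observed floor, and shrink it
as soon as the RTT climbs, since a rising RTT means routers have started
queueing.  The rough user-level sketch below is NOT the experimental patch
itself; the struct, function name, thresholds, and step sizes are invented
purely for illustration.

    /*
     * Sketch only: latency-driven transmit buffer sizing.  The trigger
     * for backing off is RTT growth, not packet loss, so it engages
     * before a queue somewhere in the path actually overflows.
     */
    #include <stdio.h>

    struct bdp_state {
            int     snd_bufsize;    /* current transmit buffer target, bytes */
            int     rtt_base;       /* lowest RTT observed so far, ms */
    };

    #define BDP_MIN_BUF     16384
    #define BDP_MAX_BUF     (256 * 1024)

    static void
    bdp_update(struct bdp_state *st, int rtt_ms)
    {
            if (st->rtt_base == 0 || rtt_ms < st->rtt_base)
                    st->rtt_base = rtt_ms;

            if (rtt_ms > st->rtt_base + st->rtt_base / 4) {
                    /* RTT up ~25%: queues are building, back off. */
                    st->snd_bufsize -= st->snd_bufsize / 8;
            } else {
                    /* RTT near its floor: still room, probe upward. */
                    st->snd_bufsize += st->snd_bufsize / 16;
            }

            if (st->snd_bufsize < BDP_MIN_BUF)
                    st->snd_bufsize = BDP_MIN_BUF;
            if (st->snd_bufsize > BDP_MAX_BUF)
                    st->snd_bufsize = BDP_MAX_BUF;
    }

    int
    main(void)
    {
            /* Fake RTT samples: path idles at 70ms, then queues build. */
            int samples[] = { 70, 70, 71, 72, 75, 85, 95, 110, 90, 72 };
            struct bdp_state st = { BDP_MIN_BUF, 0 };
            unsigned i;

            for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
                    bdp_update(&st, samples[i]);
                    printf("rtt %3dms -> sndbuf %6d\n", samples[i],
                        st.snd_bufsize);
            }
            return (0);
    }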