From: Chuck Swiger <cswiger@mac.com>
To: ross.cameron@linuxpro.co.za
Cc: freebsd-questions@freebsd.org
Date: Thu, 26 Feb 2009 14:53:09 -0800
Subject: Re: TCP congestion avoidance

On Feb 26, 2009, at 1:41 PM, Ross Cameron wrote:

> Where can I find more documentation on these types of settings in
> FreeBSD

The FreeBSD Handbook and Google will help for the general case, but for
specific details, reading the source is recommended.
> and how can I choose between more than just TCP_NewReno? Specifically,
> I will be making use of TCP_Westwood / TCP_Westwood+ and TCP_Illinois.

If you have a BSD-licensed implementation of TCP Westwood or the others
handy, feel free to contribute your patches in a PR.

At least some of the ideas behind the congestion algorithms you've
mentioned are already present in the FreeBSD stack in the form of the
net.inet.tcp.inflight tunables; see netinet/tcp_subr.c:

/*
 * TCP BANDWIDTH DELAY PRODUCT WINDOW LIMITING
 *
 * This code attempts to calculate the bandwidth-delay product as a
 * means of determining the optimal window size to maximize bandwidth,
 * minimize RTT, and avoid the over-allocation of buffers on interfaces
 * and routers.  This code also does a fairly good job keeping RTTs in
 * check across slow links like modems.  We implement an algorithm which
 * is very similar to (but not meant to be) TCP/Vegas.  The code operates
 * on the transmitter side of a TCP connection and so only affects the
 * transmit side of the connection.
 *
 * BACKGROUND: TCP makes no provision for the management of buffer space
 * at the end points or at the intermediate routers and switches.  A TCP
 * stream, whether using NewReno or not, will eventually buffer as many
 * packets as it is able, and the only reason this typically works is
 * due to the fairly small default buffers made available for a
 * connection (typically 16K or 32K).  As machines use larger windows
 * and/or window scaling it is now fairly easy for even a single TCP
 * connection to blow out all available buffer space not only on the
 * local interface, but on intermediate routers and switches as well.
 * NewReno makes a misguided attempt to 'solve' this problem by waiting
 * for an actual failure to occur, then backing off, then steadily
 * increasing the window again until another failure occurs, ad
 * infinitum.  This results in terrible oscillation that is only made
 * worse as network loads increase, and the idea of intentionally
 * blowing out network buffers is, frankly, a terrible way to manage
 * network resources.
 *
 * It is far better to limit the transmit window prior to the failure
 * condition being reached.  There are two general ways to do this:
 * First, you can 'scan' through different transmit window sizes and
 * locate the point where the RTT stops increasing, indicating that you
 * have filled the pipe, then scan backwards until you note that RTT
 * stops decreasing, then repeat ad infinitum.  This method works in
 * principle but has severe implementation issues due to RTT variances,
 * timer granularity, and instability in the algorithm, which can lead
 * to many false positives and create oscillations as well as interact
 * badly with other TCP streams implementing the same algorithm.
 *
 * The second method is to limit the window to the bandwidth-delay
 * product of the link.  This is the method we implement.  RTT variances
 * and our own manipulation of the congestion window, bwnd, can
 * potentially destabilize the algorithm.  For this reason we have to
 * stabilize the elements used to calculate the window.  We do this by
 * using the minimum observed RTT, the long-term average of the observed
 * bandwidth, and by adding two segments worth of slop.  It isn't
 * perfect, but it is able to react to changing conditions and gives us
 * a very stable basis on which to extend the algorithm.
 */
void
tcp_xmit_bandwidth_limit()

--
-Chuck
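As a back-of-the-envelope illustration of the limiting strategy that comment describes (not FreeBSD's actual fixed-point implementation, whose variable names and scaling differ), the window works out to roughly the long-term average bandwidth times the minimum observed RTT, plus two segments worth of slop:

```python
def bdp_window(bandwidth_bps, min_rtt_s, mss=1460, slop_segments=2):
    """Sketch of the bandwidth-delay product window limit.

    bandwidth_bps: long-term average of the observed bandwidth, bits/sec
    min_rtt_s:     minimum observed round-trip time, seconds
    Names and defaults here are illustrative only.
    """
    # bandwidth-delay product in bytes: how much data "fits in the pipe"
    bdp_bytes = (bandwidth_bps / 8.0) * min_rtt_s
    # stabilize by adding two segments worth of slop, per the comment
    return int(bdp_bytes) + slop_segments * mss

# e.g. an 8 Mbit/s link with a 50 ms minimum RTT:
print(bdp_window(8_000_000, 0.050))  # 50000 bytes of BDP + 2920 bytes slop
```

Capping the transmit window near this value keeps queues at intermediate routers short instead of filling them until loss occurs, which is the whole point of the comment's critique of NewReno.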