From: Terry Lambert
Reply-To: tlambert2@mindspring.com
To: Leo Bicknell
Cc: freebsd-hackers@FreeBSD.ORG
Date: Fri, 13 Jul 2001 12:00:04 -0700
Subject: Re: Network performance roadmap.
Message-ID: <3B4F4534.37D8FC3E@mindspring.com>

Leo Bicknell wrote:
> 1) FreeBSD's TCP windows cannot grow large enough to allow for
> optimum performance. The primary obstacle to raising them is
> that if you do so, the system can run out of MBUF's. Schemes
> need to be put in place to limit MBUF usage, and better allocate
> buffers per connection.

Not quite true. They are limited administratively, because of the
artificial fixed ratio of mbufs to mbuf clusters. This is a design
problem, not a physical limitation.

> 2) Windows are currently 16k. It seems a large number of people
> think 32k would not cause major issues, and is in fact in use
> by many other OS's at this time.

The main reason for this is that other OSes use system buffers for
jumbograms. Check the Tigon II and Intel Gigabit drivers, and you
will see that FreeBSD does not do this.
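A back-of-the-envelope check of the jumbo-frame MTU advice in this reply: if a frame is stored in fixed-size buffers, any MTU that is not a whole multiple of the buffer size leaves part of the last buffer unused. The sketch below is illustrative only; the buffer size is a parameter (the test uses 2048 bytes, the usual FreeBSD mbuf cluster size), and link-layer header bytes are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fixed-size buffers needed to hold one frame of `mtu` bytes.
 * Illustrative model only: link-layer header bytes are ignored. */
static size_t buffers_needed(size_t mtu, size_t bufsize)
{
    assert(bufsize > 0);
    return (mtu + bufsize - 1) / bufsize;   /* round up to whole buffers */
}

/* Bytes allocated for the frame but left unused in its last buffer. */
static size_t bytes_wasted(size_t mtu, size_t bufsize)
{
    return buffers_needed(mtu, bufsize) * bufsize - mtu;
}
```

With 2048-byte buffers, an 8192-byte MTU fits exactly in 4 buffers, while a 9000-byte MTU needs 5 buffers and leaves 1240 bytes of the last one idle on every frame.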
Jumbo buffers are separate. People having performance issues with
jumbo frames should consider setting their MTU to 8k instead of 9k,
so that frames become an even multiple of the mbuf size.

> There are a few other observations that have been made that are
> important.
>
> A) The receive buffers are hardly used. In fact, data generally
> only sits in a receive buffer for one of two reasons. First,
> the data has not yet been passed to the application. This amount of
> data is generally very small. Second, data for unacknowledged
> segments will sit in the buffer waiting for a retransmit. It is of
> course possible that the buffers could be completely full from either
> case, but several research papers indicate that receive buffers
> rarely use much space at all.

You need to read the WRL and Rice University papers, then, and pay
particular attention to "livelock".

> B) When the system runs out of MBUF's, really bad things happen. It
> would be nice to make the system handle MBUF exhaustion in a nicer
> way, or avoid it.

The easiest way to do this is to know ahead of time how many you
_really_ have. Then bad things don't happen.

> C) Many people think TCP_EXTENSIONS="YES" gives them windows > 64k.
> It does, in the sense that it allows the window scale option, but
> it doesn't in that socket buffers aren't changed.

Socket buffers are set at boot time. Read the code. The same goes
for the maximum number of connections: you can hop around until you
are blue in the face from typing "sysctl", but it will not change
the number of tcpcb's and inpcb's, etc. This is an artifact of the
allocator.

> From all of this, I propose the following short-term road map:
>
> a - Commit higher socket buffer sizes:
>
>     -current: 64k receive (based on observation A)
>               32k send    (based on detail 2)
>
>     -stable:  32k receive (based on detail 2)
>               32k send    (based on detail 2)
>
> I think this can be done more or less immediately.

This would suck.
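To see why larger default send buffers cost connections, a rough model helps: when the peer is slow, every send buffer stays full, so each open connection pins a fixed number of clusters. The sketch below is a hypothetical calculation, not FreeBSD code; the pool size (16384 clusters) and 2048-byte cluster size are illustrative assumptions, and receive-side usage is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Clusters pinned by one connection whose send buffer is always full
 * (the slow-client case). Receive-side usage is ignored in this model. */
static size_t clusters_per_conn(size_t sndbuf, size_t clbytes)
{
    assert(sndbuf > 0 && clbytes > 0);
    return (sndbuf + clbytes - 1) / clbytes;   /* round up to whole clusters */
}

/* Upper bound on concurrent full-window connections that a fixed pool of
 * `nclusters` clusters can back. */
static size_t max_full_connections(size_t nclusters, size_t clbytes,
                                   size_t sndbuf)
{
    return nclusters / clusters_per_conn(sndbuf, clbytes);
}
```

Under these assumptions, a 16k send buffer allows 2048 full-window connections from the pool, and a 32k send buffer allows 1024: the bound halves exactly when the per-connection buffer doubles.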
It would halve your maximum number of concurrent connections on
servers with differential rates on the connections (e.g. my local
connection is 1Gbit, but the other end is on a 28K modem). Your send
windows will always remain full.

Whether larger transmit windows help really depends on the type of
traffic you expect to serve. In the HTTP case, the studies indicate
that the majority of objects served are less than 8k in size, and
most browsers (except Opera) do not support HTTP pipelining.

You would be well served to try this on a test system first, and to
do watermark connection counting on a lot of traffic: count how many
connections reach 1k, 2k, 4k, 8k, 16k, and 32k of data buffered in
the send window. Do the count when the connection closes, based on
the high watermark. You can keep the watermark in the socket struct;
the allocator's 64 byte alignment already bloats it to 192 bytes, so
there is some headroom for these stats.

Only after you have proven that some significant fraction of traffic
actually hits the window size limits should you make this change to
FreeBSD proper.

If anyone is interested in doing this and writing a paper, you can
probably build a nice Master's thesis on the study as a fast track to
your Master's. It would probably take less than two weeks to do the
whole thing, and most of that would be spent waiting for the traffic
data to be collected.

> b - Allow larger receive windows in some cases. In -current
> only, if TCP_EXTENSIONS="YES" is configured (turn on RFC1323
> extensions) change the settings to:
>
>     1M kernel limit (based on observation C)
>     256k receive    (based on observation A, C)
>     64k send        (based on observation C)
>
> Note, 64k send is most likely aggressive with the current MBUF
> problems. Some later points will address that.
> For now, the
> basic assumption is that people configuring TCP_EXTENSIONS are
> clueful people with larger memory machines who also tune things like
> MAXUSERS up, so they will probably be ok.

You can bump the default maximum, but you should not bump the default
itself; a larger buffer should be used only when a program requests
it (e.g. maintain a soft limit, raised through socket options, for
experimental programs). I think you will find that bumping the
default is a bad idea.

> c - Prevent MBUF exhaustion. Today, when you run out of MBUF's, bad
> things start to happen. It would be nice to prevent that from
> happening, and also to provide sysadmins some warning when it is
> about to happen.

One good way to prevent this is to not set your window size
unreasonably... 8-p.

> This change sounds easy, but I don't know where in the code to start
> looking. Basically, there is a bit of code somewhere that decides
> if a sending TCP process should block or not. Today this code only
> looks to see if that socket's TCP send buffer is full. What I
> propose is that it should also check if less than 10% of the MBUF's
> are free, and if so also block the sender.

Ugh. If you don't know where to start looking in the code, this is
definitely research that should not be done in the context of
committing changes to the main FreeBSD tree, at least not until you
have answers on whether the changes actually improve or degrade
performance.

I think you need to do a separate research project; start with the
SCALA Server papers and the WRL papers as references.

-- Terry
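The high-watermark connection counting suggested above could look roughly like the following. This is a hypothetical userland sketch: `struct hw_stats` stands in for the extra field Terry suggests adding to the socket struct, and the byte count passed to `hw_update` would presumably come from the send sockbuf's current occupancy (`sb_cc` in FreeBSD of that era). The bucket boundaries (below 1k, then doubling up to 32k and above) follow the sizes listed in the reply.

```c
#include <assert.h>
#include <stddef.h>

/* Buckets: 0 = <1k, 1 = [1k,2k), 2 = [2k,4k), 3 = [4k,8k),
 *          4 = [8k,16k), 5 = [16k,32k), 6 = >=32k */
#define NBUCKETS 7

/* Hypothetical per-socket stat; in a kernel this would live in the socket. */
struct hw_stats {
    size_t high_water;   /* largest send-buffer occupancy seen so far */
};

/* Global histogram of per-connection high watermarks, filled at close. */
static unsigned long histogram[NBUCKETS];

/* Call whenever data is queued; sb_cc is the current send-buffer byte count. */
static void hw_update(struct hw_stats *st, size_t sb_cc)
{
    if (sb_cc > st->high_water)
        st->high_water = sb_cc;
}

/* Map a high watermark to its power-of-two bucket. */
static int hw_bucket(size_t hw)
{
    int b = 0;
    size_t limit = 1024;

    while (b < NBUCKETS - 1 && hw >= limit) {
        b++;
        limit <<= 1;
    }
    return b;
}

/* Call once when the connection closes: record its peak occupancy. */
static void hw_close(const struct hw_stats *st)
{
    histogram[hw_bucket(st->high_water)]++;
}
```

Counting at close rather than on every enqueue keeps the hot path to a single compare, and the histogram answers directly what fraction of connections ever needed more than a 16k window.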