Date: Thu, 2 Aug 2001 02:10:38 -0400
From: Bosko Milekic <bmilekic@technokratis.com>
To: stanislav shalunov <shalunov@internet2.edu>
Cc: Bill Paul, Ken Merry, freebsd-net@FreeBSD.ORG
Subject: Re: TCP problems with large window sizes on FreeBSD (GigaTCP)

Hi Stanislav,

On Wed, Aug 01, 2001 at 11:32:29PM -0400, stanislav shalunov wrote:
> We want to build two or more machines that would be capable of
> achieving TCP throughputs of 700-800Mb/s over a WAN (with a single
> TCP connection). The motivations for this exercise are spelled out
> on the referenced web page. Additionally, I believe that getting
> through this exercise with FreeBSD as the OS would advance FreeBSD's
> cause with network researchers and advanced users of
> high-performance networks.
>
> In order to run at such high throughput over links with an RTT of
> roughly 70ms, we'd need window sizes in the vicinity of 8-16MB. (And,
> naturally, the unidirectional loss event probability has to be less
> than (.7*MSS/(RTT*bandwidth))^2 = 1e-7. We believe that we have
> networks that lose less than one packet in ten million.)
>
> We have built the boxes now. I have started with back-to-back testing
> with large window sizes. Back-to-back testing is believed to be valid
> because it's hard to expect that inserting 70ms of delay between the
> hosts will make the situation any better.
>
> I cannot get it to run with window sizes greater than half a megabyte.
>
> The story, with some very preliminary analysis, is at
> http://www.internet2.edu/~shalunov/gigatcp/
>
> I'm not reposting it here; there are 29KB of text and 3MB of data
> there. I'm adding and updating stuff as I progress.
>
> The questions that I have for you guys are, in decreasing order of
> importance:
>
> 1. How do I fix the ti driver problem that apparently is holding me
> back? What number of jumbo slots would be "good"?

Since you're targeting TCP throughput specifically and aren't too
concerned about physical memory, I would recommend increasing
TI_JSLOTS to at least 500-600. This reserves roughly 5MB of physically
contiguous memory at driver attach time, which I think you can safely
spare. If you discover that you need even more than 500-600 jumbo
buffers, feel free to experiment further with the TI_JSLOTS constant.

For memory buffer tuning, I would also recommend the following
changes, to be on the safe side (concrete sketches follow below):

- In uipc_mbuf.c, increase NCL_INIT to roughly 20. Since if_ti
  allocates its own buffer space, I don't suspect you'll need many
  regular clusters. Check with `netstat -m' to see how many are
  typically in use during a test and set NCL_INIT to that number. All
  this does is pre-allocate the cluster pool at boot time and avoid
  potentially expensive map allocations while your tests are running.

- Also in uipc_mbuf.c, increase NMB_INIT to 10240, or whatever maximum
  mbuf count `netstat -m' shows you need during testing. Again, this
  just allocates the mbufs at boot time and speeds up memory buffer
  allocation during performance testing. If you set the number to
  10240, remember that each mbuf is merely 256 bytes, so you'll be
  giving up a mere ~2.5MB for the cause while speeding up allocations
  altogether.

Finally, I noticed at one point in your analysis that you increased
NMBCLUSTERS. Unless you're actually running out of mbufs and/or
clusters, you'll find that increasing N{MB,CL}_INIT is probably what
you want to do instead.
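Concretely, the jumbo buffer change is a one-liner; TI_JSLOTS lives in
sys/pci/if_tireg.h, where the stock value is 384 if memory serves, so
treat this as a sketch and double-check your tree:

	/*
	 * sys/pci/if_tireg.h: number of ~9K jumbo buffers carved out
	 * of physically contiguous memory at driver attach time.
	 * 600 slots at ~9KB apiece works out to roughly 5.4MB.
	 */
	#define TI_JSLOTS	600	/* stock value: 384 */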
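And the corresponding sketch for the mbuf pools in
sys/kern/uipc_mbuf.c (again, the stock values shown are from memory;
tune both against what `netstat -m' reports during your tests):

	/*
	 * sys/kern/uipc_mbuf.c: mbufs and clusters pre-allocated at
	 * boot time.  10240 mbufs at 256 bytes apiece is only ~2.5MB,
	 * and 20 clusters should be plenty here since if_ti supplies
	 * its own jumbo buffers.
	 */
	#define NMB_INIT	10240	/* stock value: 16 */
	#define NCL_INIT	20	/* stock value: 1 */

Note that these only raise what is pre-allocated up front; they don't
touch the NMBCLUSTERS ceiling. Rebuild the kernel after either change,
of course.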
> 2. Why doesn't Fast Retransmit kick in? (See the annotated sender's
> view of the stalled connection.)
>
> 3. Is there an off-by-one error in RST handling? (See the end of the
> annotated receiver's view of the stalled connection.)

I believe jlemon covered these two issues in his post, which makes
sense, as he's the overall stack guru. :-)

> --
> Stanislav Shalunov              http://www.internet2.edu/~shalunov/
>
> "Nuclear war would really set back cable [television]." -- Ted Turner

--
Bosko Milekic
bmilekic@technokratis.com