Date: Thu, 2 Aug 2001 02:10:38 -0400
From: Bosko Milekic <bmilekic@technokratis.com>
To: stanislav shalunov <shalunov@internet2.edu>
Cc: Bill Paul, Ken Merry, freebsd-net@FreeBSD.ORG
Subject: Re: TCP problems with large window sizes on FreeBSD (GigaTCP)

Hi Stanislav,

On Wed, Aug 01, 2001 at 11:32:29PM -0400, stanislav shalunov wrote:
> We want to build two or more machines that would be capable of
> achieving TCP throughputs of 700-800Mb/s over a WAN (with a single
> TCP connection). The motivations for this exercise are spelled out
> on the referenced web page. Additionally, I believe that getting
> through this exercise with FreeBSD as the OS would advance FreeBSD's
> cause with network researchers and advanced users of
> high-performance networks.
>
> In order to run at such high throughput over links with an RTT of
> roughly 70ms, we'd need window sizes in the vicinity of 8-16MB. (And,
> naturally, the unidirectional loss event probability has to be less
> than (.7*MSS/(RTT*bandwidth))^2 = 1e-7. We believe that we have
> networks that lose less than one packet in ten million.)
>
> We have built the boxes now. I have started with back-to-back testing
> with large window sizes. Back-to-back testing is believed to be valid
> because it's hard to expect that inserting 70ms of delay between the
> hosts will make the situation any better.
>
> I cannot get it to run with window sizes greater than half a megabyte.
>
> The story, with some very preliminary analysis, is at
> http://www.internet2.edu/~shalunov/gigatcp/
>
> I'm not reposting it here; there are 29KB of text and 3MB of data
> there. I'm adding and updating stuff as I progress.
>
> The questions that I have for you guys are, in decreasing order of
> importance:
>
> 1. How do I fix the ti driver problem that apparently is holding me
> back? What number of jumbo slots would be "good"?

Since you're targeting TCP throughput specifically and aren't too
concerned about physical memory, I would recommend increasing
TI_JSLOTS to at least 500-600. This reserves roughly 5MB of physically
contiguous memory at driver attach time, which I think you can safely
spare. If you discover that you need even more than 500-600 jumbo
buffers, feel free to experiment further with the TI_JSLOTS constant.

For memory buffer tuning, I would also recommend the following
changes, to be on the safe side (concrete sketches follow below):

- In uipc_mbuf.c, increase NCL_INIT to roughly 20. Since if_ti
  allocates its own buffer space, I don't suspect you'll need many
  regular clusters. Check with `netstat -m' to see how many are
  typically in use during a test and set NCL_INIT to that number. All
  this does is pre-allocate the cluster pool at boot time and avoid
  potentially expensive map allocations while your tests are running.

- Also in uipc_mbuf.c, increase NMB_INIT to 10240, or whatever maximum
  mbuf count `netstat -m' shows you need during testing. Again, this
  just allocates the mbufs at boot time and speeds up memory buffer
  allocation during performance testing. If you set the number to
  10240, remember that each mbuf is merely 256 bytes, so you'll be
  giving up a mere ~2.5MB for the cause while speeding up allocations
  altogether.

Finally, I noticed at one point in your analysis that you increased
NMBCLUSTERS. Unless you're actually running out of mbufs and/or
clusters, you'll find that increasing N{MB,CL}_INIT is probably what
you want to do instead.
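Concretely, the jumbo buffer change is a one-liner; TI_JSLOTS lives in
sys/pci/if_tireg.h, where the stock value is 384 if memory serves, so
treat this as a sketch and double-check your tree:

	/*
	 * sys/pci/if_tireg.h: number of ~9K jumbo buffers carved out
	 * of physically contiguous memory at driver attach time.
	 * 600 slots at ~9KB apiece works out to roughly 5.4MB.
	 */
	#define TI_JSLOTS	600	/* stock value: 384 */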
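And the corresponding sketch for the mbuf pools in
sys/kern/uipc_mbuf.c (again, the stock values shown are from memory;
tune both against what `netstat -m' reports during your tests):

	/*
	 * sys/kern/uipc_mbuf.c: mbufs and clusters pre-allocated at
	 * boot time.  10240 mbufs at 256 bytes apiece is only ~2.5MB,
	 * and 20 clusters should be plenty here since if_ti supplies
	 * its own jumbo buffers.
	 */
	#define NMB_INIT	10240	/* stock value: 16 */
	#define NCL_INIT	20	/* stock value: 1 */

Note that these only raise what is pre-allocated up front; they don't
touch the NMBCLUSTERS ceiling. Rebuild the kernel after either change,
of course.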
> 2. Why doesn't Fast Retransmit kick in? (See the annotated sender's
> view of the stalled connection.)
>
> 3. Is there an off-by-one error in RST handling? (See the end of the
> annotated receiver's view of the stalled connection.)

I believe jlemon covered these two issues in his post, which makes
sense, as he's the overall stack guru. :-)

> --
> Stanislav Shalunov              http://www.internet2.edu/~shalunov/
>
> "Nuclear war would really set back cable [television]." -- Ted Turner

--
Bosko Milekic
bmilekic@technokratis.com