From owner-cvs-all@FreeBSD.ORG  Thu Nov 20 16:16:53 2003
Return-Path: <owner-cvs-all@FreeBSD.ORG>
Delivered-To: cvs-all@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2A34716A4D0
	for <cvs-all@FreeBSD.org>; Thu, 20 Nov 2003 16:16:53 -0800 (PST)
Received: from mailtoaster1.pipeline.ch (mailtoaster1.pipeline.ch
	[62.48.0.70])	by mx1.FreeBSD.org (Postfix) with ESMTP id 4AB2443FCB
	for <cvs-all@FreeBSD.org>; Thu, 20 Nov 2003 16:16:47 -0800 (PST)
	(envelope-from andre@freebsd.org)
Received: (qmail 79852 invoked from network); 21 Nov 2003 00:19:50 -0000
Received: from unknown (HELO freebsd.org) ([62.48.0.53])
          (envelope-sender <andre@freebsd.org>)
          by mailtoaster1.pipeline.ch (qmail-ldap-1.03) with SMTP
          for <nate@root.org>; 21 Nov 2003 00:19:50 -0000
Message-ID: <3FBD596E.CE7299BD@freebsd.org>
Date: Fri, 21 Nov 2003 01:16:46 +0100
From: Andre Oppermann <andre@freebsd.org>
X-Mailer: Mozilla 4.76 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Nate Lawson <nate@root.org>
References: <20031120200816.1F8C916A4DD@hub.freebsd.org>
	<20031120154119.M72721@root.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: cvs-src@FreeBSD.org
cc: src-committers@FreeBSD.org
cc: cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/conf files src/sys/net if_faith.c 
 if_loop.c route.h rtsock.c src/sys/netinet in_pcb.c in_pcb.h 
 in_rmx.c ip_divert.c ip_fw2.c ip_icmp.c ip_input.c ip_output.c
X-BeenThere: cvs-all@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: CVS commit messages for the entire tree <cvs-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-all>
List-Post: <mailto:cvs-all@freebsd.org>
List-Help: <mailto:cvs-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Nov 2003 00:16:53 -0000

Nate Lawson wrote:
> 
> On Thu, 20 Nov 2003, Andre Oppermann wrote:
> >   Modified files:
> >     sys/conf             files
> >     sys/net              if_faith.c if_loop.c route.h rtsock.c
> >     sys/netinet          in_pcb.c in_pcb.h in_rmx.c ip_divert.c
> >                          ip_fw2.c ip_icmp.c ip_input.c ip_output.c
> >                          raw_ip.c tcp_input.c tcp_output.c
> >                          tcp_subr.c tcp_syncache.c tcp_timer.c
> >                          tcp_usrreq.c tcp_var.h udp_usrreq.c
> >     sys/netinet6         icmp6.c in6_pcb.c in6_rmx.c in6_src.c
> >                          ip6_output.c udp6_output.c
> >   Added files:
> >     sys/netinet          tcp_hostcache.c
> >   Log:
> >   Introduce tcp_hostcache and remove the tcp specific metrics from
> >   the routing table.  Move all usage and references in the tcp stack
> >   from the routing table metrics to the tcp hostcache.
> >
> >   It caches measured parameters of past tcp sessions to provide better
> >   initial start values for following connections from or to the same
> >   source or destination.  Depending on the network parameters to/from
> >   the remote host this can lead to significant speedups for new tcp
> >   connections after the first one because they inherit and shortcut
> >   the learning curve.
> 
> This is very good.  There was no reason to throw away rtt estimates each
> time a connection closed.  Especially for http servers, this should make a

We didn't exactly throw it away.  It was stored in the rmx_... structure
in the routing table.  However, this was only done after 16 srtt
measurements had been made, which is almost never the case with http
traffic.  With the new code the srtt is saved after four samples have
been made.  Every new connection then smooths that value more and more.
The question to be answered is whether it makes sense to lower the
initial threshold even more, to two or three samples.  That has to be
profiled.  But in my opinion the new value is a very good choice which
actually gives a good starting value for the next connection.  The old
code would save an srtt in only about 5-7% of all connections.  And then
the hit rate for new connections using one of the cached values was
around 1%.
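
To make the save-and-smooth idea concrete, here is a minimal sketch in
Python.  It is not the code in tcp_hostcache.c: the function names, the
dictionary used as the cache, the averaging rule and the 3000 ms default
are all assumptions chosen for illustration.  Only the four-sample
threshold comes from the description above.

```python
# Hypothetical sketch of the hostcache idea; names, the averaging rule
# and the default value are assumptions, not the kernel implementation.
MIN_SAMPLES = 4  # threshold described above (the old code needed 16)

hostcache = {}  # remote address -> cached srtt in milliseconds


def on_connection_close(addr, srtt_ms, samples):
    """Store a closing connection's srtt if enough samples were taken."""
    if samples < MIN_SAMPLES:
        return  # too few measurements to trust the estimate
    cached = hostcache.get(addr)
    if cached is None:
        hostcache[addr] = srtt_ms  # first usable value for this host
    else:
        # Assumed smoothing rule: average the cached and the new value.
        hostcache[addr] = (cached + srtt_ms) / 2


def initial_srtt(addr, default_ms=3000):
    """Starting srtt for a new connection; default_ms is a placeholder."""
    return hostcache.get(addr, default_ms)


on_connection_close("192.0.2.1", 180, 3)  # ignored: only 3 samples
on_connection_close("192.0.2.1", 180, 5)  # cached as 180
on_connection_close("192.0.2.1", 160, 8)  # smoothed to 170.0
```

A short http connection with five samples now contributes to the cache,
whereas under a 16-sample threshold it would have been dropped.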

At the moment the hostcache is only updated when a tcp connection closes.
I am measuring whether it makes sense to update it after n fresh
samples to make the cache even more effective.

> big difference.  Thanks so much!

You're welcome! :-)

> One great paper I read on this:
> Prashant Pradhan, Tzi-Cker Chiueh, Anindya Neogi, Aggregate TCP Congestion
> Control Using Multiple Network Probing, in proceedings of IEEE ICDCS'2000.
> http://www.ecsl.cs.sunysb.edu/~prashant/papers/atcp.ps.gz

I will certainly have a look at it.  The next thing I'm working on is
dynamically sized tcp socket buffers.  Fixed-size buffers are one of the
largest problems for good performance at the moment.  Once you get a
couple of ms rtt away from the server/client you quickly hit the
bandwidth*delay product in the socket buffer.  My new code starts with a
small buffer size of 8 or 16K and automatically grows it in step with
the CWND and the remote receive/send windows.  The default maximum is
probably somewhere around 512k but can be raised to 1M or more.  For
example, I'm in Europe and around 170ms away from the FreeBSD cluster.
When I'm uploading something I can't get more than ~190kbit/s speed
because I hit the socket buffer limit, even though I've got 20Mbit/s of
unused direct SprintLink US transit and the Y! connectivity is even more
than that.  Only a socket buffer of about 425K (20Mbit/s * 170ms / 8)
would allow me to fill and use the full 20Mbit/s pipe.
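
The buffer arithmetic can be checked with a few lines of Python.  This
is only a sketch of the bandwidth*delay calculation; the function names
are made up for illustration, and integer milliseconds are used to avoid
floating-point rounding.

```python
def bdp_bytes(bandwidth_bit_s, rtt_ms):
    """Bandwidth*delay product: bytes in flight needed to fill the pipe."""
    return bandwidth_bit_s * rtt_ms // 8000  # bits*ms -> bytes


def max_throughput_bit_s(buffer_bytes, rtt_ms):
    """Upper bound on throughput when one buffer is sent per round trip."""
    return buffer_bytes * 8 * 1000 // rtt_ms


# 20 Mbit/s at 170 ms rtt needs 425000 bytes of socket buffer.
print(bdp_bytes(20_000_000, 170))

# A 512k buffer at 170 ms rtt allows roughly 24.7 Mbit/s.
print(max_throughput_bit_s(512 * 1024, 170))
```

So a maximum of around 512k would be just enough for this path, while
any buffer well below 425K caps the transfer regardless of the free
bandwidth.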

BTW: Is there a reason why we haven't enabled rfc3390 and inflight by
default?  I'm running all my (ISP) servers with it and it gives quite
a boost, especially with http traffic.  The inflight stuff is also very
good for connections where the remote side has only limited bandwidth.
It doesn't overload the remote path buffer and keeps the traffic smooth
instead of hitting the packet loss and trying again.
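
For reference, RFC 3390 raises the initial congestion window to
min(4*MSS, max(2*MSS, 4380 bytes)).  A small Python sketch of that
formula follows; the function name is made up for illustration and this
is not the kernel code.

```python
def rfc3390_initial_window(mss):
    """Initial congestion window in bytes per the RFC 3390 formula."""
    return min(4 * mss, max(2 * mss, 4380))


# Typical Ethernet MSS of 1460 bytes: 4380 bytes, i.e. three segments.
print(rfc3390_initial_window(1460))

# Small MSS of 536 bytes: 2144 bytes, i.e. four segments.
print(rfc3390_initial_window(536))
```

With three or four segments allowed in the first round trip instead of
one or two, many short http responses need fewer round trips to
complete.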

-- 
Andre