Message-ID: <3FBD596E.CE7299BD@freebsd.org>
Date: Fri, 21 Nov 2003 01:16:46 +0100
From: Andre Oppermann
To: Nate Lawson
cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org
References: <20031120200816.1F8C916A4DD@hub.freebsd.org> <20031120154119.M72721@root.org>
Subject: Re: cvs commit: src/sys/conf files src/sys/net if_faith.c if_loop.c route.h rtsock.c src/sys/netinet in_pcb.c in_pcb.h in_rmx.c ip_divert.c ip_fw2.c ip_icmp.c ip_input.c ip_output.c

Nate Lawson wrote:
>
> On Thu, 20 Nov 2003, Andre Oppermann wrote:
> >   Modified files:
> >     sys/conf             files
> >     sys/net              if_faith.c if_loop.c route.h rtsock.c
> >     sys/netinet          in_pcb.c in_pcb.h in_rmx.c ip_divert.c
> >                          ip_fw2.c ip_icmp.c ip_input.c ip_output.c
> >                          raw_ip.c tcp_input.c tcp_output.c
> >                          tcp_subr.c tcp_syncache.c tcp_timer.c
> >                          tcp_usrreq.c tcp_var.h udp_usrreq.c
> >     sys/netinet6         icmp6.c in6_pcb.c in6_rmx.c in6_src.c
> >                          ip6_output.c udp6_output.c
> >   Added files:
> >     sys/netinet          tcp_hostcache.c
> >   Log:
> >   Introduce tcp_hostcache and remove the tcp specific metrics from
> >   the routing table. Move all usage and references in the tcp stack
> >   from the routing table metrics to the tcp hostcache.
> >
> >   It caches measured parameters of past tcp sessions to provide better
> >   initial start values for following connections from or to the same
> >   source or destination. Depending on the network parameters to/from
> >   the remote host this can lead to significant speedups for new tcp
> >   connections after the first one because they inherit and shortcut
> >   the learning curve.
>
> This is very good. There was no reason to throw away rtt estimates each
> time a connection closed. Especially for http servers, this should make a

We didn't exactly throw it away. It was stored in the rmx_... structure
in the routing table. However, that was only done after 16 srtt
measurements had been made, which is almost never the case with http
traffic. With the new code the srtt is saved after four samples have
been made. Every new connection then smooths that value more and more.

The question to be answered is whether it makes sense to lower the
initial threshold even more, to two or three samples. That has to be
profiled. But in my opinion the new value is a very good choice which
actually gives a good starting value for the next connection. The old
code would save a srtt only in about 5-7% of all connections. And then
the hit rate for new connections using one of the cached values was
around 1%.

At the moment the hostcache is only updated when a tcp connection closes.
I am studying (measuring) whether it makes sense to update it after n
fresh samples to make the cache even more effective.

> big difference. Thanks so much!

You're welcome!
:-)

> One great paper I read on this:
> Prashant Pradhan, Tzi-Cker Chiueh, Anindya Neogi, Aggregate TCP Congestion
> Control Using Multiple Network Probing, in proceedings of IEEE ICDCS'2000.
> http://www.ecsl.cs.sunysb.edu/~prashant/papers/atcp.ps.gz

I will certainly have a look at it.

The next thing I'm working on is dynamically sized tcp socket buffers.
This is one of the largest problems for good performance at the moment.
Once you get a couple of ms rtt away from the server/client you quickly
hit the bandwidth*delay product in the socket buffer. My new code
starts with a small buffer size of 8 or 16K and automatically grows
it in step with the CWND and the remote receive/send windows. The default
maximum is probably somewhere around 512k but can be raised to 1M or more.

For example, I'm in Europe and around 170ms away from the FreeBSD cluster.
When I'm uploading something I can't get more than ~190kbit/s because
I hit the socket buffer limit, even though I've got 20Mbit/s of unused
direct SprintLink US transit and the Y! connectivity is even more than
that. Only a socket buffer of about 425K (20Mbit/s * 170ms, divided by
8 bits per byte) would allow me to fill and use the full 20meg pipe.

BTW: Is there a reason why we haven't enabled rfc3390 and inflight by
default? I'm running all my (ISP) servers with them and they give quite
a boost, especially with http traffic. The inflight stuff is also very
good for connections where the remote side has only limited bandwidth.
It doesn't overload the remote path buffer and keeps the traffic smooth
instead of running into packet loss and trying again.

--
Andre
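
PS: For anyone who wants to redo the bandwidth*delay arithmetic from the
170ms example above, here is a minimal sketch in Python. It is not taken
from the kernel code. The function names are made up for illustration,
and the 32 KiB buffer is a hypothetical value, not a figure from this
mail. It assumes that a sender can have at most one socket buffer of
unacknowledged data in flight per round trip.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the path full (bandwidth * delay)."""
    return bandwidth_bps * rtt_s / 8


def max_rate_bps(buffer_bytes: float, rtt_s: float) -> float:
    """Throughput ceiling if at most buffer_bytes can be unacknowledged per RTT."""
    return buffer_bytes * 8 / rtt_s


rtt = 0.170                  # 170 ms round trip, from the mail
link = 20e6                  # 20 Mbit/s of unused transit, from the mail
assumed_buffer = 32 * 1024   # hypothetical 32 KiB buffer, NOT from the mail

print(f"buffer needed to fill the link: {bdp_bytes(link, rtt):.0f} bytes")
print(f"ceiling with a {assumed_buffer} byte buffer: "
      f"{max_rate_bps(assumed_buffer, rtt) / 1e6:.2f} Mbit/s")
```

The first line comes out at 425000 bytes, which is the 425K quoted above.
The second line only shows how quickly a fixed buffer becomes the limit
at this rtt; plug in whatever buffer size your system actually uses.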