From owner-freebsd-net@FreeBSD.ORG Wed Feb  1 10:23:19 2006
Date: Wed, 01 Feb 2006 11:23:15 +0100
From: Andre Oppermann <andre@freebsd.org>
To: Greg 'groggy' Lehey
Cc: freebsd-net@freebsd.org, hackers@freebsd.org
Subject: Re: Van Jacobson's network stack restructure
Message-ID: <43E08C13.3090904@freebsd.org>
In-Reply-To: <20060201012011.GP97116@wantadilla.lemis.com>
References: <20060201012011.GP97116@wantadilla.lemis.com>

Greg 'groggy' Lehey wrote:
> Last week, at the Linux.conf.au in Dunedin, Van Jacobson presented
> some slides about work he has been doing rearchitecting the Linux
> network stack.  He claims to have reduced the CPU usage by 80% and
> doubled network throughput (he expects more, but it was limited by
> memory bandwidth).  The approach looks like it would work on FreeBSD
> as well.  I spoke to him and he confirmed.
>
> He's currently trying to get the code released as open source, but in
> the meantime his slides are up on
> http://www.lemis.com/grog/Documentation/vj/.  Yes, this is my web
> site.  The conference organizers are going to put it up on their web
> site soon, but in the meantime he's asked me to put it where I can.
>
> Comments?

It's an interesting approach.  However, there are a few caveats which
put its probable overall performance on par with the traditional
sockets approach again.

In his model the buffer (window) resides in user space and is shared
with the kernel.  This is very loosely related to our zero-copy
page-flipping socket buffer.  However, it doesn't solve the problem of
socket buffer memory overcommit.  In fact, with his model the memory
actually in use at any given point in time may be a lot more: the
always fully committed socket buffer (in userland, shared with the
kernel) plus a number of outstanding packets waiting in the socket
queue.  The shared user/kernel socket buffer must not be paged out and
thus has to stay resident.  With a large number of connections on a
machine this gets inefficient because all buffer memory is committed
all the time, not just when it is needed.  The benefit of memory
overcommit goes away.

Processing the TCP segments on the same CPU as the userland process
(provided it doesn't migrate [too often]) is certainly beneficial and
something we have been looking at for some time already.  However, we
are not there yet and still have some work to do on the TCP stack for
this to become a reality.
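To make the shared-channel point a bit more concrete, here is a rough
sketch (my own, not VJ's code; all names are invented) of the kind of
lock-free single-producer/single-consumer channel I understand the
slides to describe: the kernel side appends packet descriptors, the
user process consumes them in its own context.  Note that the channel
and the buffers its descriptors point at have to stay wired for the
lifetime of the connection, which is exactly the overcommit problem
above.

/*
 * Hypothetical net-channel sketch, not taken from any real code.
 */
#include <stdint.h>
#include <stdatomic.h>

#define NETCHAN_SLOTS 256               /* must be a power of two */

struct netchan_desc {
        uint64_t        paddr;          /* address of the packet buffer */
        uint32_t        len;            /* payload length in bytes */
        uint32_t        flags;
};

struct netchan {
        _Atomic uint32_t head;          /* advanced by producer (kernel) */
        _Atomic uint32_t tail;          /* advanced by consumer (userland) */
        struct netchan_desc slot[NETCHAN_SLOTS];
};

/* Kernel side: returns 0 on success, -1 if the channel is full. */
static int
netchan_enqueue(struct netchan *nc, const struct netchan_desc *d)
{
        uint32_t h = atomic_load_explicit(&nc->head, memory_order_relaxed);
        uint32_t t = atomic_load_explicit(&nc->tail, memory_order_acquire);

        if (h - t == NETCHAN_SLOTS)
                return (-1);            /* full: drop or backpressure */
        nc->slot[h & (NETCHAN_SLOTS - 1)] = *d;
        atomic_store_explicit(&nc->head, h + 1, memory_order_release);
        return (0);
}

/* Userland side: returns 0 on success, -1 if the channel is empty. */
static int
netchan_dequeue(struct netchan *nc, struct netchan_desc *d)
{
        uint32_t t = atomic_load_explicit(&nc->tail, memory_order_relaxed);
        uint32_t h = atomic_load_explicit(&nc->head, memory_order_acquire);

        if (t == h)
                return (-1);            /* empty */
        *d = nc->slot[t & (NETCHAN_SLOTS - 1)];
        atomic_store_explicit(&nc->tail, t + 1, memory_order_release);
        return (0);
}

No locks are needed because each index has exactly one writer; that is
what makes such a channel cheap enough to fill from the driver path.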
Processing the TCP segments within the process' CPU quantum, and only
when it gets selected by the scheduler, is a very interesting idea.  It
has a couple of real advantages and some theoretical disadvantages.  On
the good side it accounts the work in the TCP stack to the process,
aggregates processing of all segments that arrived between process
runs, and keeps good CPU/cache locality.  On the potentially negative
side it increases segment latency and has to maintain not only the
socket buffer but also another unprocessed-packet buffer.  That packet
buffer has to be limited or we open ourselves up to memory exhaustion
attacks.  When many packets for a connection arrive and the process
doesn't get scheduled quickly enough we may get packet loss because the
packet queue overflows.  This can be dealt with in relatively good ways
though.

Summary: there are some gems in there and we are certainly looking at
adapting a couple of those ideas to our network stack in the future.

--
Andre
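PS: To make the queue-limiting point concrete, a rough sketch (again my
own invention, not VJ's code) of a bounded per-connection queue of
unprocessed segments: the driver path only appends and drops on
overflow, and the actual TCP processing happens later in the context of
the owning process.  Locking is omitted for brevity.

#include <stddef.h>

struct segment {
        struct segment  *next;
        size_t           len;
        /* raw TCP segment data would follow */
};

struct conn_inq {
        struct segment  *head, *tail;
        unsigned         count;
        unsigned         limit;         /* caps memory an attacker can pin */
        unsigned long    drops;
};

/* Driver/interrupt side: cheap append, no protocol work done here. */
static int
inq_append(struct conn_inq *q, struct segment *s)
{
        if (q->count >= q->limit) {
                q->drops++;             /* overflow: drop, TCP will recover */
                return (-1);
        }
        s->next = NULL;
        if (q->tail != NULL)
                q->tail->next = s;
        else
                q->head = s;
        q->tail = s;
        q->count++;
        return (0);
}

/*
 * Process side: runs when the owning process is scheduled and reads
 * from the socket; everything that piled up since the last run is
 * handled in one batch on the CPU the process is running on.
 */
static void
inq_drain(struct conn_inq *q, void (*tcp_process)(struct segment *))
{
        struct segment *s;

        while ((s = q->head) != NULL) {
                q->head = s->next;
                if (q->head == NULL)
                        q->tail = NULL;
                q->count--;
                tcp_process(s);
        }
}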