From owner-freebsd-net@FreeBSD.ORG Wed Feb  1 10:23:19 2006
Date: Wed, 01 Feb 2006 11:23:15 +0100
From: Andre Oppermann <andre@freebsd.org>
To: Greg 'groggy' Lehey
Cc: freebsd-net@freebsd.org, hackers@freebsd.org
Subject: Re: Van Jacobson's network stack restructure
Message-ID: <43E08C13.3090904@freebsd.org>
In-Reply-To: <20060201012011.GP97116@wantadilla.lemis.com>
References: <20060201012011.GP97116@wantadilla.lemis.com>

Greg 'groggy' Lehey wrote:
> Last week, at the Linux.conf.au in Dunedin, Van Jacobson presented
> some slides about work he has been doing rearchitecting the Linux
> network stack.  He claims to have reduced the CPU usage by 80% and
> doubled network throughput (he expects more, but it was limited by
> memory bandwidth).  The approach looks like it would work on FreeBSD
> as well.  I spoke to him and he confirmed.
>
> He's currently trying to get the code released as open source, but in
> the meantime his slides are up on
> http://www.lemis.com/grog/Documentation/vj/.  Yes, this is my web
> site.  The conference organizers are going to put it up on their web
> site soon, but in the meantime he's asked me to put it where I can.
>
> Comments?

It's an interesting approach.  However, there are a few caveats which
put its probable overall performance on par with the traditional
sockets approach again.

In his model the buffer (window) resides in user space and is shared
with the kernel.  This is very loosely related to our zero-copy
page-flipping socket buffer.  However, it doesn't solve the problem of
socket buffer memory overcommit.  In fact, with his model the memory
actually in use at any given point in time may be a lot more: the
always fully committed socket buffer (in userland, shared with the
kernel) plus a number of outstanding packets waiting in the socket
queue.  The shared user/kernel socket buffer must not be paged out and
thus has to stay resident.  With a large number of connections on a
machine this gets inefficient because all buffer memory is committed
all the time, not just when it is needed.  The benefit of memory
overcommit goes away.

Processing the TCP segments on the same CPU as the userland process
(provided it doesn't migrate [too often]) is certainly beneficial and
something we have been looking at for some time already.  However, we
are not there yet and still have some work to do on the TCP stack for
this to become a reality.
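To make the shared-channel point a bit more concrete, here is a rough
sketch (my own, not VJ's code; all names are invented) of the kind of
lock-free single-producer/single-consumer channel I understand the
slides to describe: the kernel side appends packet descriptors, the
user process consumes them in its own context.  Note that the channel
and the buffers its descriptors point at have to stay wired for the
lifetime of the connection, which is exactly the overcommit problem
above.

/*
 * Hypothetical net-channel sketch, not taken from any real code.
 */
#include <stdint.h>
#include <stdatomic.h>

#define NETCHAN_SLOTS 256               /* must be a power of two */

struct netchan_desc {
        uint64_t        paddr;          /* address of the packet buffer */
        uint32_t        len;            /* payload length in bytes */
        uint32_t        flags;
};

struct netchan {
        _Atomic uint32_t head;          /* advanced by producer (kernel) */
        _Atomic uint32_t tail;          /* advanced by consumer (userland) */
        struct netchan_desc slot[NETCHAN_SLOTS];
};

/* Kernel side: returns 0 on success, -1 if the channel is full. */
static int
netchan_enqueue(struct netchan *nc, const struct netchan_desc *d)
{
        uint32_t h = atomic_load_explicit(&nc->head, memory_order_relaxed);
        uint32_t t = atomic_load_explicit(&nc->tail, memory_order_acquire);

        if (h - t == NETCHAN_SLOTS)
                return (-1);            /* full: drop or backpressure */
        nc->slot[h & (NETCHAN_SLOTS - 1)] = *d;
        atomic_store_explicit(&nc->head, h + 1, memory_order_release);
        return (0);
}

/* Userland side: returns 0 on success, -1 if the channel is empty. */
static int
netchan_dequeue(struct netchan *nc, struct netchan_desc *d)
{
        uint32_t t = atomic_load_explicit(&nc->tail, memory_order_relaxed);
        uint32_t h = atomic_load_explicit(&nc->head, memory_order_acquire);

        if (t == h)
                return (-1);            /* empty */
        *d = nc->slot[t & (NETCHAN_SLOTS - 1)];
        atomic_store_explicit(&nc->tail, t + 1, memory_order_release);
        return (0);
}

No locks are needed because each index has exactly one writer; that is
what makes such a channel cheap enough to fill from the driver path.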
Processing the TCP segments within the process' CPU quantum, and only
when it gets selected by the scheduler, is a very interesting idea.  It
has a couple of real advantages and some theoretical disadvantages.  On
the good side it accounts the work in the TCP stack to the process,
aggregates processing of all segments that arrived between process
runs, and keeps good CPU/cache locality.  On the potentially negative
side it increases segment latency and has to maintain not only the
socket buffer but also another unprocessed-packet buffer.  That packet
buffer has to be limited or we open ourselves up to memory exhaustion
attacks.  When many packets for a connection arrive and the process
doesn't get scheduled quickly enough we may get packet loss because the
packet queue overflows.  This can be dealt with in relatively good ways
though.

Summary: there are some gems in there and we are certainly looking at
adapting a couple of those ideas to our network stack in the future.

--
Andre
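PS: To make the queue-limiting point concrete, a rough sketch (again my
own invention, not VJ's code) of a bounded per-connection queue of
unprocessed segments: the driver path only appends and drops on
overflow, and the actual TCP processing happens later in the context of
the owning process.  Locking is omitted for brevity.

#include <stddef.h>

struct segment {
        struct segment  *next;
        size_t           len;
        /* raw TCP segment data would follow */
};

struct conn_inq {
        struct segment  *head, *tail;
        unsigned         count;
        unsigned         limit;         /* caps memory an attacker can pin */
        unsigned long    drops;
};

/* Driver/interrupt side: cheap append, no protocol work done here. */
static int
inq_append(struct conn_inq *q, struct segment *s)
{
        if (q->count >= q->limit) {
                q->drops++;             /* overflow: drop, TCP will recover */
                return (-1);
        }
        s->next = NULL;
        if (q->tail != NULL)
                q->tail->next = s;
        else
                q->head = s;
        q->tail = s;
        q->count++;
        return (0);
}

/*
 * Process side: runs when the owning process is scheduled and reads
 * from the socket; everything that piled up since the last run is
 * handled in one batch on the CPU the process is running on.
 */
static void
inq_drain(struct conn_inq *q, void (*tcp_process)(struct segment *))
{
        struct segment *s;

        while ((s = q->head) != NULL) {
                q->head = s->next;
                if (q->head == NULL)
                        q->tail = NULL;
                q->count--;
                tcp_process(s);
        }
}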