From owner-freebsd-arch@FreeBSD.ORG  Sat Aug 20 21:37:07 2011
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8388E106564A;
	Sat, 20 Aug 2011 21:37:07 +0000 (UTC)
	(envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
	by mx1.freebsd.org (Postfix) with ESMTP id 2B84F8FC0A;
	Sat, 20 Aug 2011 21:37:06 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
	id 1A0217300A; Sat, 20 Aug 2011 23:55:03 +0200 (CEST)
Date: Sat, 20 Aug 2011 23:55:03 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Lev Serebryakov <lev@freebsd.org>
Message-ID: <20110820215503.GA45984@onelab2.iet.unipi.it>
References: <slrnj4oiiq.21rg.vadim_nuclight@kernblitz.nuclight.avtf.net>
	<810527321.20110819123700@serebryakov.spb.ru>
	<201108191401.23083.pieter@degoeje.nl>
	<425884435.20110819175307@serebryakov.spb.ru>
	<20110819172252.GE88904@in-addr.com>
	<368496955.20110820101506@serebryakov.spb.ru>
	<alpine.BSF.2.00.1108201234280.4529@fledge.watson.org>
	<20110820134530.GA42942@onelab2.iet.unipi.it>
	<1361908410.20110821011005@serebryakov.spb.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1361908410.20110821011005@serebryakov.spb.ru>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-arch@freebsd.org
Subject: Re: 10gbps scalability (was: Re: FreeBSD problems and preliminary
	ways to solve)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 21:37:07 -0000

On Sun, Aug 21, 2011 at 01:10:05AM +0400, Lev Serebryakov wrote:
> Hello, Luigi.
> You wrote on 20 August 2011, 17:45:30:
> 
> > - the Click modular router now runs (in userspace) at up to 4Mpps
> >   per core, which is faster than in-kernel linux;
> > A userspace version of ipfw should be available in a short time,
> > and i have some work in progress to bring the forwarding tables
> > in userspace (but of course you can do the same with Click).
> > I also see people start using it, which is a good thing because
> > i am getting useful feedback on features and bugs and patches
> > for more device drivers.
> [SKIPPED]
> > On the general issue of improving performance of the network stack,
> > I feel that to achieve significant speed improvements we should
> > really reconsider the way things are done in the network stack. 
> > And that comes before support for special HW features. 
>  Could you please explain (I don't mean that you are wrong, I really
> don't understand) how netmap and other user-level processing could
> help with ROUTING (with firewalling, different routes, etc.) and
> software switching? I understand very well why this helps user-level

I am working on the following now:
- routing daemons and the like still work as usual, adding and
  modifying routes with the standard mechanisms (routing sockets etc.)
- the kernel updates its own forwarding tables (FIB) as usual

But:
- a netmap client (userspace) listens for FIB updates on a
  routing socket, and builds its own copy of the FIB in userspace
  (call it uFIB)
- the same process sets interfaces in netmap mode, and uses the
  uFIB to do forwarding, injecting back into the kernel those
  packets that have a local destination.
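
The scheme above can be sketched roughly as follows. This is only a
minimal illustration, not the actual work in progress: the UFib class,
its method names, and the ("add", prefix, ifname) / ("delete", prefix)
tuples are hypothetical stand-ins for whatever a real client would
parse out of RTM_ADD / RTM_DELETE messages on the routing socket, and
the lookup is a naive IPv4 longest-prefix match rather than a fast FIB
data structure.

```python
import ipaddress


class UFib:
    """Hypothetical userspace copy of the kernel FIB (IPv4 only)."""

    def __init__(self):
        # prefix length -> {network: outgoing interface name}
        self.by_len = {}
        # addresses owned by this host; packets for them go to the kernel
        self.local = set()

    def add_local(self, addr):
        self.local.add(ipaddress.IPv4Address(addr))

    def apply(self, msg):
        """Apply one simplified routing update (assumed tuple format)."""
        if msg[0] == "add":
            net = ipaddress.IPv4Network(msg[1])
            self.by_len.setdefault(net.prefixlen, {})[net] = msg[2]
        elif msg[0] == "delete":
            net = ipaddress.IPv4Network(msg[1])
            self.by_len.get(net.prefixlen, {}).pop(net, None)

    def lookup(self, dst):
        """Naive longest-prefix match; returns an interface name or None."""
        for plen in sorted(self.by_len, reverse=True):
            # Mask the destination down to plen bits and probe that table.
            net = ipaddress.IPv4Network(f"{dst}/{plen}", strict=False)
            ifname = self.by_len[plen].get(net)
            if ifname is not None:
                return ifname
        return None

    def decide(self, dst):
        """Return (action, interface) for a packet destined to dst."""
        if ipaddress.IPv4Address(dst) in self.local:
            return ("to_kernel", None)   # local destination: inject back
        ifname = self.lookup(dst)
        if ifname is None:
            return ("drop", None)        # no matching route in the uFIB
        return ("forward", ifname)
```

A client would call apply() for every update read from the routing
socket and decide() for every packet received on an interface in
netmap mode. The interface names used with it (em0, em1, ...) are
whatever the caller chooses; nothing here touches real interfaces.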

> applications that need to process huge PPS rates. Less memcpy, fewer
> allocations, fewer context switches (and TLB/cache flushes) -- all
> these things are very clear to me. But why is user-level software
> switching faster than in-kernel switching, which should work without
> memory context switches AT ALL?!

Essentially, the driver in netmap mode is far more efficient, and
this offsets the cost of the few syscalls.
As an example, with netmap one core can currently forward
packets between interfaces at between 3 and 10 Mpps, depending
on the amount of processing per packet, and significant
optimizations are still possible, especially at the lower speeds
(if 3 Mpps can be called low).
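
To put those rates in per-packet terms, here is a small arithmetic
sketch. The 3 and 10 Mpps figures are the ones quoted above; the
syscall cost and batch size in the example are purely hypothetical
placeholders, chosen only to show how a cost paid once per batch
shrinks when spread over many packets.

```python
def ns_per_packet(mpps):
    """Time budget per packet, in nanoseconds, at a rate given in Mpps."""
    return 1000.0 / mpps


def amortized_syscall_ns(syscall_ns, batch):
    """Per-packet share of one syscall that handles `batch` packets."""
    return syscall_ns / batch


if __name__ == "__main__":
    for rate in (3, 10):
        print(f"{rate} Mpps -> {ns_per_packet(rate):.1f} ns per packet")
    # Hypothetical numbers: a 1000 ns syscall covering 256 packets.
    print(f"amortized: {amortized_syscall_ns(1000, 256):.2f} ns per packet")
```

At 10 Mpps the whole budget is 100 ns per packet, so a syscall is only
affordable if it is shared by a large batch of packets.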

>   Or netmap is used for prototyping code, which will be moved into
> kernel later?

Of course, nothing prevents kernel subsystems from using the interface
directly in netmap mode. But I think that, now that we have the option,
it makes sense to spend some time experimenting with newer solutions
(FIB data structures, firewalls, memory alignment,
possibly even TCP buffer management) in userspace, and then move
stuff back into the kernel once we have a good solution.

I am using it for prototyping and testing subsystems in userspace;
whether it makes sense to move them into the kernel depends on the
performance we manage to get.

cheers
luigi