Date: Sat, 11 Jun 2011 20:13:53 +0200
From: Luigi Rizzo
To: Robert Watson
Cc: freebsd-hackers@freebsd.org, grarpamp, freebsd-net@freebsd.org
Subject: Re: FreeBSD I/OAT (QuickData now?) driver

On Sat, Jun 11, 2011 at 04:49:17PM +0100, Robert Watson wrote:
>
> On Mon, 6 Jun 2011, grarpamp wrote:
>
> > I know we've got polling. And probably MSI-X in a couple drivers. Pretty
> > sure there is still one CPU doing the interrupt work? And none of the
> > multiple queue thread spreading tech exists?
>
> Actually, with most recent 10gbps cards, and even 1gbps cards, we process
> inbound data with as many CPUs as the hardware has MSI-X enabled input and
> output queues. So "a couple" understates things significantly.
>
> > * Through PF_RING, expose the RX queues to the userland so that
> > the application can spawn one thread per queue hence avoid using
> > semaphores at all.
>
> I'm probably a bit out of date, but last I checked, PF_RING still implied
> copying, albeit into shared memory buffers. We support shared memory
> between the kernel and userspace for BPF and have done for quite a while.
> However, right now a single shared memory buffer is shared for all receive
> queues on a NIC. We have a Google summer of code student working on this
> actively right now -- my hope is that by the end of the summer we'll have a
> pretty functional system that allows different shared memory buffers to be
> used for different input queues. In particular, applications will be able
> to query the set of queues available, detect CPU affinity for them, and
> bind particular shared memory rings to particular queues. It's worth
> observing that for many types of high-performance analysis, BPF's packet
> filtering and truncation support is quite helpful, and if you're going to
> use multiple hardware threads per input queue anyway, you actually get a
> nice split this way (as long as those threads share L2 caches).
>
> Luigi's work on mapping receive rings straight into userspace looks quite
> interesting, but I'm pretty behind currently, so haven't had a chance to
> read his NetMap paper. The direct mapping of rings approach is what a
> number of high-performance FreeBSD shops have been doing for a while, but
> none had generalised it sufficiently to merge into our base stack. I hope
> to see this happen in the next year.
For the record, netmap also maps transmit rings, makes them device
independent, and supports binding rings to different cores through
standard setaffinity() calls. I'd really encourage people to look at
the code (e.g. the pkt-gen.c program, which is part of the archive)
to see how easy it is to use. And of course, any feedback and
suggestions are welcome:

    http://info.iet.unipi.it/~luigi/netmap/

cheers
luigi
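
To give an idea of what the receive path looks like without downloading
the whole archive, below is a minimal, stripped-down sketch in the spirit
of pkt-gen.c: open /dev/netmap, register an interface, mmap the shared
region, and drain the first RX ring in a poll() loop, optionally pinning
the thread to a core with cpuset_setaffinity(). The field and macro names
(nmreq, NIOCREGIF, NETMAP_RXRING, ring->avail, ...) follow the netmap
headers of this release and may change in later versions, so treat it as
an illustration rather than a reference:

#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#include <fcntl.h>
#include <net/if.h>
#include <net/netmap.h>
#include <net/netmap_user.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    struct nmreq req;
    struct netmap_if *nifp;
    struct netmap_ring *rx;
    struct pollfd pfd;
    cpuset_t mask;
    char *mem;
    int fd;

    /* optionally pin this thread to core 0 before touching the rings */
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
        sizeof(mask), &mask);

    fd = open("/dev/netmap", O_RDWR);
    if (fd < 0) {
        perror("/dev/netmap");
        return 1;
    }

    /* attach to the interface (this switches it to netmap mode) */
    memset(&req, 0, sizeof(req));
    strncpy(req.nr_name, argc > 1 ? argv[1] : "em0", sizeof(req.nr_name));
    req.nr_version = NETMAP_API;
    if (ioctl(fd, NIOCREGIF, &req) < 0) {
        perror("NIOCREGIF");
        return 1;
    }

    /* map the shared region holding all rings and packet buffers */
    mem = mmap(NULL, req.nr_memsize, PROT_READ | PROT_WRITE,
        MAP_SHARED, fd, 0);
    nifp = NETMAP_IF(mem, req.nr_offset);
    rx = NETMAP_RXRING(nifp, 0);        /* first receive ring only */

    pfd.fd = fd;
    pfd.events = POLLIN;
    for (;;) {
        poll(&pfd, 1, 1000);            /* kernel refills the ring */
        while (rx->avail > 0) {
            struct netmap_slot *slot = &rx->slot[rx->cur];
            char *buf = NETMAP_BUF(rx, slot->buf_idx);

            printf("got %d bytes at %p\n", slot->len, buf);
            rx->cur = NETMAP_RING_NEXT(rx, rx->cur);
            rx->avail--;
        }
    }
    /* not reached */
    close(fd);
    return 0;
}

A real program would check the mmap() return value, look at the ring
counts reported back in the nmreq to iterate over all hardware rings
(one thread per ring, each pinned to its own core), and handle the
transmit side as well; pkt-gen.c in the archive shows the complete
version.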