Date: Sat, 11 Jun 2011 16:49:17 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: grarpamp
Cc: freebsd-hackers@freebsd.org, freebsd-net@freebsd.org
Subject: Re: FreeBSD I/OAT (QuickData now?) driver

On Mon, 6 Jun 2011, grarpamp wrote:

> I know we've got polling. And probably MSI-X in a couple drivers. Pretty
> sure there is still one CPU doing the interrupt work? And none of the
> multiple queue thread spreading tech exists?

Actually, with most recent 10gbps cards, and even 1gbps cards, we process
inbound data with as many CPUs as the hardware has MSI-X enabled input and
output queues.  So "a couple" understates things significantly.

> * Through PF_RING, expose the RX queues to the userland so that
>   the application can spawn one thread per queue hence avoid using
>   semaphores at all.

I'm probably a bit out of date, but last I checked, PF_RING still implied
copying, albeit into shared memory buffers.

We support shared memory between the kernel and userspace for BPF and have
done so for quite a while.  However, right now a single shared memory buffer
is shared by all receive queues on a NIC (a rough sketch of setting up that
zero-copy buffer mode is appended at the end of this message).

We have a Google Summer of Code student working on this actively right now
-- my hope is that by the end of the summer we'll have a pretty functional
system that allows different shared memory buffers to be used for different
input queues.  In particular, applications will be able to query the set of
queues available, detect CPU affinity for them, and bind particular shared
memory rings to particular queues.

It's worth observing that for many types of high-performance analysis, BPF's
packet filtering and truncation support is quite helpful, and if you're
going to use multiple hardware threads per input queue anyway, you actually
get a nice split this way (as long as those threads share L2 caches).  A
sketch of pinning one worker thread per queue is also appended below.

Luigi's work on mapping receive rings straight into userspace looks quite
interesting, but I'm pretty behind currently, so haven't had a chance to
read his netmap paper.  The direct mapping of rings approach is what a
number of high-performance FreeBSD shops have been doing for a while, but
none had generalised it sufficiently to merge it into our base stack.  I
hope to see this happen in the next year.

Robert
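
For readers who haven't used the shared-memory BPF path mentioned above,
here is a minimal, abbreviated sketch of configuring bpf(4)'s zero-copy
buffer mode.  It is illustrative only: error handling is compressed, the
interface name "em0" and the four-page buffer size are just example choices,
and the generation-number handshake that transfers buffer ownership is only
hinted at in comments -- see bpf(4) for the authoritative details.

    /* Sketch: configuring bpf(4) zero-copy (shared memory) buffers. */
    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <net/if.h>
    #include <net/bpf.h>
    #include <err.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            struct bpf_zbuf zb;
            struct ifreq ifr;
            u_int mode = BPF_BUFMODE_ZBUF;
            size_t buflen;
            int fd;

            if ((fd = open("/dev/bpf", O_RDWR)) == -1)
                    err(1, "open(/dev/bpf)");

            /* Select zero-copy buffering before attaching to an interface. */
            if (ioctl(fd, BIOCSETBUFMODE, &mode) == -1)
                    err(1, "BIOCSETBUFMODE");

            /*
             * Two page-aligned, page-multiple buffers shared with the kernel;
             * sizes must also respect the limit reported by BIOCGETZMAX.
             */
            buflen = 4 * (size_t)getpagesize();
            memset(&zb, 0, sizeof(zb));
            zb.bz_buflen = buflen;
            zb.bz_bufa = mmap(NULL, buflen, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_SHARED, -1, 0);
            zb.bz_bufb = mmap(NULL, buflen, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_SHARED, -1, 0);
            if (zb.bz_bufa == MAP_FAILED || zb.bz_bufb == MAP_FAILED)
                    err(1, "mmap");
            if (ioctl(fd, BIOCSETZBUF, &zb) == -1)
                    err(1, "BIOCSETZBUF");

            /* Attach to an interface; "em0" is only an example. */
            memset(&ifr, 0, sizeof(ifr));
            strlcpy(ifr.ifr_name, "em0", sizeof(ifr.ifr_name));
            if (ioctl(fd, BIOCSETIF, &ifr) == -1)
                    err(1, "BIOCSETIF");

            /*
             * From here the kernel writes captured packets directly into the
             * two buffers; each buffer begins with a struct bpf_zbuf_header
             * whose kernel and user generation numbers are compared (or
             * select(2)/kqueue(2) used) to learn when a buffer has been
             * handed over to userspace.
             */
            close(fd);
            return (0);
    }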
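
And, to illustrate the "one thread per input queue" model discussed above,
the sketch below pins worker threads to CPUs with FreeBSD's cpuset(2)
interface.  Since the queue discovery/binding interface described above is
still being worked on, the NQUEUES constant and the identity queue-to-CPU
mapping here are purely hypothetical placeholders; compile with -pthread.

    /* Sketch: one worker thread per receive queue, pinned via cpuset(2). */
    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <err.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NQUEUES 4       /* Hypothetical: queue count known in advance. */

    static void *
    queue_worker(void *arg)
    {
            int queue = (int)(intptr_t)arg;
            cpuset_t mask;

            /*
             * Hypothetical identity mapping: queue N is serviced by CPU N.
             * A real application would take the mapping from whatever queue
             * discovery interface the stack ends up providing.
             */
            CPU_ZERO(&mask);
            CPU_SET(queue, &mask);
            if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                sizeof(mask), &mask) == -1)
                    err(1, "cpuset_setaffinity");

            printf("worker for queue %d pinned to CPU %d\n", queue, queue);
            /* ... open a per-queue capture descriptor and process packets ... */
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t tid[NQUEUES];
            int i;

            for (i = 0; i < NQUEUES; i++)
                    pthread_create(&tid[i], NULL, queue_worker,
                        (void *)(intptr_t)i);
            for (i = 0; i < NQUEUES; i++)
                    pthread_join(tid[i], NULL);
            return (0);
    }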