Date: Sat, 11 Jun 2011 16:49:17 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: grarpamp
Cc: freebsd-hackers@freebsd.org, freebsd-net@freebsd.org
Subject: Re: FreeBSD I/OAT (QuickData now?) driver

On Mon, 6 Jun 2011, grarpamp wrote:

> I know we've got polling. And probably MSI-X in a couple drivers. Pretty
> sure there is still one CPU doing the interrupt work? And none of the
> multiple queue thread spreading tech exists?

Actually, with most recent 10gbps cards, and even 1gbps cards, we process
inbound data with as many CPUs as the hardware has MSI-X enabled input and
output queues.  So "a couple" understates things significantly.

> * Through PF_RING, expose the RX queues to the userland so that
>   the application can spawn one thread per queue hence avoid using
>   semaphores at all.

I'm probably a bit out of date, but last I checked, PF_RING still implied
copying, albeit into shared memory buffers.

We support shared memory between the kernel and userspace for BPF and have
done so for quite a while.  However, right now a single shared memory buffer
is shared by all receive queues on a NIC (a rough sketch of setting up that
zero-copy buffer mode is appended at the end of this message).

We have a Google Summer of Code student working on this actively right now
-- my hope is that by the end of the summer we'll have a pretty functional
system that allows different shared memory buffers to be used for different
input queues.  In particular, applications will be able to query the set of
queues available, detect CPU affinity for them, and bind particular shared
memory rings to particular queues.

It's worth observing that for many types of high-performance analysis, BPF's
packet filtering and truncation support is quite helpful, and if you're
going to use multiple hardware threads per input queue anyway, you actually
get a nice split this way (as long as those threads share L2 caches).  A
sketch of pinning one worker thread per queue is also appended below.

Luigi's work on mapping receive rings straight into userspace looks quite
interesting, but I'm pretty behind currently, so haven't had a chance to
read his netmap paper.  The direct mapping of rings approach is what a
number of high-performance FreeBSD shops have been doing for a while, but
none had generalised it sufficiently to merge it into our base stack.  I
hope to see this happen in the next year.

Robert
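
For readers who haven't used the shared-memory BPF path mentioned above,
here is a minimal, abbreviated sketch of configuring bpf(4)'s zero-copy
buffer mode.  It is illustrative only: error handling is compressed, the
interface name "em0" and the four-page buffer size are just example choices,
and the generation-number handshake that transfers buffer ownership is only
hinted at in comments -- see bpf(4) for the authoritative details.

    /* Sketch: configuring bpf(4) zero-copy (shared memory) buffers. */
    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <net/if.h>
    #include <net/bpf.h>
    #include <err.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            struct bpf_zbuf zb;
            struct ifreq ifr;
            u_int mode = BPF_BUFMODE_ZBUF;
            size_t buflen;
            int fd;

            if ((fd = open("/dev/bpf", O_RDWR)) == -1)
                    err(1, "open(/dev/bpf)");

            /* Select zero-copy buffering before attaching to an interface. */
            if (ioctl(fd, BIOCSETBUFMODE, &mode) == -1)
                    err(1, "BIOCSETBUFMODE");

            /*
             * Two page-aligned, page-multiple buffers shared with the kernel;
             * sizes must also respect the limit reported by BIOCGETZMAX.
             */
            buflen = 4 * (size_t)getpagesize();
            memset(&zb, 0, sizeof(zb));
            zb.bz_buflen = buflen;
            zb.bz_bufa = mmap(NULL, buflen, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_SHARED, -1, 0);
            zb.bz_bufb = mmap(NULL, buflen, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_SHARED, -1, 0);
            if (zb.bz_bufa == MAP_FAILED || zb.bz_bufb == MAP_FAILED)
                    err(1, "mmap");
            if (ioctl(fd, BIOCSETZBUF, &zb) == -1)
                    err(1, "BIOCSETZBUF");

            /* Attach to an interface; "em0" is only an example. */
            memset(&ifr, 0, sizeof(ifr));
            strlcpy(ifr.ifr_name, "em0", sizeof(ifr.ifr_name));
            if (ioctl(fd, BIOCSETIF, &ifr) == -1)
                    err(1, "BIOCSETIF");

            /*
             * From here the kernel writes captured packets directly into the
             * two buffers; each buffer begins with a struct bpf_zbuf_header
             * whose kernel and user generation numbers are compared (or
             * select(2)/kqueue(2) used) to learn when a buffer has been
             * handed over to userspace.
             */
            close(fd);
            return (0);
    }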
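
And, to illustrate the "one thread per input queue" model discussed above,
the sketch below pins worker threads to CPUs with FreeBSD's cpuset(2)
interface.  Since the queue discovery/binding interface described above is
still being worked on, the NQUEUES constant and the identity queue-to-CPU
mapping here are purely hypothetical placeholders; compile with -pthread.

    /* Sketch: one worker thread per receive queue, pinned via cpuset(2). */
    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <err.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NQUEUES 4       /* Hypothetical: queue count known in advance. */

    static void *
    queue_worker(void *arg)
    {
            int queue = (int)(intptr_t)arg;
            cpuset_t mask;

            /*
             * Hypothetical identity mapping: queue N is serviced by CPU N.
             * A real application would take the mapping from whatever queue
             * discovery interface the stack ends up providing.
             */
            CPU_ZERO(&mask);
            CPU_SET(queue, &mask);
            if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                sizeof(mask), &mask) == -1)
                    err(1, "cpuset_setaffinity");

            printf("worker for queue %d pinned to CPU %d\n", queue, queue);
            /* ... open a per-queue capture descriptor and process packets ... */
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t tid[NQUEUES];
            int i;

            for (i = 0; i < NQUEUES; i++)
                    pthread_create(&tid[i], NULL, queue_worker,
                        (void *)(intptr_t)i);
            for (i = 0; i < NQUEUES; i++)
                    pthread_join(tid[i], NULL);
            return (0);
    }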