From: Alexander Sack <pisymbol@gmail.com>
To: Andrew Gallatin
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org
Date: Fri, 14 May 2010 09:32:08 -0400
Subject: Re: Intel 10Gb

On Tue, May 11, 2010 at 9:51 AM, Andrew Gallatin wrote:
> Murat Balaban [murat@enderunix.org] wrote:
>>
>> Much of the FreeBSD networking stack has been made parallel in order to
>> cope with high packet rates at 10 Gig/sec operation.
>>
>> I've seen good numbers (near 10 Gig) in my tests involving TCP/UDP
>> send/receive (latest Intel driver).
>>
>> As far as BPF is concerned, the above statement does not hold true,
>> since there is some work that needs to be done here in terms of BPF
>> locking and parallelism. My tests show that there is high lock
>> contention around the "bpf interface lock", resulting in input errors
>> at high packet rates and with many bpf devices.
>
> If you're interested in 10GbE packet sniffing at line rate on the
> cheap, have a look at the Myri10GE "sniffer" interface. This is a
> special software package that takes a normal mxge(4) NIC and replaces
> the driver/firmware with a "myri_snf" driver/firmware which is
> optimized for packet sniffing.
>
> Using this driver/firmware combo, we can receive minimal packets at
> line rate (14.8Mpps) to userspace. You can even access this using a
> libpcap interface.
> The trick is that the fast paths are OS-bypass and don't suffer from
> OS overheads, like lock contention. See
> http://www.myri.com/scs/SNF/doc/index.html for details.

But your timestamps will be atrocious at 10G speeds. Myricom doesn't
timestamp packets AFAIK. If you want reliable timestamps you need to
look at companies like Endace, Napatech, etc.

We do a lot of packet capture and work on bpf(4) all the time. My
biggest concern for reliable 10G packet capture is timestamps. The call
to microtime() up in catchpacket() is not going to cut it (it barely
cuts it for GigE line-rate speeds); some back-of-the-envelope numbers
are at the end of this mail.

I'd be interested in doing the multi-queue bpf(4) work myself (perhaps
I should ask? I don't know whether non-Summer-of-Code folks are
allowed?). I believe the goal is not so much throughput as cache
affinity: it would be nice if, say, the listener application (libpcap)
could bind itself to the same core the driver's queue is receiving
packets on, so that everything from catching to post-processing runs
with a very warm cache (theoretically). I think that's the idea; a
rough userland sketch of the binding half is below, after my sig. It
would also allow multiple applications to subscribe to potentially
different queues that are doing some form of load balancing. Again,
Intel's 82599 chipset supports flow-based queues (albeit the size of
the flow table is limited).

Note: zero-copy bpf(4) is your friend in all use cases at 10G speeds!
:) (A setup sketch for that is below as well.)

-aps

PS: I am not sure, but Intel also supports writing packets directly
into cache (yet I thought the 82599 driver actually does a prefetch
anyway, which had me confused about why that helps).
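
PPS: The back-of-the-envelope arithmetic on timestamps (my numbers, not
anything measured): a minimum-size Ethernet frame is 64 bytes, plus 8
bytes of preamble and a 12-byte inter-frame gap, i.e. 84 bytes = 672
bits on the wire. 10 Gb/s / 672 bits is roughly 14.88 Mpps, or one
packet about every 67 ns. The struct timeval that catchpacket() fills
in via microtime() only resolves 1 microsecond, so at line rate
something like 15 back-to-back packets end up sharing the same
timestamp value.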
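
To illustrate the affinity idea, a rough userland sketch using
cpuset_setaffinity(2). The CPU number here is a pure assumption -- in a
real multi-queue setup you'd first have to find out which core the
driver's queue/ithread actually runs on:

    #include <sys/param.h>
    #include <sys/cpuset.h>

    #include <err.h>
    #include <stdio.h>

    int
    main(void)
    {
            cpuset_t mask;
            int queue_cpu = 2;  /* assumption: our NIC queue's ithread lives here */

            CPU_ZERO(&mask);
            CPU_SET(queue_cpu, &mask);

            /* Pin the current process (id -1 means "myself") to that core. */
            if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                sizeof(mask), &mask) != 0)
                    err(1, "cpuset_setaffinity");

            printf("capture process pinned to CPU %d\n", queue_cpu);
            /* ... open the bpf/pcap handle and start capturing here ... */
            return (0);
    }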
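
And a minimal sketch of setting up zero-copy bpf(4), going from memory
of the zero-copy section of the man page (the interface name "ix0" and
the buffer size are made up; double-check the ioctl/struct names
against your bpf(4) before relying on this):

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <net/bpf.h>

    #include <err.h>
    #include <fcntl.h>
    #include <string.h>

    int
    main(void)
    {
            struct bpf_zbuf zb;
            struct ifreq ifr;
            u_int bufmode = BPF_BUFMODE_ZBUF;
            size_t zmax, buflen;
            int fd;

            if ((fd = open("/dev/bpf", O_RDONLY)) < 0)
                    err(1, "open(/dev/bpf)");

            /* Ask for the shared-memory (zero-copy) buffer mode. */
            if (ioctl(fd, BIOCSETBUFMODE, &bufmode) < 0)
                    err(1, "BIOCSETBUFMODE");
            if (ioctl(fd, BIOCGETZMAX, &zmax) < 0)
                    err(1, "BIOCGETZMAX");

            /* 512k per buffer is an arbitrary pick, capped at the kernel max. */
            buflen = zmax < 512 * 1024 ? zmax : 512 * 1024;

            /* The two buffers must be page-aligned; anonymous mmap gives us that. */
            zb.bz_bufa = mmap(NULL, buflen, PROT_READ | PROT_WRITE, MAP_ANON, -1, 0);
            zb.bz_bufb = mmap(NULL, buflen, PROT_READ | PROT_WRITE, MAP_ANON, -1, 0);
            if (zb.bz_bufa == MAP_FAILED || zb.bz_bufb == MAP_FAILED)
                    err(1, "mmap");
            zb.bz_buflen = buflen;

            if (ioctl(fd, BIOCSETZBUF, &zb) < 0)
                    err(1, "BIOCSETZBUF");

            memset(&ifr, 0, sizeof(ifr));
            strlcpy(ifr.ifr_name, "ix0", sizeof(ifr.ifr_name)); /* made-up ifname */
            if (ioctl(fd, BIOCSETIF, &ifr) < 0)
                    err(1, "BIOCSETIF");

            /*
             * From here on the kernel stores packets straight into the two
             * shared buffers (no copyout on read(2)); userspace watches each
             * buffer's struct bpf_zbuf_header generation counters, typically
             * after select(2)/kqueue(2) on fd, instead of calling read(2).
             */
            return (0);
    }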