From: sbahra@kerneled.org
Date: Sat, 07 Oct 2006 17:07:49 -0400
To: Robert Watson
Cc: freebsd-hackers@freebsd.org, Jochen Kaiser, csjp@FreeBSD.org
Subject: Re: libpcap perf improvement? latest ideas?
Message-ID: <20061007170749.yo2zt6f2hfsgsgwk@www.kerneled.org>
In-Reply-To: <20061006115301.T43229@fledge.watson.org>
References: <20061005151536.GA25283@devil.rrze.uni-erlangen.de> <20061006115301.T43229@fledge.watson.org>

> On Thu, 5 Oct 2006, Jochen Kaiser wrote:
[...]
>> after reading a German master thesis [1] (dated 12/2004) about pcap
>> performance (with comparison of Linux and FreeBSD) I searched
>> FreeBSD resources for pcap improvements. Unfortunately I did not
>> find any improvements like PF_RING and/or efforts for reducing the
>> number of copy operations from device to user space.

Hi Jochen,

I suggest you take a look at http://www.kerneled.org/?p=15

What is implemented there is reasonably lightweight but still uses a
ring buffer model. I agree that a reference model can be used to
reduce the number of copies currently done for BPF. Personally, I
haven't seen any work there, and unfortunately I have yet to find the
time to shut up and just code such a thing.

>> Maybe I think too simple because I don't know how SMP fine locks
>> are influencing this (maybe it is very complex to improve that when
>> you want to avoid side effects.).

Basically, the reason for the bad performance of PF_PACKET on Linux is
high system call overhead. BPF buffers packets collectively, so one
read returns many packets and the system call overhead is lower.
Applications doing packet analysis will also find much of the
information they need (such as timestamps) in the BPF header. With
PF_PACKET you are forced to make one system call per acquisition and,
for example, another system call to retrieve the timestamp of the last
packet read.

Other research papers have also shown that BPF still outperforms
PF_PACKET and other zero-copy PF_* solutions for small packets. The
kerneled.org post helps to explain this in a general sense.
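
To make the buffering point concrete, here is a minimal user-space sketch of how a batch of packets returned by a single BPF-style read can be walked. It is only an illustration: the header layout (`HDR`) is a simplified, fixed-size stand-in for `struct bpf_hdr`, the 4-byte `ALIGNMENT` is an assumption, and `pack_records` and `walk_records` are hypothetical helper names, not part of any real API. Real code would read from a BPF device and use the platform's own header definition and `BPF_WORDALIGN`.

```python
import struct

# Simplified stand-in for struct bpf_hdr: tv_sec, tv_usec, caplen, datalen, hdrlen.
# The real layout is platform-dependent; this fixed little-endian form is an assumption.
HDR = struct.Struct("<IIIIH2x")
ALIGNMENT = 4  # assumed word size used for padding between records


def word_align(n):
    """Round n up to the next multiple of ALIGNMENT (mirrors BPF_WORDALIGN)."""
    return (n + ALIGNMENT - 1) & ~(ALIGNMENT - 1)


def pack_records(packets):
    """Build one buffer holding every packet, as a single buffered read might return."""
    buf = bytearray()
    for (sec, usec), payload in packets:
        record = HDR.pack(sec, usec, len(payload), len(payload), HDR.size) + payload
        # Pad each record so the next header starts on an aligned offset.
        buf += record.ljust(word_align(len(record)), b"\0")
    return bytes(buf)


def walk_records(buf):
    """Yield (timestamp, payload) for each record found in one buffer."""
    offset = 0
    while offset + HDR.size <= len(buf):
        sec, usec, caplen, _datalen, hdrlen = HDR.unpack_from(buf, offset)
        start = offset + hdrlen
        yield (sec, usec), buf[start:start + caplen]
        # Skip header, captured bytes, and padding to reach the next record.
        offset += word_align(hdrlen + caplen)
```

In this model the consumer pays for one read per buffer and gets every packet together with its timestamp, whereas the per-packet model described above pays at least one system call per packet.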

Basically, zero-copy isn't the end-all solution, but for larger
packets it makes great sense. The application I've been developing for
a while now makes use of both, depending on average packet size.

On Fri, 6 Oct 2006, Robert Watson wrote:
> Quite a bit of work has been done on zero-copy for BPF, but none of it
> really committable. Christian Peron (CC'd) and I have been talking
> about doing something that is committable, but some of the details (such
> as memory ownership) are still very much up in the air.

Is there any archive or summary online noting the approaches you guys
would like to take with BPF? I am really interested in this and, when
my time permits, would love to write some code. I am also interested in
the other work you noted that is not committable. Could you provide
some URLs, please? :-)

> PF_RING takes an interesting approach, and one we should look at, but we'd
> also like to keep all the benefits of BPF rather than discard them, so need to
> consider how best to apply elements of the approach in our context.
> I'd like to see something like this happen for FreeBSD 7.0, with a
> possible backport if it goes really well. :-)

The problem with PF_RING is its static ring buffer model and its
concept of marking slots. This makes it less feasible for real-world
applications. I proposed a model that allows for a dynamic ring buffer
size and signaling for soft and hard limits, so that application
buffering can handle potential drops (or request specific allocation
strategies from the kernel). A professor at my university also had an
idea of providing packet acquisition prioritization (pushing specific
packets to the beginning of the packet copy queue, which isn't
currently a concept in BPF) as a BPF language extension. An mmap
interface to BPF would be cool, but unless we also provide stubs to
automate packet management from user space it could be doomed to be
like PF_RING. :-P

[...]

Regards.
--
Samy Al Bahra
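
As a rough illustration of the soft and hard limit idea mentioned above, the sketch below models a packet queue that grows on demand, signals the consumer when it reaches a soft limit, and drops packets only once a hard limit is hit. This is one possible reading of the proposal, not the actual design: `GrowableRing`, its parameters, and the `on_soft_limit` callback are all hypothetical names, and a real implementation would live in the kernel and share memory with user space.

```python
from collections import deque


class GrowableRing:
    """Hypothetical packet queue with a soft and a hard limit."""

    def __init__(self, soft_limit, hard_limit, on_soft_limit=None):
        if not 0 < soft_limit <= hard_limit:
            raise ValueError("need 0 < soft_limit <= hard_limit")
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.on_soft_limit = on_soft_limit
        self.packets = deque()
        self.dropped = 0

    def push(self, packet):
        """Producer side: returns False when the packet had to be dropped."""
        if len(self.packets) >= self.hard_limit:
            self.dropped += 1
            return False
        self.packets.append(packet)
        # Signal each time the queue length reaches the soft limit, so the
        # consumer can drain or buffer before the hard limit forces drops.
        if len(self.packets) == self.soft_limit and self.on_soft_limit:
            self.on_soft_limit(self)
        return True

    def drain(self):
        """Consumer side: hand over everything queued so far."""
        batch = list(self.packets)
        self.packets.clear()
        return batch
```

A consumer could use the callback to schedule an early `drain()` or to set aside extra application-side buffering; the gap between the two limits is the headroom it has before packets are lost.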