From: Alexander Sack
To: Andrew Gallatin
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org
Date: Fri, 14 May 2010 12:13:42 -0400
Subject: Re: Intel 10Gb

On Fri, May 14, 2010 at 11:41 AM, Andrew Gallatin wrote:
> Alexander Sack wrote:
>> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin wrote:
>>> Alexander Sack wrote:
>>> <...>
>>>>> Using this driver/firmware combo, we can receive minimal packets at
>>>>> line rate (14.8Mpps) to userspace.  You can even access this using a
>>>>> libpcap interface.  The trick is that the fast paths are OS-bypass,
>>>>> and don't suffer from OS overheads, like lock contention.  See
>>>>> http://www.myri.com/scs/SNF/doc/index.html for details.
>>>> But your timestamps will be atrocious at 10G speeds.  Myricom doesn't
>>>> timestamp packets AFAIK.  If you want reliable timestamps you need to
>>>> look at companies like Endace, Napatech, etc.
>>> I see your old help ticket in our system.  Yes, our timestamping
>>> is not as good as a dedicated capture card with a GPS reference,
>>> but it is good enough for most people.
>>
>> I was told btw that it doesn't timestamp at ALL.  I am assuming NOW
>> that is incorrect.
>
> I think you might have misunderstood how we do timestamping.
> I definitely don't understand it, and I work there ;)

No problem.  :)

> I do know that there is a NIC component of it (eg, it is not 100%
> done in the host).  I also realize that it is not as good as
> something that is 1PPS GPS based.

I need to grab your docs and read them again.  I would like to support
data capture using the Myricom card.  I somehow missed this; I had
thought the timestamps were software generated only.

>> Define *most* people.
>
> I may have a skewed view of the market, but it seems like
> some people care deeply about accurate timestamps, and
> others (mostly doing deep packet inspection) care only
> within a few milliseconds, or even seconds.

In our case Andrew, the folks who are doing deep packet inspection
REQUIRE reasonable timestamps to correlate events and generate
reasonable stats.  But I hear you: if you are just looking to see the
packet data, then timestamp accuracy isn't your top priority.

>> Question for Jack or Drew, what DOES FreeBSD have to do to support
>> DCA?  I thought DCA was something you just enable on the NIC chipset
>> and if the system is IOATDMA aware, it just works.  Is that not right
>> (assuming cache tags are correct and accessible)?  i.e. I thought this
>> was more hardware black magic than anything specific the OS has to do.
>
> IOATDMA and DCA are sort of unfairly joined for two reasons: The DCA
> control stuff is implemented as part of the IOATDMA PCIe device, and
> IOATDMA is a great usage model for DCA, since you'd want the DMAs
> that it does to be prefetched.
>
> To use DCA you need:
>
> - A DCA driver to talk to the IOATDMA/DCA pcie device, and obtain the tag
>        table
> - An interface that a client device (eg, NIC driver) can use to obtain
>        either the tag table, or at least the correct tag for the CPU
>        that the interrupt handler is bound to.  The basic support in
>        a NIC driver boils down to something like:
>
> nic_interrupt_handler()
> {
>  if (sc->dca.enabled && (curcpu != sc->dca.last_cpu)) {
>     sc->dca.last_cpu = curcpu;
>     tag = dca_get_tag(curcpu);
>     WRITE_REG(sc, DCA_TAG, tag);
>  }
> }

Drew, at least in the Intel documentation, it seems the NIC uses the
LAPIC id to tag the PCIe TLPs (the DCA info is carried in the TLP) so
that inbound NIC I/O is steered to the appropriate core's cache.  i.e.
the heuristic you gave above is more granular than what I think Intel
does.  I could be wrong, maybe Jack can chime in and correct me.  But it
seems that with Intel chipsets it is a per-queue parameter, which allows
you to bind a core's cache to a queue via DCA.

The added piece to this, at least for bpf(4) consumers, is to have
bpf(4) subscribe to these queues AND to provide an interface that lets
libpcap applications know which queue is on which core and THEN bind to
it.

I think that is the general idea....I think!  :)

-aps
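
The per-queue variant of the handler discussed above can be sketched in C
roughly as follows.  This is only an illustration under stated
assumptions: the tag table values, struct rx_queue, write_dca_tag() and
the register array are hypothetical stand-ins, not the Intel or Myricom
driver interface.  Only dca_get_tag() and the last_cpu check are taken
from the pseudocode quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU   4
#define NQUEUE 4

/* Hypothetical tag table, as a DCA driver might export it: one tag per CPU. */
static const uint8_t dca_tag_table[NCPU] = { 0x10, 0x12, 0x14, 0x16 };

/* Stand-in for the NIC's per-queue DCA tag registers (assumption). */
static uint8_t dca_tag_reg[NQUEUE];
static unsigned reg_writes;     /* counts simulated register writes */

/* Hypothetical per-queue state; a real driver keeps this in its softc. */
struct rx_queue {
    int id;             /* index of this queue's tag register */
    int dca_enabled;
    int last_cpu;       /* -1 until the first interrupt is handled */
};

/* Look up the tag for a CPU, as in the quoted pseudocode. */
static uint8_t dca_get_tag(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    return dca_tag_table[cpu];
}

/* Simulated register write for one queue. */
static void write_dca_tag(struct rx_queue *q, uint8_t tag)
{
    assert(q->id >= 0 && q->id < NQUEUE);
    dca_tag_reg[q->id] = tag;
    reg_writes++;
}

/* Reprogram the queue's tag only when its handler has moved to another CPU. */
static void rx_queue_intr(struct rx_queue *q, int curcpu)
{
    if (q->dca_enabled && curcpu != q->last_cpu) {
        q->last_cpu = curcpu;
        write_dca_tag(q, dca_get_tag(curcpu));
    }
}
```

With a queue initialised to last_cpu = -1, calling rx_queue_intr() twice
for CPU 2 and then once for CPU 3 performs two register writes in this
sketch, since the repeated interrupt on the same CPU leaves the tag
alone.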