Date: Fri, 14 May 2010 12:16:46 -0400
From: "Leonid Grossman" <Leonid.Grossman@exar.com>
To: "Andrew Gallatin" <gallatin@cs.duke.edu>, "Alexander Sack" <pisymbol@gmail.com>
Cc: Murat Balaban <murat@enderunix.org>, freebsd-net@freebsd.org, freebsd-performance@freebsd.org
Subject: RE: Intel 10Gb
Message-ID: <78C9135A3D2ECE4B8162EBDCE82CAD77067570BE@nekter>
In-Reply-To: <4BED6F1B.7070602@cs.duke.edu>
References: <AANLkTimMrsM08Rmdr-l6RFu83VkqFw0Pk2sHxpV5Yl5x@mail.gmail.com>
 <4BE52856.3000601@unsane.co.uk> <1273323582.3304.31.camel@efe>
 <20100511135103.GA29403@grapeape2.cs.duke.edu>
 <AANLkTikROvNKUmpax-CbhEyj5o7TW0hfV_x79Bm_nU2V@mail.gmail.com>
 <4BED5929.5020302@cs.duke.edu>
 <AANLkTikAow9ZdK4XokeWXkbmusva2rKxeLO2EBBe3tsZ@mail.gmail.com>
 <4BED6F1B.7070602@cs.duke.edu>

Neterion/Exar x3100 is one of the generic 10GbE NICs that support
timestamping in hardware, along with some other packet
capturing/monitoring features; here is a relevant paragraph from the
programming manual:

"Receive Frame Timestamp Feature

The x3100 has the ability to label each incoming frame with a timestamp
to allow a host entity to record the arrival time of incoming packets.
The host uses the XMAC_TIMESTAMP register to control its operation. To
enable the feature, the "EN" field must be set. Once the timestamp
feature is enabled, the FCS value of each frame will be replaced with
the value in a free-running 32-bit counter with a default period of
3.2 ns. The "USE_LINK_ID" determines if the full 32 bits of the FCS are
used for the timestamp, or if the most significant 2 bits are used to
identify which port the frame came in on, and 30 bits are used for the
timestamp. The "INTERVAL" field can be used to programmably change the
period between several values: 3.2 ns (the default), 6.4 ns, 12.8 ns,
25.6 ns, 51.2 ns, 102.4 ns, and 204.8 ns.

NOTE: To take advantage of this feature, "XMAC_CFG_PORTn.STRIP_FCS" must
be set to 0 to pass the FCS to the host."

> -----Original Message-----
> From: owner-freebsd-performance@freebsd.org
> [mailto:owner-freebsd-performance@freebsd.org] On Behalf Of Andrew Gallatin
> Sent: Friday, May 14, 2010 8:41 AM
> To: Alexander Sack
> Cc: Murat Balaban; freebsd-net@freebsd.org;
> freebsd-performance@freebsd.org
> Subject: Re: Intel 10Gb
>
> Alexander Sack wrote:
> > On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin <gallatin@cs.duke.edu> wrote:
> >> Alexander Sack wrote:
> >> <...>
> >>>> Using this driver/firmware combo, we can receive minimal packets at
> >>>> line rate (14.8Mpps) to userspace.  You can even access this using a
> >>>> libpcap interface.  The trick is that the fast paths are OS-bypass,
> >>>> and don't suffer from OS overheads, like lock contention.
> >>>> See http://www.myri.com/scs/SNF/doc/index.html for details.
> >>> But your timestamps will be atrocious at 10G speeds.  Myricom doesn't
> >>> timestamp packets AFAIK.  If you want reliable timestamps you need to
> >>> look at companies like Endace, Napatech, etc.
> >> I see your old help ticket in our system.  Yes, our timestamping
> >> is not as good as a dedicated capture card with a GPS reference,
> >> but it is good enough for most people.
> >
> > I was told btw that it doesn't timestamp at ALL.  I am assuming NOW
> > that is incorrect.
>
> I think you might have misunderstood how we do timestamping.
> I definitely don't understand it, and I work there ;)
> I do know that there is a NIC component of it (eg, it is not 100%
> done in the host).  I also realize that it is not as good as
> something that is 1PPS GPS based.
>
> > Define *most* people.
>
> I may have a skewed view of the market, but it seems like
> some people care deeply about accurate timestamps, and
> others (mostly doing deep packet inspection) care only
> within a few milliseconds, or even seconds.
>
> > I am not knocking the Myricom card.  In fact I so wish you guys would
> > just add the ability to latch to a 1PPS for timestamping and it would
> > be perfect.
> >
> > We use I think an older version of the card internally for replay.
> > It's a great multi-purpose card.
> >
> > However with IPG at 10G in the nanoseconds, anyone trying to do OWDs
> > or RTT will find it difficult compared to an Endace or Napatech card.
> >
> > Btw, I was referring to bpf(4) specifically, so please don't take my
> > comments as a knock against it.
> >
> >>> PS I am not sure but Intel also supports writing packets directly in
> >>> cache (yet I thought the 82599 driver actually does a prefetch anyway
> >>> which had me confused on why that helps)
> >> You're talking about DCA.  We support DCA as well (and I suspect some
> >> other 10G NICs do too).
> >> There are a few barriers to using DCA on
> >> FreeBSD, not least of which is that FreeBSD doesn't currently have the
> >> infrastructure to support it (no IOATDMA or DCA drivers).
> >
> > Right.
> >
> >> DCA is also problematic because support from system/motherboard
> >> vendors is very spotty.  The vendor must provide the correct tag table
> >> in BIOS such that the tags match the CPU/core numbering in the system.
> >> Many motherboard vendors don't bother with this, and you cannot enable
> >> DCA on a lot of systems, even though the underlying chipset supports
> >> DCA.  I've done hacks to force-enable it in the past, with mixed
> >> results.  The problem is that DCA depends on having the correct tag
> >> table, so that packets can be prefetched into the correct CPU's cache.
> >> If the tag table is incorrect, DCA is a big pessimization, because it
> >> blows the cache in other CPUs.
> >
> > Right.
> >
> >> That said, I would *love* it if FreeBSD grew ioatdma/dca support.
> >> Jack, does Intel have any interest in porting DCA support to FreeBSD?
> >
> > Question for Jack or Drew, what DOES FreeBSD have to do to support
> > DCA?  I thought DCA was something you just enable on the NIC chipset
> > and if the system is IOATDMA aware, it just works.  Is that not right
> > (assuming cache tags are correct and accessible)?  i.e. I thought this
> > was more hardware black magic than anything specific the OS has to do.
>
> IOATDMA and DCA are sort of unfairly joined for two reasons: The DCA
> control stuff is implemented as part of the IOATDMA PCIe device, and
> IOATDMA is a great usage model for DCA, since you'd want the DMAs
> that it does to be prefetched.
>
> To use DCA you need:
>
> - A DCA driver to talk to the IOATDMA/DCA PCIe device, and obtain the
>   tag table
> - An interface that a client device (eg, NIC driver) can use to obtain
>   either the tag table, or at least the correct tag for the CPU
>   that the interrupt handler is bound to.
> The basic support in a NIC driver boils down to something like:
>
> nic_interrupt_handler()
> {
>     if (sc->dca.enabled && (curcpu != sc->dca.last_cpu)) {
>         sc->dca.last_cpu = curcpu;
>         tag = dca_get_tag(curcpu);
>         WRITE_REG(sc, DCA_TAG, tag);
>     }
> }
>
> Drew
> _______________________________________________
> freebsd-performance@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org"
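
The interrupt-handler outline in Drew's message can be turned into a small compilable sketch that runs without hardware. Everything beyond his pseudocode is an assumption made for illustration: the layout of struct nic_softc, the tag values in dca_tag_table, the write_reg() helper standing in for WRITE_REG, and the DCA_TAG register index are all hypothetical and do not come from any real driver or from a FreeBSD API (the thread notes that FreeBSD had no DCA driver at the time). The point it shows is only the caching logic: the tag register is rewritten when the handler finds itself on a different CPU than last time, and left alone otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPU    4     /* hypothetical number of CPUs */
#define DCA_TAG 0     /* hypothetical register index */
#define NO_CPU  (-1)  /* "no tag programmed yet" marker */

/* Hypothetical per-device state; the dca fields mirror the pseudocode. */
struct nic_softc {
    struct {
        bool enabled;
        int  last_cpu;
    } dca;
    uint32_t regs[1];     /* stand-in for memory-mapped NIC registers */
    unsigned reg_writes;  /* counts register writes, for demonstration */
};

/* Stand-in for the BIOS-provided tag table; the values are made up. */
static const uint8_t dca_tag_table[NCPU] = { 0x10, 0x12, 0x14, 0x16 };

/* Hypothetical lookup a DCA driver would offer to client drivers. */
static uint8_t dca_get_tag(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    return dca_tag_table[cpu];
}

/* Simulated register write; a real driver would touch device memory. */
static void write_reg(struct nic_softc *sc, int reg, uint32_t val)
{
    sc->regs[reg] = val;
    sc->reg_writes++;
}

/* curcpu is passed in explicitly so the sketch can run in userspace. */
static void nic_interrupt_handler(struct nic_softc *sc, int curcpu)
{
    if (sc->dca.enabled && curcpu != sc->dca.last_cpu) {
        /* The handler moved to another CPU: retarget the prefetches. */
        sc->dca.last_cpu = curcpu;
        write_reg(sc, DCA_TAG, dca_get_tag(curcpu));
    }
    /* Normal receive processing would follow here. */
}
```

Starting from a softc with dca.enabled set and dca.last_cpu set to NO_CPU, calling the handler for CPU 0 twice and then for CPU 2 performs two register writes in total, since the second call on CPU 0 finds the tag already current.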
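
The x3100 manual paragraph quoted at the top of this message describes how the FCS field is reused as a timestamp; a host-side decoder for it might look roughly like the sketch below. It assumes the four FCS bytes have already been assembled into a 32-bit word (the manual excerpt does not state the byte order), and the names x3100_rx_stamp, stamp_decode, stamp_delta and ticks_to_ps are invented for this example rather than taken from any Neterion driver. The encoding of the "INTERVAL" field is not given in the excerpt either, so the tick period is passed in directly, in picoseconds to keep the arithmetic in integers (3.2 ns is 3200 ps).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of the 32-bit word found in the FCS slot. */
struct x3100_rx_stamp {
    uint32_t ticks;  /* free-running counter value */
    unsigned port;   /* ingress port; meaningful only with USE_LINK_ID */
};

/* Split the word as the manual describes: with USE_LINK_ID set, the two
 * most significant bits carry the port and the low 30 bits the counter. */
static struct x3100_rx_stamp stamp_decode(uint32_t fcs_word, bool use_link_id)
{
    struct x3100_rx_stamp s;

    if (use_link_id) {
        s.port  = fcs_word >> 30;
        s.ticks = fcs_word & 0x3FFFFFFFu;
    } else {
        s.port  = 0;
        s.ticks = fcs_word;
    }
    return s;
}

/* Ticks elapsed between two counter values, tolerating a single wrap of
 * the 30-bit or 32-bit counter. */
static uint32_t stamp_delta(uint32_t earlier, uint32_t later, bool use_link_id)
{
    uint32_t mask = use_link_id ? 0x3FFFFFFFu : 0xFFFFFFFFu;

    return (later - earlier) & mask;
}

/* Convert a tick count to picoseconds for a given counter period. */
static uint64_t ticks_to_ps(uint32_t ticks, uint32_t period_ps)
{
    return (uint64_t)ticks * period_ps;
}
```

Because the counter is free-running, it wraps quickly: at the default 3.2 ns period a 32-bit counter rolls over about every 13.7 seconds (2^32 x 3.2 ns), and a 30-bit one about every 3.4 seconds, so a capture application would need to track wraps or combine the value with a coarser host clock.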