From: Alexander Sack
To: Barney Cordoba
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org, Jack Vogel, Andrew Gallatin
Date: Sat, 15 May 2010 17:49:33 -0400
In-Reply-To: <620965.38211.qm@web63908.mail.re1.yahoo.com>
Subject: Re: Intel 10Gb
List-Id: Networking and TCP/IP with FreeBSD

On Sat, May 15, 2010 at 9:23 AM, Barney Cordoba wrote:
>
> --- On Fri, 5/14/10, Alexander Sack wrote:
>
>> From: Alexander Sack
>> Subject: Re: Intel 10Gb
>> To: "Jack Vogel"
>> Cc: "Murat Balaban", freebsd-net@freebsd.org, freebsd-performance@freebsd.org, "Andrew Gallatin"
>> Date: Friday, May 14, 2010, 1:20 PM
>>
>> On Fri, May 14, 2010 at 1:01 PM, Jack Vogel wrote:
>> >
>> > On Fri, May 14, 2010 at 8:18 AM, Alexander Sack wrote:
>> >>
>> >> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin wrote:
>> >> > Alexander Sack wrote:
>> >> > <...>
>> >> >>> Using this driver/firmware combo, we can receive minimal packets at
>> >> >>> line rate (14.8Mpps) to userspace.  You can even access this using a
>> >> >>> libpcap interface.  The trick is that the fast paths are OS-bypass,
>> >> >>> and don't suffer from OS overheads, like lock contention.  See
>> >> >>> http://www.myri.com/scs/SNF/doc/index.html for details.
>> >> >>
>> >> >> But your timestamps will be atrocious at 10G speeds.  Myricom doesn't
>> >> >> timestamp packets AFAIK.  If you want reliable timestamps you need to
>> >> >> look at companies like Endace, Napatech, etc.
>> >> >
>> >> > I see your old help ticket in our system.  Yes, our timestamping
>> >> > is not as good as a dedicated capture card with a GPS reference,
>> >> > but it is good enough for most people.
>> >>
>> >> I was told btw that it doesn't timestamp at ALL.  I am assuming NOW
>> >> that is incorrect.
>> >>
>> >> Define *most* people.
>> >>
>> >> I am not knocking the Myricom card.
>> >> In fact I so wish you guys would
>> >> just add the ability to latch to a 1PPS for timestamping and it would
>> >> be perfect.
>> >>
>> >> We use I think an older version of the card internally for replay.
>> >> Its a great multi-purpose card.
>> >>
>> >> However with IPG at 10G in the nanoseconds, anyone trying to do OWDs
>> >> or RTT will find it difficult compared to an Endace or Napatech card.
>> >>
>> >> Btw, I was referring to bpf(4) specifically, so please don't take my
>> >> comments as a knock against it.
>> >>
>> >> >> PS I am not sure but Intel also supports writing packets directly in
>> >> >> cache (yet I thought the 82599 driver actually does a prefetch anyway
>> >> >> which had me confused on why that helps)
>> >> >
>> >> > You're talking about DCA.  We support DCA as well (and I suspect some
>> >> > other 10G NICs do to).  There are a few barriers to using DCA on
>> >> > FreeBSD, not least of which is that FreeBSD doesn't currently have the
>> >> > infrastructure to support it (no IOATDMA or DCA drivers).
>> >>
>> >> Right.
>> >>
>> >> > DCA is also problematic because support from system/motherboard
>> >> > vendors is very spotty.  The vendor must provide the correct tag table
>> >> > in BIOS such that the tags match the CPU/core numbering in the system.
>> >> > Many motherboard vendors don't bother with this, and you cannot enable
>> >> > DCA on a lot of systems, even though the underlying chipset supports
>> >> > DCA.  I've done hacks to force-enable it in the past, with mixed
>> >> > results. The problem is that DCA depends on having the correct tag
>> >> > table, so that packets can be prefetched into the correct CPU's cache.
>> >> > If the tag table is incorrect, DCA is a big pessimization, because it
>> >> > blows the cache in other CPUs.
>> >>
>> >> Right.
>> >>
>> >> > That said, I would *love* it if FreeBSD grew ioatdma/dca support.
>> >> > Jack, does Intel have any interest in porting DCA support to FreeBSD?
>> >>
>> >> Question for Jack or Drew, what DOES FreeBSD have to do to support
>> >> DCA?  I thought DCA was something you just enable on the NIC chipset
>> >> and if the system is IOATDMA aware, it just works.  Is that not right
>> >> (assuming cache tags are correct and accessible)?  i.e. I thought this
>> >> was hardware black magic than anything specific the OS has to do.
>> >>
>> >
>> > OK, let me see if I can clarify some of this. First, there IS an I/OAT
>> > driver that I did for FreeBSD like 3 or 4 years ago, in the timeframe
>> > that we put the feature out. However, at that time all it was good for
>> > was the DMA aspect of things, and Prafulla used it to accelerate the
>> > stack copies; interest did not seem that great so I put the code aside,
>> > its not badly dated and needs to be brought up to date due to there
>> > being a few different versions of the hardware now.
>> >
>> > At one point maybe a year back I started to take the code apart thinking
>> > I would JUST do DCA, that got back-burnered due to other higher priority
>> > issues, but its still an item in my queue.
>> >
>> > I also had a nibble of an interest in using the DMA engine so perhaps I
>> > should not go down the road of just doing the DCA support in the I/OAT
>> > part of the driver. The question is how to make the infrastructure work.
>> >
>> > To answer Alexander's question, DCA support is NOT in the NIC, its in
>> > the chipset, that's why the I/OAT driver was done as a seperate driver,
>> > but the NIC was the user of the info, its been a while since I was into
>> > the code but if memory serves the I/OAT driver just enables the support
>> > in the chipset, and then the NIC driver configures its engine to use it.
>>
>> Thank you very much Jack!  :)  It was not clear from the docs what was
>> where to me.  I just assumed this was Intel NIC knew Intel chipset
>> black magic!  LOL.
>>
>> > DCA and DMA were supported in Linux in the same driver because
>> > the chipset features were easily handled together perhaps, I'm not
>> > sure :)
>>
>> Ok!  (it was my other reference)
>>
>> > Fabien's data earlier in this thread suggested that a strategicallly
>> > placed prefetch did you more good than DCA did if I recall, what
>> > do you all think of that?
>>
>> I thought there was a thread where prefetch didn't do much
>> for you....lol...
>>
>> If you just prefetch willy-nilly then don't you run the risk of
>> packets hitting caches on cores outside of what the application
>> reading them is on thereby defeating the whole purpose of prefetch?
>>
>> > As far as I'm concerned right now I am willing to resurrect the driver,
>> > clean it up and make the features available, we can see how valuable
>> > they are after that, how does that sound??
>>
>> Sounds good to me.  I at least put it somewhere
>> publicly for people to look at.
>>
>> -aps
>
> Of course none of this has anything to do with the original subject.
> Processing a monodirectional stream is really no problem, nor does
> it require any sort of special design consideration. All of this chatter
> about card features is largely minutia.
>
> Modern processors are so fast that its a waste of brain cells to spend
> time trying to squeeze nonoseconds from packet gathering. You guys sound
> the same as when you were trying to do 10Mb/s ethernet with ISA bus NICs.

It depends on what you really mean and what lock contention you are
specifically talking about.
The NIC features, as well as multi-queue bpf(4), are a way to distribute
the load across multiple cores, thereby lowering total CPU overhead (that's
always good) AS WELL AS providing the ability for libpcap consumers to
post-process caught packets in cache.  Most third-party capture cards
already do just this: they are typically stream or feed based and allow
for flow-based steering to distribute the load across cores.  Intel has
only recently added this in their 10G chipsets (Jack can correct me if I'm
wrong).  All of these things help both in capture and post-processing.

> It makes no sense to focus on optimizing tires for a car which can't break
> 80Mph. The entire problem is lock contention. Until you have a driver
> that can scale to a point where 10gb/s is workable without significant
> lock contention, you're just feeding a dead body.

Lock contention in bpf(4) or in the NIC driver or in both?  :)

> Unless of course your goal for 10gb/s for FreeBSD is for it to be a really
> good network monitor.

That is exactly my goal: it would be great to see FreeBSD as a fantastic
general-purpose network monitor at 10gb/s speeds.  There are a couple of
issues, one of which is also timestamping.

-aps
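
To put rough numbers on the timestamping issue, here is a back-of-the-envelope
sketch of how closely minimum-size frames are spaced on a 10 Gb/s link.  The
constants are the standard Ethernet figures (64-byte minimum frame, 8 bytes of
preamble/SFD, 12-byte minimum inter-frame gap).  The function names and the
1-microsecond timestamp resolution are assumptions picked for illustration;
they are not taken from bpf(4) or from any driver discussed in this thread.

```python
# Back-of-the-envelope timing for minimum-size Ethernet frames at 10 Gb/s.
# All names below are illustrative, not part of any FreeBSD or libpcap API.

LINK_BPS = 10_000_000_000   # 10 Gb/s line rate
MIN_FRAME_BYTES = 64        # minimum Ethernet frame, FCS included
PREAMBLE_SFD_BYTES = 8      # preamble plus start-of-frame delimiter
MIN_IFG_BYTES = 12          # minimum inter-frame gap (96 bit times)


def wire_bytes(frame_bytes):
    """Bytes of link time one frame occupies, including preamble and gap."""
    return frame_bytes + PREAMBLE_SFD_BYTES + MIN_IFG_BYTES


def packets_per_second(frame_bytes, link_bps=LINK_BPS):
    """Maximum back-to-back frame rate for a given frame size."""
    return link_bps / (wire_bytes(frame_bytes) * 8)


def ns_per_packet(frame_bytes, link_bps=LINK_BPS):
    """Spacing between the starts of consecutive back-to-back frames, in ns."""
    return wire_bytes(frame_bytes) * 8 * 1e9 / link_bps


def ifg_ns(link_bps=LINK_BPS):
    """Duration of the minimum inter-frame gap, in ns."""
    return MIN_IFG_BYTES * 8 * 1e9 / link_bps


def packets_per_tick(frame_bytes, tick_ns, link_bps=LINK_BPS):
    """How many back-to-back frames fall inside one timestamp tick."""
    return tick_ns / ns_per_packet(frame_bytes, link_bps)


if __name__ == "__main__":
    print("pps at 64 bytes:      %.0f" % packets_per_second(MIN_FRAME_BYTES))
    print("ns between frames:    %.1f" % ns_per_packet(MIN_FRAME_BYTES))
    print("inter-frame gap (ns): %.1f" % ifg_ns())
    # Assumed 1 us timestamp resolution, purely for illustration.
    print("frames per 1 us tick: %.2f" % packets_per_tick(MIN_FRAME_BYTES, 1000))
```

Under these assumptions a minimum-size frame occupies 84 bytes of link time,
which works out to roughly 14.88 Mpps and about 67 ns between frames, so a
timestamp source with microsecond resolution cannot tell neighbouring
minimum-size packets apart.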