Date: Sat, 15 May 2010 06:23:57 -0700 (PDT)
From: Barney Cordoba
To: Jack Vogel, Alexander Sack
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org, Andrew Gallatin
Subject: Re: Intel 10Gb

--- On Fri, 5/14/10, Alexander Sack wrote:

> From: Alexander Sack
> Subject: Re: Intel 10Gb
> To: "Jack Vogel"
> Cc: "Murat Balaban", freebsd-net@freebsd.org, freebsd-performance@freebsd.org, "Andrew Gallatin"
> Date: Friday, May 14, 2010, 1:20 PM
>
> On Fri, May 14, 2010 at 1:01 PM, Jack Vogel wrote:
> >
> > On Fri, May 14, 2010 at 8:18 AM, Alexander Sack wrote:
> >>
> >> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin wrote:
> >> > Alexander Sack wrote:
> >> > <...>
> >> >>> Using this driver/firmware combo, we can receive minimal packets at
> >> >>> line rate (14.8Mpps) to userspace.  You can even access this using a
> >> >>> libpcap interface.  The trick is that the fast paths are OS-bypass,
> >> >>> and don't suffer from OS overheads, like lock contention.  See
> >> >>> http://www.myri.com/scs/SNF/doc/index.html for details.
> >> >>
> >> >> But your timestamps will be atrocious at 10G speeds.  Myricom doesn't
> >> >> timestamp packets AFAIK.
> >> >> If you want reliable timestamps you need to
> >> >> look at companies like Endace, Napatech, etc.
> >> >
> >> > I see your old help ticket in our system.  Yes, our timestamping
> >> > is not as good as a dedicated capture card with a GPS reference,
> >> > but it is good enough for most people.
> >>
> >> I was told btw that it doesn't timestamp at ALL.  I am assuming NOW
> >> that is incorrect.
> >>
> >> Define *most* people.
> >>
> >> I am not knocking the Myricom card.  In fact I so wish you guys would
> >> just add the ability to latch to a 1PPS for timestamping and it would
> >> be perfect.
> >>
> >> We use I think an older version of the card internally for replay.
> >> It's a great multi-purpose card.
> >>
> >> However with IPG at 10G in the nanoseconds, anyone trying to do OWDs
> >> or RTT will find it difficult compared to an Endace or Napatech card.
> >>
> >> Btw, I was referring to bpf(4) specifically, so please don't take my
> >> comments as a knock against it.
> >>
> >> >> PS I am not sure but Intel also supports writing packets directly in
> >> >> cache (yet I thought the 82599 driver actually does a prefetch anyway
> >> >> which had me confused on why that helps)
> >> >
> >> > You're talking about DCA.  We support DCA as well (and I suspect some
> >> > other 10G NICs do too).  There are a few barriers to using DCA on
> >> > FreeBSD, not least of which is that FreeBSD doesn't currently have the
> >> > infrastructure to support it (no IOATDMA or DCA drivers).
> >>
> >> Right.
> >>
> >> > DCA is also problematic because support from system/motherboard
> >> > vendors is very spotty.
> >> > The vendor must provide the correct tag table
> >> > in BIOS such that the tags match the CPU/core numbering in the system.
> >> > Many motherboard vendors don't bother with this, and you cannot enable
> >> > DCA on a lot of systems, even though the underlying chipset supports
> >> > DCA.  I've done hacks to force-enable it in the past, with mixed
> >> > results. The problem is that DCA depends on having the correct tag
> >> > table, so that packets can be prefetched into the correct CPU's cache.
> >> > If the tag table is incorrect, DCA is a big pessimization, because it
> >> > blows the cache in other CPUs.
> >>
> >> Right.
> >>
> >> > That said, I would *love* it if FreeBSD grew ioatdma/dca support.
> >> > Jack, does Intel have any interest in porting DCA support to FreeBSD?
> >>
> >> Question for Jack or Drew, what DOES FreeBSD have to do to support
> >> DCA?  I thought DCA was something you just enable on the NIC chipset
> >> and if the system is IOATDMA aware, it just works.  Is that not right
> >> (assuming cache tags are correct and accessible)?  i.e. I thought this
> >> was more hardware black magic than anything specific the OS has to do.
> >>
> >
> > OK, let me see if I can clarify some of this. First, there IS an I/OAT
> > driver that I did for FreeBSD like 3 or 4 years ago, in the timeframe
> > that we put the feature out.
> > However, at that time all it was good for was the DMA aspect
> > of things, and Prafulla used it to accelerate the stack copies; interest did
> > not seem that great so I put the code aside, it's now badly dated and needs
> > to be brought up to date due to there being a few different versions of the
> > hardware now.
> >
> > At one point maybe a year back I started to take the code apart thinking
> > I would JUST do DCA, that got back-burnered due to other higher priority
> > issues, but it's still an item in my queue.
> >
> > I also had a nibble of an interest in using the DMA engine so perhaps I
> > should not go down the road of just doing the DCA support in the I/OAT
> > part of the driver. The question is how to make the infrastructure work.
> >
> > To answer Alexander's question, DCA support is NOT in the NIC, it's in
> > the chipset, that's why the I/OAT driver was done as a separate driver,
> > but the NIC was the user of the info, it's been a while since I was into
> > the code but if memory serves the I/OAT driver just enables the support
> > in the chipset, and then the NIC driver configures its engine to use it.
>
> Thank you very much Jack!  :)  It was not clear from the docs what was
> where to me.  I just assumed this was Intel NIC knew Intel chipset
> black magic!  LOL.
>
> > DCA and DMA were supported in Linux in the same driver because
> > the chipset features were easily handled together perhaps, I'm not
> > sure :)
>
> Ok!  (it was my other reference)
>
> > Fabien's data earlier in this thread suggested that a strategically
> > placed prefetch did you more good than DCA did if I recall, what
> > do you all think of that?
>
> I thought there was a thread where prefetch didn't do much
> for you....lol...
>
> If you just prefetch
> willy-nilly then don't you run the risk of
> packets hitting caches on cores outside of what the application
> reading them is on thereby defeating the whole purpose of prefetch?
>
> > As far as I'm concerned right now I am willing to resurrect the driver,
> > clean it up and make the features available, we can see how valuable
> > they are after that, how does that sound??
>
> Sounds good to me.  I at least put it somewhere
> publicly for people to look at.
>
> -aps

Of course none of this has anything to do with the original subject.
Processing a monodirectional stream is really no problem, nor does
it require any sort of special design consideration. All of this chatter
about card features is largely minutiae.

Modern processors are so fast that it's a waste of brain cells to spend
time trying to squeeze nanoseconds from packet gathering. You guys sound
the same as when you were trying to do 10Mb/s ethernet with ISA bus NICs.

It makes no sense to focus on optimizing tires for a car which can't break
80 mph. The entire problem is lock contention. Until you have a driver
that can scale to a point where 10Gb/s is workable without significant
lock contention, you're just feeding a dead body.

Unless of course your goal for 10Gb/s for FreeBSD is for it to be a really
good network monitor.

BC
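
For reference, the "14.8Mpps" line-rate figure and the remark about IPG at 10G being "in the nanoseconds" both follow from standard Ethernet framing arithmetic. The sketch below is not from the thread; it assumes minimum-size 64-byte frames plus the usual 8 bytes of preamble/SFD and a 12-byte minimum inter-packet gap, and the function and constant names are purely illustrative.

```python
# Back-of-the-envelope numbers for 10 Gb/s Ethernet (illustrative sketch).
# Assumed framing overheads: 8-byte preamble/SFD and 12-byte minimum IPG.

LINE_RATE_BPS = 10_000_000_000   # 10 Gb/s
MIN_FRAME_BYTES = 64             # minimum Ethernet frame, including FCS
PREAMBLE_SFD_BYTES = 8
MIN_IPG_BYTES = 12


def wire_bits(frame_bytes):
    """Bits occupied on the wire by one frame, including preamble and IPG."""
    return (frame_bytes + PREAMBLE_SFD_BYTES + MIN_IPG_BYTES) * 8


def packets_per_second(frame_bytes, line_rate_bps=LINE_RATE_BPS):
    """Maximum frames per second for back-to-back frames of this size."""
    return line_rate_bps / wire_bits(frame_bytes)


def ns_per_packet(frame_bytes, line_rate_bps=LINE_RATE_BPS):
    """Time budget per frame, in nanoseconds, at line rate."""
    return wire_bits(frame_bytes) * 1_000_000_000 / line_rate_bps


def ipg_ns(line_rate_bps=LINE_RATE_BPS):
    """Duration of the minimum inter-packet gap, in nanoseconds."""
    return MIN_IPG_BYTES * 8 * 1_000_000_000 / line_rate_bps


if __name__ == "__main__":
    print(f"min-size frames: {packets_per_second(MIN_FRAME_BYTES) / 1e6:.2f} Mpps")
    print(f"time per frame:  {ns_per_packet(MIN_FRAME_BYTES):.1f} ns")
    print(f"minimum IPG:     {ipg_ns():.1f} ns")
```

Under these assumptions a minimum-size frame occupies 84 bytes on the wire, which works out to about 14.88 Mpps and 67.2 ns per packet, with the gap itself lasting 9.6 ns. That is the scale against which both the timestamping and the lock contention arguments above are being made.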