Date:      Sat, 15 May 2010 06:23:57 -0700 (PDT)
From:      Barney Cordoba <barney_cordoba@yahoo.com>
To:        Jack Vogel <jfvogel@gmail.com>, Alexander Sack <pisymbol@gmail.com>
Cc:        Murat Balaban <murat@enderunix.org>, freebsd-net@freebsd.org, freebsd-performance@freebsd.org, Andrew Gallatin <gallatin@cs.duke.edu>
Subject:   Re: Intel 10Gb
Message-ID:  <620965.38211.qm@web63908.mail.re1.yahoo.com>
In-Reply-To: <AANLkTil-kmThBinyxxCRxNyHQKFbD0ndalN3STreRghC@mail.gmail.com>

--- On Fri, 5/14/10, Alexander Sack <pisymbol@gmail.com> wrote:

> From: Alexander Sack <pisymbol@gmail.com>
> Subject: Re: Intel 10Gb
> To: "Jack Vogel" <jfvogel@gmail.com>
> Cc: "Murat Balaban" <murat@enderunix.org>, freebsd-net@freebsd.org,
>     freebsd-performance@freebsd.org, "Andrew Gallatin" <gallatin@cs.duke.edu>
> Date: Friday, May 14, 2010, 1:20 PM
>
> On Fri, May 14, 2010 at 1:01 PM, Jack Vogel <jfvogel@gmail.com> wrote:
> >
> > On Fri, May 14, 2010 at 8:18 AM, Alexander Sack <pisymbol@gmail.com> wrote:
> >>
> >> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin <gallatin@cs.duke.edu> wrote:
> >> > Alexander Sack wrote:
> >> > <...>
> >> >>> Using this driver/firmware combo, we can receive minimal packets at
> >> >>> line rate (14.8 Mpps) to userspace.  You can even access this using
> >> >>> a libpcap interface.  The trick is that the fast paths are OS-bypass,
> >> >>> and don't suffer from OS overheads, like lock contention.  See
> >> >>> http://www.myri.com/scs/SNF/doc/index.html for details.
> >> >>
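
[For context, the SNF-native API is not shown in this thread; below is a
minimal sketch of the kind of standard libpcap loop that the libpcap
interface mentioned above is said to accept.  The device name "myri0" is a
placeholder, not a real interface name.]

/* Minimal libpcap capture loop (sketch); "myri0" is a placeholder. */
#include <pcap.h>
#include <stdio.h>

static void
handler(u_char *user, const struct pcap_pkthdr *h, const u_char *bytes)
{
    unsigned long *count = (unsigned long *)user;

    (void)h;        /* h->ts holds whatever timestamp the capture stack gives */
    (void)bytes;
    (*count)++;
}

int
main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    unsigned long count = 0;
    pcap_t *p;

    p = pcap_open_live("myri0", 65535, 1, 1000, errbuf);
    if (p == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return (1);
    }
    if (pcap_loop(p, -1, handler, (u_char *)&count) == -1)
        fprintf(stderr, "pcap_loop: %s\n", pcap_geterr(p));
    pcap_close(p);
    return (0);
}
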
> >> >> But your timestamps will be atrocious at 10G speeds.  Myricom doesn't
> >> >> timestamp packets AFAIK.  If you want reliable timestamps you need to
> >> >> look at companies like Endace, Napatech, etc.
> >> >
> >> > I see your old help ticket in our system.  Yes, our timestamping
> >> > is not as good as a dedicated capture card with a GPS reference,
> >> > but it is good enough for most people.
> >>
> >> I was told, btw, that it doesn't timestamp at ALL.  I am assuming NOW
> >> that is incorrect.
> >>
> >> Define *most* people.
> >>
> >> I am not knocking the Myricom card.  In fact I so wish you guys would
> >> just add the ability to latch to a 1PPS for timestamping and it would
> >> be perfect.
> >>
> >> We use, I think, an older version of the card internally for replay.
> >> It's a great multi-purpose card.
> >>
> >> However, with the IPG at 10G in the nanoseconds, anyone trying to do OWDs
> >> or RTTs will find it difficult compared to an Endace or Napatech card.
> >>
> >> Btw, I was referring to bpf(4) specifically, so please don't take my
> >> comments as a knock against it.
> >>
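
[To put numbers on "IPG at 10G in the nanoseconds": a minimum-size Ethernet
frame plus preamble and inter-frame gap occupies 84 bytes of wire time,
which at 10 Gb/s is 67.2 ns per packet (that is where the 14.8 Mpps figure
quoted above comes from).  So one-way-delay or RTT work at this rate needs
timestamp error well below 100 ns.  The arithmetic, as a trivial C program:]

/* Back-of-the-envelope numbers behind "IPG at 10G in the nanoseconds". */
#include <stdio.h>

int
main(void)
{
    const double line_rate = 10e9;          /* bits per second               */
    const double min_frame = 64.0;          /* minimum Ethernet frame, bytes */
    const double overhead  = 8.0 + 12.0;    /* preamble + inter-frame gap    */
    double wire_bits  = (min_frame + overhead) * 8.0;
    double ns_per_pkt = wire_bits / line_rate * 1e9;

    printf("wire time per min-size frame: %.1f ns\n", ns_per_pkt); /* 67.2  */
    printf("max packet rate: %.2f Mpps\n", 1e3 / ns_per_pkt);      /* 14.88 */
    return (0);
}
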
> >> >> PS: I am not sure, but Intel also supports writing packets directly
> >> >> into cache (yet I thought the 82599 driver actually does a prefetch
> >> >> anyway, which had me confused on why that helps).
> >> >
> >> > You're talking about DCA.  We support DCA as well (and I suspect some
> >> > other 10G NICs do too).  There are a few barriers to using DCA on
> >> > FreeBSD, not least of which is that FreeBSD doesn't currently have the
> >> > infrastructure to support it (no IOATDMA or DCA drivers).
> >>
> >> Right.
> >>
> >> > DCA is also problematic because support from system/motherboard
> >> > vendors is very spotty.  The vendor must provide the correct tag table
> >> > in BIOS such that the tags match the CPU/core numbering in the system.
> >> > Many motherboard vendors don't bother with this, and you cannot enable
> >> > DCA on a lot of systems, even though the underlying chipset supports
> >> > DCA.  I've done hacks to force-enable it in the past, with mixed
> >> > results.  The problem is that DCA depends on having the correct tag
> >> > table, so that packets can be prefetched into the correct CPU's cache.
> >> > If the tag table is incorrect, DCA is a big pessimization, because it
> >> > blows the caches of other CPUs.
> >>
> >> Right.
> >>
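
[A toy model of the tag-table failure mode described above; the tag values
and core numbering are invented for illustration and do not come from any
real BIOS or chipset.]

/* Toy model: the chipset steers a DCA-tagged write into the cache of the
 * core its tag table says owns that tag.  If the BIOS table does not match
 * the actual core numbering, the data lands in the wrong core's cache. */
#include <stdio.h>

#define NCORES 4

static const int tag_to_core_good[NCORES] = { 0, 1, 2, 3 };
static const int tag_to_core_bad[NCORES]  = { 2, 3, 0, 1 };   /* wrong table */

static void
deliver(const int *tag_to_core, int rx_core)
{
    int tag = rx_core;              /* NIC tags the write for rx_core     */
    int hit = tag_to_core[tag];     /* chipset steers it using the table  */

    printf("queue bound to core %d -> data lands in core %d's cache%s\n",
        rx_core, hit, hit == rx_core ? "" : "  (blows someone else's cache)");
}

int
main(void)
{
    int core;

    puts("correct tag table:");
    for (core = 0; core < NCORES; core++)
        deliver(tag_to_core_good, core);
    puts("incorrect tag table:");
    for (core = 0; core < NCORES; core++)
        deliver(tag_to_core_bad, core);
    return (0);
}
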
> >> > That said, I would *love* it if FreeBSD grew ioatdma/dca support.
> >> > Jack, does Intel have any interest in porting DCA support to FreeBSD?
> >>
> >> Question for Jack or Drew: what DOES FreeBSD have to do to support
> >> DCA?  I thought DCA was something you just enable on the NIC chipset,
> >> and if the system is IOATDMA aware, it just works.  Is that not right
> >> (assuming cache tags are correct and accessible)?  i.e., I thought this
> >> was more hardware black magic than anything specific the OS has to do.
> >>
> >
> > OK, let me see if I can clarify some of this.  First, there IS an I/OAT
> > driver that I did for FreeBSD like 3 or 4 years ago, in the timeframe
> > that we put the feature out.  However, at that time all it was good for
> > was the DMA aspect of things, and Prafulla used it to accelerate the
> > stack copies; interest did not seem that great, so I put the code aside.
> > It's not badly dated, but it needs to be brought up to date due to there
> > now being a few different versions of the hardware.
> >
> > At one point maybe a year back I started to take the code apart, thinking
> > I would JUST do DCA; that got back-burnered due to other higher priority
> > issues, but it's still an item in my queue.
> >
> > I also had a nibble of interest in using the DMA engine, so perhaps I
> > should not go down the road of just doing the DCA support in the I/OAT
> > part of the driver.  The question is how to make the infrastructure work.
> >
> > To answer Alexander's question: DCA support is NOT in the NIC, it's in
> > the chipset; that's why the I/OAT driver was done as a separate driver,
> > but the NIC was the user of the info.  It's been a while since I was into
> > the code, but if memory serves, the I/OAT driver just enables the support
> > in the chipset, and then the NIC driver configures its engine to use it.
>
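
[A hypothetical sketch of that division of labor.  Every function and
register name below is invented (this is not the real I/OAT, DCA, or ixgbe
interface); it only illustrates the split Jack describes: the chipset-side
driver enables DCA and hands out per-CPU tags, and the NIC driver programs
each RX queue with the tag of the CPU that services it.  The stubs print
instead of touching hardware.]

#include <stdint.h>
#include <stdio.h>

/* --- chipset side (stand-in for an I/OAT/DCA driver) -------------------- */
static int
ioat_dca_enable(void)
{
    puts("chipset: DCA globally enabled");
    return (0);                         /* 0 = BIOS tag table looked sane  */
}

static uint8_t
ioat_dca_get_tag(int cpu)
{
    return ((uint8_t)cpu);              /* real tags come from BIOS/APIC   */
}

/* --- NIC side (stand-in for an ixgbe-like driver) ----------------------- */
#define NIC_DCA_RXCTRL(q)       (0x0100 + (q) * 4)  /* made-up offset      */
#define NIC_DCA_RXCTRL_DESC_EN  0x00000020u         /* made-up enable bit  */

static void
nic_write_reg(int reg, uint32_t val)
{
    printf("NIC: reg 0x%04x <- 0x%08x\n", reg, (unsigned)val); /* MMIO write */
}

int
main(void)
{
    int q, nqueues = 4;

    if (ioat_dca_enable() != 0)
        return (1);                     /* no usable DCA on this system    */
    for (q = 0; q < nqueues; q++)       /* here queue q is serviced by CPU q */
        nic_write_reg(NIC_DCA_RXCTRL(q),
            ioat_dca_get_tag(q) | NIC_DCA_RXCTRL_DESC_EN);
    return (0);
}
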
> Thank you very much Jack!  :)  It was not clear to me from the docs what
> was where.  I just assumed this was Intel-NIC-knows-Intel-chipset black
> magic!  LOL.
>
> > DCA and DMA were supported in Linux in the same driver because
> > the chipset features were easily handled together, perhaps; I'm not
> > sure :)
>
> Ok!  (it was my other reference)
>
> > Fabien's data earlier in this thread suggested that a strategically
> > placed prefetch did you more good than DCA did, if I recall.  What
> > do you all think of that?
>
> I thought there was a thread where prefetch didn't do much
> for you... lol...
>
> If you just prefetch willy-nilly, then don't you run the risk of
> packets hitting caches on cores outside of what the application
> reading them is on, thereby defeating the whole purpose of prefetch?
>
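
[For concreteness, a "strategically placed" prefetch in an RX cleanup loop
usually means prefetching the next descriptor and the next frame's headers
while the current one is being processed, on the core that will actually
consume them.  The sketch below is generic and uses the GCC/Clang
__builtin_prefetch intrinsic; the descriptor and buffer layouts are
invented, not taken from ixgbe or any real driver.]

#include <stdint.h>
#include <stdio.h>

#define NSLOTS    4
#define DESC_DONE 0x1u

struct rx_desc {
    uint32_t status;
    uint32_t length;
};

struct pkt_buf {
    uint8_t  data[2048];        /* stand-in for the DMA'd frame */
    uint32_t len;
};

static struct rx_desc ring[NSLOTS];
static struct pkt_buf bufs[NSLOTS];

static void
deliver(struct pkt_buf *p)
{
    printf("delivered %u-byte packet\n", p->len);   /* stack hand-off here */
}

static int
rx_clean(int head)
{
    while (ring[head].status & DESC_DONE) {
        int next = (head + 1) % NSLOTS;

        /* Warm the cache for the next descriptor and, if it has already
         * completed, for the next frame's headers, while we are still
         * busy with the current packet. */
        __builtin_prefetch(&ring[next]);
        if (ring[next].status & DESC_DONE)
            __builtin_prefetch(bufs[next].data);

        bufs[head].len = ring[head].length;
        deliver(&bufs[head]);
        ring[head].status = 0;          /* give the slot back to the NIC */
        head = next;
    }
    return (head);
}

int
main(void)
{
    int i;

    for (i = 0; i < 3; i++) {           /* pretend three frames arrived */
        ring[i].status = DESC_DONE;
        ring[i].length = 60 + i;
    }
    rx_clean(0);
    return (0);
}
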
> > As far as I'm concerned, right now I am willing to resurrect the driver,
> > clean it up and make the features available; we can see how valuable
> > they are after that.  How does that sound??
>
> Sounds good to me.  I'd at least put it somewhere publicly for people
> to look at.
>
> -aps

Of course, none of this has anything to do with the original subject.
Processing a unidirectional stream is really no problem, nor does
it require any sort of special design consideration. All of this chatter
about card features is largely minutiae.

Modern processors are so fast that it's a waste of brain cells to spend
time trying to squeeze nanoseconds out of packet gathering. You guys sound
the same as when you were trying to do 10Mb/s Ethernet with ISA-bus NICs.

It makes no sense to focus on optimizing tires for a car that can't break
80 mph. The entire problem is lock contention. Until you have a driver
that can scale to a point where 10Gb/s is workable without significant
lock contention, you're just feeding a dead body.
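
[For what it's worth, the usual way drivers attack that contention is to
give each RX queue its own lock, its own MSI-X vector, and its own thread,
so the hot path of one core never serializes against another's.  The toy
below illustrates the idea in userspace with pthreads; it is not FreeBSD
mtx(9) or driver code, and all names are invented.]

#include <pthread.h>
#include <stdio.h>

#define NQUEUES 4

struct rx_queue {
    pthread_mutex_t lock;       /* one lock per queue, not one per driver */
    unsigned long   packets;
};

static struct rx_queue queues[NQUEUES];

static void *
rx_worker(void *arg)
{
    struct rx_queue *q = arg;
    int i;

    for (i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&q->lock);   /* touches only this queue's state */
        q->packets++;
        pthread_mutex_unlock(&q->lock);
    }
    return (NULL);
}

int
main(void)
{
    pthread_t tid[NQUEUES];
    int i;

    for (i = 0; i < NQUEUES; i++) {
        pthread_mutex_init(&queues[i].lock, NULL);
        pthread_create(&tid[i], NULL, rx_worker, &queues[i]);
    }
    for (i = 0; i < NQUEUES; i++) {
        pthread_join(tid[i], NULL);
        printf("queue %d: %lu packets\n", i, queues[i].packets);
    }
    return (0);
}
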
Unless, of course, your goal for 10Gb/s on FreeBSD is for it to be a really
good network monitor.

BC


