Date:      Mon, 4 May 2015 11:21:40 -0500
From:      Jim Thompson <jim@netgate.com>
To:        Barney Cordoba <barney_cordoba@yahoo.com>
Cc:        Luigi Rizzo <rizzo@iet.unipi.it>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: netmap-ipfw on em0 em1
Message-ID:  <CDE844AB-1F64-4922-AA45-D6710C6BD99E@netgate.com>
In-Reply-To: <1009610346.1107538.1430753353703.JavaMail.yahoo@mail.yahoo.com>
References:  <CA%2BhQ2%2Bj7wPca%2Bf_h67u8aHN1BGdVDLvUeRGSXahbNooY6qc9kA@mail.gmail.com> <1009610346.1107538.1430753353703.JavaMail.yahoo@mail.yahoo.com>

While it is a true statement that “You can do anything in the kernel
that you can do in user space”, it is not a helpful statement.
Yes, the kernel is just a program.
In a similar way, “You can just pop it into any kernel and it works” is
also not helpful.  It works, but it doesn’t work well, because of other
infrastructure issues.
Both of your statements reduce to the age-old “proof is left as an
exercise for the student”.

There is a lot of kernel infrastructure that is just plain crusty(*) and
which directly impedes performance in this area.

But there is plenty of cruft, Barney.  Here are two threads which are
three years old, with the issues they point out still unresolved, and
multiple places where 100ns or more is lost:

https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033287.html
https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033351.html

100ns is death at 10Gbps with min-sized packets.

quoting: http://luca.ntop.org/10g.pdf
---
“Taking as a reference a 10 Gbit/s link, the raw throughput is well below
the memory bandwidth of modern systems (between 6 and 8 GBytes/s for CPU
to memory, up to 5 GBytes/s on PCI-Express x16). However a 10Gbit/s
link can generate up to 14.88 million Packets Per Second (pps), which
means that the system must be able to process one packet every 67.2 ns.
This translates to about 200 clock cycles even for the faster CPUs, and
might be a challenge considering the per-packet overheads normally
involved by general-purpose operating systems. The use of large frames
reduces the pps rate by a factor of 20..50, which is great on end hosts
only concerned in bulk data transfer.  Monitoring systems and traffic
generators, however, must be able to deal with worst case conditions.”
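
For anyone who wants to check the 14.88 Mpps and 67.2 ns figures in that
quote, here is a minimal sketch of the arithmetic.  It assumes standard
Ethernet framing (64-byte minimum frame including FCS, 8 bytes of
preamble and start-of-frame delimiter, 12 byte times of inter-frame
gap).  The function names and the 3 GHz clock are my own illustrative
choices, not taken from the paper.

```python
# Per-packet time budget on a saturated Ethernet link (illustrative only).
PREAMBLE_SFD_BYTES = 8      # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle gap between frames, in byte times
MIN_FRAME_BYTES = 64        # minimum Ethernet frame, including the 4-byte FCS


def packets_per_second(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Maximum frame rate for back-to-back frames of the given size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def budget_ns(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Time available to handle one frame, in nanoseconds."""
    return 1e9 / packets_per_second(link_bps, frame_bytes)


def cycles_per_packet(link_bps, cpu_hz, frame_bytes=MIN_FRAME_BYTES):
    """CPU cycles available per frame on a single core at cpu_hz."""
    return cpu_hz / packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    link = 10e9  # 10 Gbit/s
    print(f"{packets_per_second(link) / 1e6:.2f} Mpps")
    print(f"{budget_ns(link):.1f} ns per packet")
    # 3 GHz is an assumed clock rate, chosen only for illustration.
    print(f"{cycles_per_packet(link, 3e9):.0f} cycles per packet at 3 GHz")
    print(f"a 100 ns stall spans {100 / budget_ns(link):.2f} packet slots")
```

Under these assumptions a single 100ns delay is longer than the whole
per-packet budget, which is why 100ns lost anywhere in the path matters
so much at this rate.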

Forwarding and filtering must also be able to deal with worst case, and
nobody does well with kernel-based networking here.
https://github.com/gvnn3/netperf/blob/master/Documentation/Papers/ABSDCon2015Paper.pdf

10Gbps NICs are $200-$300 today, and they’ll be included on the
motherboard during the next hardware refresh.  Broadwell-DE (Xeon-D) has
10G in the SoC, and others are coming.
10Gbps switches can be had at around $100/port.  This is exactly the
point at which the adoption curve for 1Gbps Ethernet ramped over a
decade ago.


(*) A few more simple examples of cruft:

Why, in 2015, does the kernel have a ‘fast forwarding’ option, and
worse, one that isn’t enabled by default?  Shouldn’t “fast forwarding”
be the default?

Why, in 2015, does FreeBSD not ship with IPSEC enabled in GENERIC?
(Reason: each and every time this has come up in recent memory, someone
has pointed out that it impacts performance.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=128030)

Why, in 2015, does anyone think it’s acceptable for “fast forwarding”
to break IPSEC?

Why, in 2015, does anyone think it’s acceptable that the setkey(8) man
page documents, of all things, DES-CBC and HMAC-MD5 for a SA?  That’s
some kind of sick joke, right?
This completely flies in the face of RFC 4835.


> On May 4, 2015, at 10:29 AM, Barney Cordoba via freebsd-net <freebsd-net@freebsd.org> wrote:
>
> It's not faster than "wedging" into the if_input()s. It simply can't
> be. You're getting packets at interrupt time as soon as they're
> processed, there's no network stack involved, and you're able to
> receive and transmit without a process switch. At worst it's the same,
> without the extra plumbing. It's not rocket science to "bypass the
> network stack".
> The only advantage of bringing it into user space would be that it's
> easier to write threaded handlers for complex uses; but not as a
> firewall (which is the limit of the context of my comment). You can do
> anything in the kernel that you can do in user space. The reason a
> kernel module with if_input() hooks is better is that you can use the
> standard kernel without all of the netmap hacks. You can just pop it
> into any kernel and it works.
> BC
>
>
>     On Sunday, May 3, 2015 2:13 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>
>
> On Sun, May 3, 2015 at 6:17 PM, Barney Cordoba via freebsd-net <freebsd-net@freebsd.org> wrote:
>
>> Frankly I'm baffled by netmap. You can easily write a loadable kernel
>> module that moves packets from 1 interface to another and hook in the
>> firewall; why would you want to bring them up into user space? It's 1000s
>> of lines of unnecessary code.
>>
>>
> Because it is much faster.
>
> The motivation for netmap-like solutions (that includes Intel's DPDK,
> PF_RING/DNA and several proprietary implementations) is speed:
> they bypass the entire network stack, and a good part of the device
> drivers, so you can access packets 10+ times faster.
> So things are actually the other way around: the 1000's of unnecessary
> lines of code (not really thousands, though) are those that you'd pay
> going through the standard network stack when you don't need any of
> its services.
>
> Going to userspace is just a side effect -- turns out to
> be easier to develop and run your packet processing code
> in userspace, but there are netmap clients (e.g. the
> VALE software switch) which run entirely in the kernel.
>
> cheers
> luigi
>
>
>
>>
>>
>>       On Sunday, May 3, 2015 3:10 AM, Raimundo Santos <raitech@gmail.com>
>> wrote:
>>
>>
>>   Clarifying things for the sake of documentation:
>>
>> To use the host stack, append a ^ character after the name of the interface
>> you want to use. (Info from netmap(4) shipped with FreeBSD 10.1 RELEASE.)
>>
>> Examples:
>>
>> "kipfw em0" does nothing useful.
>> "kipfw netmap:em0" disconnects the NIC from the usual data path, i.e.,
>> there are no host communications.
>> "kipfw netmap:em0 netmap:em0^" or "kipfw netmap:em0+" places the
>> netmap-ipfw rules between the NIC and the host stack entry point associated
>> (the IP addresses configured on it with ifconfig, ARP and RARP, etc...)
>> with the same NIC.
>>
>> On 10 November 2014 at 18:29, Evandro Nunes <evandronunes12@gmail.com>
>> wrote:
>>
>>> dear professor luigi,
>>> i have some numbers, I am filtering 773Kpps with kipfw using 60% of CPU
>>> and system using the rest, this system is a 8core at 2.4Ghz, but only one
>>> core is in use
>>> in this next round of tests, my NIC is now an avoton with igb(4) driver,
>>> currently with 4 queues per NIC (total 8 queues for kipfw bridge)
>>> i have read in your papers we should expect something similar to 1.48Mpps
>>> how can I benefit from the other CPUs which are completely idle? I tried
>>> CPU Affinity (cpuset) kipfw but system CPU usage follows userland kipfw
>>> so I could not set one CPU to userland while other for system
>>>
>>
>> All the papers talk about *generating* lots of packets, not *processing*
>> lots of packets. What this netmap example does is processing. If someone
>> really wants to use the host stack, the expected performance WILL BE worse
>> - what's the point of using a host stack bypassing tool/framework if
>> someone will end up using the host stack?
>>
>> And by generating, the papers usually mean: minimum sized UDP packets.
>>
>>
>>>
>>> can you please enlighten?
>>>
>>
>> For everyone: read the manuals, read related and indicated materials
>> (papers, web sites, etc), and, as a last resort, read the code. With
>> netmap's code, it's easier than it sounds.
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>
>
> -- 
> -----------------------------------------+-------------------------------
> Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
> http://www.iet.unipi.it/~luigi/       . Universita` di Pisa
> TEL      +39-050-2217533              . via Diotisalvi 2
> Mobile  +39-338-6809875              . 56122 PISA (Italy)
> -----------------------------------------+-------------------------------



