Date:      Mon, 29 Jun 2015 19:34:49 +0200
From:      Luigi Rizzo <rizzo@iet.unipi.it>
To:        Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: netmap custom RSS and custom packet info
Message-ID:  <CA%2BhQ2%2BiYyRH8rUWbmnxCTO1KEx7dF%2BCjMgt9FyUVhitMpMF73A@mail.gmail.com>
In-Reply-To: <20150629162213.GG1647@zxy.spb.ru>
References:  <20150629151750.GD1647@zxy.spb.ru> <CA%2BhQ2%2BjhNkhLnxHQKeoEgbs2479hdnLd7mRR3XPmQLZyS1=1sw@mail.gmail.com> <20150629162213.GG1647@zxy.spb.ru>

On Mon, Jun 29, 2015 at 6:22 PM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:

> On Mon, Jun 29, 2015 at 06:05:41PM +0200, Luigi Rizzo wrote:
>
> > On Mon, Jun 29, 2015 at 5:17 PM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
> >
> > > Working with netmap and modern hardware, I am missing some features:
> > >
> > > a) Some spare space before the packet (64/128/192/256 bytes) for
> > > application data. For example: the application does some pre-analysis
> > > of the packet, fills a structure in this space and routes the packet
> > > (via a NETMAP pipe) to another thread. The receiving thread gets the
> > > packet together with the linked information about it, and can process
> > > it without additional overhead.
> > >
> >
> > Spare space in front of the packet is something we have
> > been considering for a different purpose, namely better
> > support for encapsulation/decapsulation and things like
> > vhost-net header.
>
> Adding more space (sysctl- or ioctl-controlled) may satisfy both:
> 4-8-20 bytes for encapsulation and the rest for the application.
>
> > Note though that the annotation is transferred for free
> > only in the case of pipes or ports sharing the same memory
> > region; VALE ports would have to explicitly copy the
> > extra bytes, which is (moderately) expensive.
>
> I think these bytes should not be transferred through VALE.
> This is only packet-processing information, like tags, as opposed to
> VALE, where a packet is transferred as if over the wire.
>


> > A quick and dirty way to support what you want is the following:
> > - in the kernel code, modify NMB(), PNMB() and the offset between
> >   the netmap_ring and the first buffer to add the extra space
> >   you want in front of the packet. You can possibly make this
> >   offset a sysctl-controlled value
> >
> > - in netmap_vale.c, make a small change to the code that copies
> >   buffers so that it includes also the space before the actual packet.
> >
> > That should be all.
>
> Do you plan to do this?
> I don't like having to maintain permanent private branches/patches.
>

Possibly in the long term, yes, but before doing it
I want to design it properly so that it does not
look like a custom hack.
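
To make the "spare space in front of the packet" idea concrete, here is a
minimal user-space sketch in C. It is not netmap code and does not touch
NMB()/PNMB(): the pool, the sizes and every name in it (slot_base, pkt_data,
struct pkt_note, note_store, note_load) are assumptions made up for
illustration. It only shows the layout: each fixed-size slot reserves
HEADROOM bytes, and the packet starts after them, so an annotation can
travel with the buffer without an extra copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 2048u  /* size of one buffer slot (assumed value) */
#define HEADROOM   64u  /* spare bytes in front of the packet (assumed) */
#define NUM_BUFS    4u  /* tiny pool, just for the sketch */

/* Hypothetical annotation filled in by a pre-analysis thread. */
struct pkt_note {
    uint32_t flow_hash;
    uint16_t l3_offset;
    uint16_t flags;
};

/* The annotation must fit in the reserved space. */
static_assert(sizeof(struct pkt_note) <= HEADROOM, "annotation too large");

static uint8_t pool[NUM_BUFS * BUF_SIZE];

/* Start of slot idx: this is where the annotation lives. */
static inline uint8_t *slot_base(uint32_t idx)
{
    return pool + (size_t)idx * BUF_SIZE;
}

/* Start of the packet data: slot base plus the headroom. */
static inline uint8_t *pkt_data(uint32_t idx)
{
    return slot_base(idx) + HEADROOM;
}

/* Bytes left for the packet itself once the headroom is reserved. */
static inline size_t pkt_capacity(void)
{
    return BUF_SIZE - HEADROOM;
}

/* Store an annotation in the headroom just before the packet data. */
static inline void note_store(uint8_t *data, const struct pkt_note *n)
{
    memcpy(data - HEADROOM, n, sizeof(*n));
}

/* Read the annotation back, given only the packet data pointer. */
static inline struct pkt_note note_load(const uint8_t *data)
{
    struct pkt_note n;
    memcpy(&n, data - HEADROOM, sizeof(n));
    return n;
}
```

A consumer that receives only the packet pointer can recover the annotation
with note_load(); a copying path (as in the VALE case discussed above) would
have to copy from data - HEADROOM rather than from data to preserve it.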


> > > b) Custom RSS. RSS on modern NICs interoperates poorly with packet
> > > analysis: packets from the same flow but in different directions are
> > > placed in different queues, PPPoE-encapsulated packets are placed in
> > > queue 0, various tunneling protocols are not recognised, etc. Maybe
> > > NETMAP could use custom RSS hashing from a loadable kernel module
> > > provided by the user? A function from this module could do packet
> > > analysis, tunnel removal and custom RSS hashing in a
> > > direction-independent manner, fill some structure prepended to the
> > > buffer (see above) and pass this information to the application.
> > >
> >
> > RSS is completely orthogonal to netmap and I strongly
> > suggest keeping it this way, using either the NIC-specific
> > tools to control RSS or some generic mechanism
> > (on Linux there is ethtool, and we should implement something
> > similar on FreeBSD as well).
>
> This is not true RSS. This is only a trick for reassigning RX packets to
> different netmap rings. All RSS mechanisms available in hardware are
> unacceptable for this:
>
> - they don't support different encapsulations (PPPoE, GRE, GTP, etc.)
> - they give different rings for packets 1.2.3.4->5.6.7.8 and 5.6.7.8->1.2.3.4
>
> Producing a universal hashing/distribution mechanism is too complex. But
> using a user-provided kernel module (kept in sync with the application)
> may be acceptable?
>

> This is like an ephemeral permanent NETMAP pipe between the real hardware
> RX rings/driver and the application-visible rings.
>


This particular function would also need to deal with
notifications between the physical NIC and the exported
netmap rings, and I would probably leave it to userspace.

You should be able to do what you have in mind using the
programmable forwarding function that already exists for
VALE ports (at the cost of a memory copy, which could
be avoided when/if we decide to support VALE ports that
share the same memory region, hence using zero copy).

Don't hold your breath though.
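
For the direction-independent distribution discussed in this thread, a
minimal sketch in C of one possible approach follows. It is only an
illustration, not an existing netmap or VALE interface: struct flow4,
flow_hash_symmetric() and ring_for_flow() are hypothetical names, FNV-1a is
just a convenient stand-in hash, and header parsing and tunnel removal
(PPPoE, GRE, GTP) are assumed to have happened before the flow fields are
filled in. The point is that ordering the two endpoints before hashing
sends 1.2.3.4->5.6.7.8 and 5.6.7.8->1.2.3.4 to the same ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical inner-flow description, filled in after decapsulation. */
struct flow4 {
    uint32_t src_ip, dst_ip;     /* IPv4 addresses */
    uint16_t src_port, dst_port; /* L4 ports */
    uint8_t  proto;              /* IP protocol number */
};

/* 32-bit FNV-1a over a byte string (stand-in for any decent hash). */
static uint32_t fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Direction-independent hash: pack each endpoint (address, port) into
 * one integer, order the two endpoints, then hash the ordered pair.
 * Swapping source and destination yields the same key, hence the same
 * hash. The value depends on host byte order, which is fine as long as
 * it is only used locally to pick a ring.
 */
static uint32_t flow_hash_symmetric(const struct flow4 *f)
{
    uint64_t a = ((uint64_t)f->src_ip << 16) | f->src_port;
    uint64_t b = ((uint64_t)f->dst_ip << 16) | f->dst_port;
    uint64_t lo = a < b ? a : b;
    uint64_t hi = a < b ? b : a;
    uint8_t key[17];

    memcpy(key, &lo, 8);
    memcpy(key + 8, &hi, 8);
    key[16] = f->proto;
    return fnv1a(key, sizeof(key));
}

/* Map a flow to one of nrings destination rings (nrings must be > 0). */
static unsigned ring_for_flow(const struct flow4 *f, unsigned nrings)
{
    assert(nrings > 0);
    return flow_hash_symmetric(f) % nrings;
}
```

A userspace distributor reading from the hardware rings could call
ring_for_flow() per packet to choose the pipe or ring to forward to.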

cheers
luigi



-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2217533               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------


