Date: Mon, 29 Jun 2015 19:34:49 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: netmap custom RSS and custom packet info
Message-ID: <CA+hQ2+iYyRH8rUWbmnxCTO1KEx7dF+CjMgt9FyUVhitMpMF73A@mail.gmail.com>
In-Reply-To: <20150629162213.GG1647@zxy.spb.ru>
References: <20150629151750.GD1647@zxy.spb.ru> <CA+hQ2+jhNkhLnxHQKeoEgbs2479hdnLd7mRR3XPmQLZyS1=1sw@mail.gmail.com> <20150629162213.GG1647@zxy.spb.ru>
On Mon, Jun 29, 2015 at 6:22 PM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:

> On Mon, Jun 29, 2015 at 06:05:41PM +0200, Luigi Rizzo wrote:
>
> > On Mon, Jun 29, 2015 at 5:17 PM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
> >
> > > Working with netmap and modern hardware, I am missing some features:
> > >
> > > a) Some spare space before the packet (64/128/192/256 bytes) for
> > > application data. For example: the application does some pre-analysis
> > > of the packet, fills a structure in this space and routes the packet
> > > (via a NETMAP pipe) to another thread. The receiving thread gets the
> > > packet together with the linked information about it and can process
> > > it without additional overhead.
> > >
> >
> > Spare space in front of the packet is something we have
> > been considering for a different purpose, namely better
> > support for encapsulation/decapsulation and things like
> > the vhost-net header.
>
> Adding more space (sysctl- or ioctl-controlled) may satisfy both:
> 4-8-20 bytes for encapsulation and the rest for the application.
>
> > Note though that the annotation is transferred for free
> > only in the case of pipes or ports sharing the same memory
> > region; VALE ports would have to explicitly copy the
> > extra bytes, which is (moderately) expensive.
>
> I think these bytes should not be transferred through VALE.
> This is only packet-processing information, like tags, as opposed to
> VALE, which is like a packet transferred over a wire.
>
> > A quick and dirty way to support what you want is the following:
> > - in the kernel code, modify NMB(), PNMB() and the offset between
> >   the netmap_ring and the first buffer to add the extra space
> >   you want in front of the packet. You can possibly make this
> >   offset a sysctl-controlled value
> >
> > - in netmap_vale.c, make a small change to the code that copies
> >   buffers so that it also includes the space before the actual packet.
> >
> > That should be all.
>
> Do you plan to do this?
> I don't like having permanent private branches/patches.

Possibly in the long term yes, but before doing it I want to
design it properly so that it does not look like a custom hack.

> > > b) Custom RSS. Modern NICs have RSS that interoperates poorly with
> > > packet analysis: packets from the same flow but in different
> > > directions are placed in different queues, PPPoE-encapsulated packets
> > > are placed in queue 0, various tunnelings are not recognised, etc.
> > > Maybe NETMAP could use custom RSS hashing from a user-provided
> > > loadable kernel module? A function from this module could do packet
> > > analysis, tunnel removal and custom RSS hashing in a
> > > direction-independent manner, fill some structure prepended to the
> > > buffer (see above) and pass this information to the application.
> > >
> >
> > RSS is completely orthogonal to netmap and I strongly
> > suggest to keep it this way, using either the NIC-specific
> > tools to control RSS or some generic mechanism
> > (on linux there is ethtool, and we should implement something
> > similar also on freebsd).
>
> This is not true RSS. This is only a trick for reassigning RX packets to
> different netmap rings. All RSS mechanisms available in hardware are
> fully unacceptable for this:
>
> - they don't support different encapsulations (PPPoE, GRE, GTP, etc.)
> - they give different rings for packets 1.2.3.4->5.6.7.8 and 5.6.7.8->1.2.3.4
>
> Producing a universal hashing/distributing mechanism is too complex. But
> using a user-provided kernel module (synced to the application) may be
> acceptable?
>
> This is like an ephemeral permanent NETMAP pipe between the real hardware
> RX rings/driver and the application-visible rings.

This particular function would also need to deal with notifications
between the physical NIC and the exported netmap rings, and I would
probably leave it to userspace.
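To make the userspace option concrete, here is a rough sketch of the direction-independent hashing asked for above. It is only an illustration under stated assumptions, not netmap code: struct flow_key, mix32(), flow_hash_symmetric() and flow_to_ring() are hypothetical names, the key is assumed to have already been extracted from the innermost headers (after any PPPoE/GRE/GTP decapsulation, which is not shown), and the mixer is the 32-bit finalizer from MurmurHash3, picked only because it is short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: IPv4 addresses and L4 ports in host byte order. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  proto;
};

/* 32-bit integer mixer (the finalizer step of MurmurHash3). */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/*
 * Direction-independent hash: put the two endpoints in a canonical
 * order before mixing, so that A->B and B->A give the same value.
 */
static uint32_t flow_hash_symmetric(const struct flow_key *k)
{
    uint32_t ip_lo = k->src_ip, ip_hi = k->dst_ip;
    uint16_t port_lo = k->src_port, port_hi = k->dst_port;

    /* Swap whole endpoints so that (ip_lo, port_lo) <= (ip_hi, port_hi). */
    if (ip_lo > ip_hi || (ip_lo == ip_hi && port_lo > port_hi)) {
        uint32_t tmp_ip = ip_lo;
        uint16_t tmp_port = port_lo;
        ip_lo = ip_hi;
        ip_hi = tmp_ip;
        port_lo = port_hi;
        port_hi = tmp_port;
    }

    uint32_t h = mix32(ip_lo);
    h = mix32(h ^ ip_hi);
    h = mix32(h ^ (((uint32_t)port_lo << 16) | port_hi));
    h = mix32(h ^ k->proto);
    return h;
}

/* Map a flow to one of nrings application-visible rings. */
static unsigned flow_to_ring(const struct flow_key *k, unsigned nrings)
{
    assert(nrings > 0);
    return flow_hash_symmetric(k) % nrings;
}
```

A dispatcher thread could call flow_to_ring() for every received packet and forward the slot to the pipe serving that ring; both directions of 1.2.3.4->5.6.7.8 then end up in the same place.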
You should be able to do what you have in mind using the
programmable forwarding function that already exists for VALE ports
(at the cost of a memory copy, which could be avoided when/if we decide
to support VALE ports that share the same memory region, hence
using zero copy). Don't hold your breath though.

cheers
luigi

-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2217533               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------
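On the spare space in front of the packet discussed earlier in this message, the address arithmetic involved can be sketched as follows. This is a standalone userspace model, not the kernel change itself: the flat pool and buf_base() merely stand in for what NMB() resolves in the kernel, and BUF_SIZE, HEADROOM, struct pkt_note, pkt_data(), note_store() and note_load() are all hypothetical names and values chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 2048u  /* assumed size of one buffer */
#define HEADROOM 64u    /* assumed spare space in front of each packet */

/* Hypothetical per-packet annotation kept in the headroom. */
struct pkt_note {
    uint32_t flow_hash;     /* e.g. result of a custom hash */
    uint16_t inner_offset;  /* e.g. offset of the inner header */
    uint16_t flags;
};

/* Start of buffer idx in a flat pool (stand-in for the NMB() lookup). */
static uint8_t *buf_base(uint8_t *pool, uint32_t idx)
{
    return pool + (size_t)idx * BUF_SIZE;
}

/* Start of the packet data once the headroom has been reserved. */
static uint8_t *pkt_data(uint8_t *pool, uint32_t idx)
{
    return buf_base(pool, idx) + HEADROOM;
}

/* Producer side: write the annotation in front of the packet. */
static void note_store(uint8_t *pool, uint32_t idx, const struct pkt_note *n)
{
    assert(sizeof(*n) <= HEADROOM);
    memcpy(buf_base(pool, idx), n, sizeof(*n));
}

/* Consumer side: read the annotation back from the same buffer. */
static void note_load(uint8_t *pool, uint32_t idx, struct pkt_note *n)
{
    assert(sizeof(*n) <= HEADROOM);
    memcpy(n, buf_base(pool, idx), sizeof(*n));
}
```

With pipes or ports sharing one memory region the consumer sees the same buffer, so note_load() finds the annotation without any copy; across VALE the headroom would have to be copied together with the packet.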