From: Luigi Rizzo
To: Slawa Olhovchenkov
Cc: "freebsd-net@freebsd.org"
Date: Mon, 29 Jun 2015 19:34:49 +0200
Subject: Re: netmap custom RSS and custom packet info
In-Reply-To: <20150629162213.GG1647@zxy.spb.ru>
References: <20150629151750.GD1647@zxy.spb.ru> <20150629162213.GG1647@zxy.spb.ru>

On Mon, Jun 29, 2015 at 6:22 PM, Slawa Olhovchenkov wrote:

> On Mon, Jun 29, 2015 at 06:05:41PM +0200, Luigi Rizzo wrote:
>
> > On Mon, Jun 29, 2015 at 5:17 PM, Slawa Olhovchenkov wrote:
> >
> > > Working with netmap and modern hardware, I am missing some features:
> > >
> > > a) some spare space before the packet (64/128/192/256 bytes) for
> > > application data. For example: the application does some pre-analysis
> > > of the packet, fills a structure in this space and routes the packet
> > > (via a NETMAP pipe) to another thread. The receiving thread gets the
> > > packet and the linked information about it for processing without
> > > additional overhead.
> >
> > Spare space in front of the packet is something we have
> > been considering for a different purpose, namely better
> > support for encapsulation/decapsulation and things like
> > the vhost-net header.
>
> Adding more space (sysctl- or ioctl-controlled) may satisfy both:
> 4-8-20 bytes for encapsulation and the rest for the application.
>
> > Note though that the annotation is transferred for free
> > only in the case of pipes or ports sharing the same memory
> > region; VALE ports would have to explicitly copy the
> > extra bytes, which is (moderately) expensive.
>
> I think these bytes need not be transferred through VALE.
> This is only packet-processing information, like tags, as opposed to
> VALE, which is like a packet transferred over the wire.
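
The annotation idea in (a) can be sketched in plain C as follows. This is an illustration only, not netmap code: `PKT_HEADROOM`, `struct pkt_annotation` and the helper names are hypothetical, and it assumes every buffer reserves a fixed headroom in front of the packet data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical headroom size; the thread suggests 64/128/192/256 bytes. */
#define PKT_HEADROOM 64

/* Hypothetical annotation filled in by the pre-analysis thread. */
struct pkt_annotation {
    uint32_t flow_hash;   /* direction-independent flow hash */
    uint16_t l3_offset;   /* offset of the inner IP header in the packet */
    uint16_t tunnel_type; /* application-defined, e.g. 0 = none, 1 = PPPoE */
};

/* Packet data starts right after the reserved headroom. */
static inline uint8_t *pkt_data(uint8_t *buf)
{
    return buf + PKT_HEADROOM;
}

/* Store the annotation in the last bytes of the headroom, i.e. directly
 * in front of the packet. memcpy avoids any alignment assumptions. */
static inline void pkt_annotate(uint8_t *buf, const struct pkt_annotation *a)
{
    assert(sizeof(*a) <= PKT_HEADROOM);
    memcpy(buf + PKT_HEADROOM - sizeof(*a), a, sizeof(*a));
}

/* Read the annotation back on the receiving side. */
static inline struct pkt_annotation pkt_annotation_get(const uint8_t *buf)
{
    struct pkt_annotation a;

    memcpy(&a, buf + PKT_HEADROOM - sizeof(a), sizeof(a));
    return a;
}
```

When producer and consumer share the same memory region (as with pipes), the receiving thread would only need the buffer pointer to recover the annotation, so nothing extra is copied.
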
>
> > A quick and dirty way to support what you want is the following:
> > - in the kernel code, modify NMB(), PNMB() and the offset between
> >   the netmap_ring and the first buffer to add the extra space
> >   you want in front of the packet. You can possibly make this
> >   offset a sysctl-controlled value
> >
> > - in netmap_vale.c, make a small change to the code that copies
> >   buffers so that it also includes the space before the actual packet.
> >
> > That should be all.
>
> Do you plan to do this?
> I don't like maintaining a permanent private branch/patches.

Possibly in the long term yes, but before doing it I want to design
it properly so that it does not look like a custom hack.

> > > b) custom RSS. Modern NICs have RSS that interoperates poorly with
> > > packet analysis: packets from the same flow but in different
> > > directions are placed in different queues, PPPoE-encapsulated
> > > packets are placed in queue 0, various tunneling protocols are not
> > > recognised, etc. Maybe NETMAP could use custom RSS hashing from a
> > > loadable kernel module provided by the user? A function from this
> > > module could do packet analysis, tunnel removal and custom RSS
> > > hashing in a direction-independent manner, fill some structure
> > > prepended to the buffer (see above) and pass this information to
> > > the application.
> >
> > RSS is completely orthogonal to netmap and I strongly
> > suggest to keep it this way, using either the NIC-specific
> > tools to control RSS or some generic mechanism
> > (on linux there is ethtool, and we should implement something
> > similar also on freebsd).
>
> This is not true RSS. This is only a trick for reassigning RX packets to
> different netmap rings.
> All available hardware RSS mechanisms are fully
> unacceptable for this:
>
> - they don't support different encapsulations (PPPoE, GRE, GTP, etc.)
> - they give different rings for packets 1.2.3.4->5.6.7.8 and 5.6.7.8->1.2.3.4
>
> Producing a universal hashing/distributing mechanism is too complex. But
> using a user-provided kernel module (synced to the application) may be
> acceptable?
>
> This is like an ephemeral permanent NETMAP pipe between the real hardware
> RX rings/driver and the application-visible rings.

This particular function would also need to deal with notifications
between the physical NIC and the exported netmap rings, and I would
probably leave it to userspace. You should be able to do what you have
in mind using the programmable forwarding function that already exists
for VALE ports (at the cost of a memory copy, which could be avoided
when/if we decide to support VALE ports that share the same memory
region, hence using zero copy). Don't hold your breath though.

cheers
luigi

-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2217533               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------
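
The direction-independent hashing requested in (b) can be illustrated with a small userspace sketch. It is not part of netmap or VALE: `struct flow_key`, `flow_hash_symmetric()` and `flow_ring()` are hypothetical names, the key is assumed to have been extracted already from the inner headers (after any PPPoE/GRE/GTP decapsulation), and the mixing step is the MurmurHash3 32-bit finalizer, chosen here only for convenience.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key, taken from the innermost IP/transport headers. */
struct flow_key {
    uint32_t src_ip, dst_ip;     /* IPv4 addresses, host byte order */
    uint16_t src_port, dst_port; /* transport ports, host byte order */
    uint8_t  proto;              /* IP protocol number */
};

/* MurmurHash3 32-bit finalizer, used as a cheap bit mixer. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Hash that is identical for A->B and B->A: the two endpoints are put
 * in a canonical (sorted) order before being mixed. */
static uint32_t flow_hash_symmetric(const struct flow_key *k)
{
    uint64_t a = ((uint64_t)k->src_ip << 16) | k->src_port;
    uint64_t b = ((uint64_t)k->dst_ip << 16) | k->dst_port;
    uint32_t h;

    if (a > b) { /* canonical order: smaller endpoint first */
        uint64_t t = a;
        a = b;
        b = t;
    }
    h = mix32((uint32_t)(a >> 16) ^ k->proto);
    h = mix32(h ^ (uint32_t)(b >> 16));
    h = mix32(h ^ (((uint32_t)(a & 0xffff) << 16) | (uint32_t)(b & 0xffff)));
    return h;
}

/* Map a flow to one of nrings application-visible rings. */
static unsigned flow_ring(const struct flow_key *k, unsigned nrings)
{
    assert(nrings > 0);
    return flow_hash_symmetric(k) % nrings;
}
```

With this ordering, packets 1.2.3.4->5.6.7.8 and 5.6.7.8->1.2.3.4 of the same connection map to the same ring, which is the property the hardware RSS discussed above does not provide.
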