Date: Wed, 12 Nov 2014 11:48:46 -0800
Subject: Re: netmap: extension to store user data per packet/slot?
From: Luigi Rizzo
To: Slawa Olhovchenkov
Cc: Franco Fichtner, freebsd-current

On Wed, Nov 12, 2014 at 11:16 AM, Slawa Olhovchenkov wrote:

> On Tue, Nov 11, 2014 at 10:13:54PM +0100, Franco Fichtner wrote:
>
> > Hi Luigi,
> > hi all,
> >
> > so I was running into logistics issues with netmap(4)
> > with regard to zero-copy and redirection through pipes:
> > working on a load-balancing framework revealed that it
> > is very hard to track a packet's origins to later move
> > it onward to the respective outgoing interface, be it
> > another device or the host stack.
> >
> > Long story short: user data needs to be stored for the
> > packet buffer or slot.
>
> I think we need configurable (by sysctl) space reserved before the
> packet. This may be used as user data, or to insert
> VLAN/MPLS/QinQ/etc. headers.

This is yet another requirement: not just metadata but also
encapsulation.

For the record, the VALE switch does have TSO support (implemented
through the VHOST header) so that VMs can pass large segments across
a switch, and they are properly split when traffic goes to a physical
interface or a port that does not support the header. We also support
scatter-gather I/O, at least on the switch (we haven't implemented
this on NICs yet).

But please consider that following this route we end up with more or
less the same complications that afflict the standard stack:
everything is configurable and decided at runtime, and the code
becomes a maze of conditionals or indirect function calls with little
chance of optimisations.
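A minimal sketch of the reserved-headroom idea discussed above. It is
not taken from netmap: the struct, the constants BUF_SIZE and
HEADROOM, and both functions are hypothetical names chosen for
illustration. It shows how a fixed reserve in front of the frame lets
an 802.1Q tag be inserted by moving only the 12 bytes of MAC
addresses, leaving the rest of the frame in place.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 2048 /* hypothetical buffer size */
#define HEADROOM 64   /* hypothetical reserve, e.g. set by a sysctl */

/* Hypothetical packet buffer: the frame starts at data[off]. */
struct pkt_buf {
	uint8_t  data[BUF_SIZE];
	uint16_t off; /* start of the frame within data[] */
	uint16_t len; /* frame length in bytes */
};

/* Store a frame after the reserved headroom (this step is a copy). */
static int
pkt_init(struct pkt_buf *p, const uint8_t *frame, uint16_t len)
{
	assert(p != NULL && frame != NULL);
	if (len > BUF_SIZE - HEADROOM)
		return -1;
	p->off = HEADROOM;
	p->len = len;
	memcpy(p->data + p->off, frame, len);
	return 0;
}

/*
 * Insert an 802.1Q tag using 4 bytes of headroom. Only the two MAC
 * addresses move; the EtherType and payload stay where they are.
 * Returns -1 when the headroom is exhausted or the frame is too short.
 */
static int
pkt_push_vlan(struct pkt_buf *p, uint16_t vid)
{
	uint8_t *src, *dst;

	if (p->off < 4 || p->len < 14)
		return -1;
	src = p->data + p->off;
	dst = src - 4;
	memmove(dst, src, 12);        /* destination + source MAC */
	dst[12] = 0x81;               /* TPID 0x8100 */
	dst[13] = 0x00;
	dst[14] = (vid >> 8) & 0x0f;  /* PCP/DEI zero, top 4 VID bits */
	dst[15] = vid & 0xff;         /* low 8 VID bits */
	p->off -= 4;
	p->len += 4;
	return 0;
}
```

A single HEADROOM value is what one global sysctl would amount to.
Once the reserve is used up, pkt_push_vlan() fails and the caller is
back to copying or scatter-gather I/O.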
Also, it's not that one sysctl works for all cases. Different ports
typically have different encapsulation sizes, and NICs may have
alignment constraints (even those that don't still suffer if buffers
are not 64-byte aligned), so you'll end up with scatter-gather I/O or
copying anyway.

After two years of experience with netmap I am no longer sure that
zero copy makes much sense, except perhaps for large packets (and I
am not so sure about that, either). Apart from benchmarks, if you
want to do something useful with the packets you need to read the
header, at which point the concerns about whether the data is in
cache are less significant and the cost of the copy is heavily
reduced.

Tracking ownership of buffers (which is needed for zero copy) is also
expensive even when they are not shared, and we have great trouble
managing the "extra buffers" we recently added to the netmap API to
support zero copy, to the point that I am tempted to remove the
feature.

cheers
luigi