Date: Fri, 12 Sep 2014 09:31:19 +0200
From: Luigi Rizzo
To: "Eggert, Lars"
Cc: FreeBSD Net
Subject: Re: netmap wishlist

On Fri, Sep 12, 2014 at 7:59 AM, Eggert, Lars wrote:

> Hi Luigi,
>
> I've started to play with netmap, like it a lot, and would like it to grow
> support for some additional features that I'd need. I wonder if you could
> comment on how likely support for any of the following is in netmap in the
> foreseeable future?
>
> * IP/TCP/UDP checksum offload
> * TCP/UDP segmentation offload
> * TCP/UDP large receive offload
> * jumbograms (I saw the email earlier today, so maybe that's addressed)
>

Hi Lars:
there is something already available or in progress for some of the
above, but here are my thoughts on the various subjects:

- netmap is designed to work with large frames, by setting the buffer
  size to something suitable (using a sysctl). There might be some
  lurking bugs (e.g. some NICs need to be told about the maximum frame
  size, or they will refuse to send/receive such frames even though the
  slot in the NIC ring specifies a large buffer), but these are trivial
  to fix on a case-by-case basis. The downside is some wasted buffer
  space: buffers are fixed size, so having to allocate, say, 16K for a
  64-byte frame is a bit annoying.

- checksum offloading can be added trivially in the *_txsync(), once
  again on a per-NIC basis. The problem is that if we start adding
  per-packet features (say, checksums, scatter-gather I/O, segmentation)
  to the inner loop of *_txsync(), we are going to lose some performance
  for high-rate applications.
  Right now we are running at about 20 ns/pkt (because we assume a flat
  data format), and a few extra conditionals in the inner loop could
  easily eat another 5-20 ns/pkt. This makes me a bit uncomfortable,
  especially because the situations where these offloads matter
  typically involve large packets, where we are not CPU bound.

- the VALE switch has support for segmentation and checksum avoidance.
  Clients can register as virtio-net capable: in this case the port
  will accept/deliver large segments, and segmentation and checksums
  are done as required for ports that are not virtio-net enabled (e.g.
  physical NICs attached to the same VALE switch). This was developed
  earlier this year by Vincenzo Maffione.
  At the moment this only works on top of VALE ports, not NICs, because
  the big win comes when a VM can deliver large segments in one shot to
  another local VM. It is much less useful when talking across a
  physical device, in which case the OS should be able to do a
  reasonable job of segmenting packets (see also the next item).
  We could probably leverage this code to also work on top of NICs
  connected through netmap, e.g. by programming the NIC to use its own
  native offloading, but I am skeptical about the usefulness and
  concerned about the potential performance loss in *_txsync().

- Stefano Garzarella has some code to do software GSO (this is for
  FreeBSD; Linux already has something similar), which will be
  presented at EuroBSDCon later this month in Sofia. This should address
  the segmentation issue on the host stack.

- on the receive side, both FreeBSD and Linux have an efficient LRO
  software fallback in case the NIC does not support it natively, so I
  think we do not need this at the NIC/switch level.

cheers
luigi