From: Navdeep Parhar
Date: Tue, 08 Jul 2014 12:17:02 -0700
To: Hans Petter Selasky, freebsd-net@freebsd.org, FreeBSD Current
Subject: Re: [RFC] Add support for changing the flow ID of TCP connections
Message-ID: <53BC43AE.3040409@FreeBSD.org>
In-Reply-To: <53BC2E73.6090700@selasky.org>

On 07/08/14 10:46, Hans Petter Selasky wrote:
> Hi,
>
> I'm working on a new feature which will allow the timing of TCP
> connections to be controlled by the Ethernet hardware driver, in this
> case the mlxen driver. The main piece missing in the kernel is a way
> to overwrite the mbuf's flowid value in "struct inpcb" once the
> connection is established, and a callback once the TCP connection is
> gone so that the Ethernet hardware driver can free the assigned
> "flowid".
>
> The "flowid" will be used to assign the outgoing data traffic of a
> specific TCP connection to a hardware-controlled queue, which is
> configured in advance with certain parameters for the timing of the
> transmitted packets.
>
> To be able to set the flowid I'm using existing functions in the
> kernel TCP code to look up the "inpcb" structure based on the 4-tuple,
> via the "ifp->if_ioctl()" callback of the network adapter. I'm also
> registering a function method table so that I get a callback when the
> TCP connection is gone.
>
> At this point of development I would like to get some feedback from
> FreeBSD network guys about my attached patch proposal.
>
> The motivation for this work is to make TCP transmission more
> reliable, typically for fixed-rate media content travelling some
> distance. To illustrate this I will give you an example from the world
> of VoIP, which uses UDP. When making long-distance VoIP calls through
> various unknown networks and routers, it makes a very big difference
> whether you send data 20ms apart or 40ms apart, even at exactly the
> same rate. In one case you might experience a bunch of packet drops,
> and in the other everything is fine. Why? Because the number of
> packets you send per second, and their timing, matter. The goal is to
> apply some timing rules to TCP, to increase the likelihood of
> successful transmission and to reduce the amount of data loss. For
> high-throughput applications we want to do this in hardware.
>
> While at it I would like to "typedef" the flowid used by mbufs,
> "struct inpcb" and many other places. Where would the right place be
> for such a definition? In "sys/mbuf.h"?
>
> Comments are appreciated!

I think we need to design this to be as generic as possible. I have
quite a bit of code that does this stuff but I haven't pushed it
upstream or even offered it for review (yet).

cxgbe(4) hardware does throttling and traffic pacing too, but it's not
limited to TCP, and it can do it per queue or per "flow": you can limit
a tx queue or an individual flow to a packet-per-second limit or a
bandwidth ceiling. This works for plain NIC traffic (TCP, UDP,
whatever) as well as for stateful TCP offload. For TCP (NIC or TOE) the
chip can even rewrite the TCP timestamp to account for the extra time
that the chip/driver held the packet because it was asked to slow down
a flow.

The per-queue stuff is handled via a driver-specific tool (cxgbetool).
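To make the "packet-per-second limit or bandwidth ceiling" and the timestamp rewrite a little more concrete, here is a small model of per-flow pacing. It is a sketch under stated assumptions, not cxgbe(4) code: the struct and function names are hypothetical, a limit of 0 means "unlimited", times are in microseconds, and the TSval adjustment assumes a 1 ms TCP timestamp clock. For each packet it computes a departure time (the stricter of the two limits wins) and how long the packet was held, which is the amount a TSval would need to be advanced by.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-flow pacer (not the cxgbe implementation).
 * A limit of 0 means "no limit". All times are in microseconds.
 */
struct flow_pacer {
    uint64_t pkts_per_sec;   /* packet-per-second limit, 0 = unlimited */
    uint64_t bytes_per_sec;  /* bandwidth ceiling in bytes/s, 0 = unlimited */
    uint64_t next_tx_us;     /* earliest time the next packet may leave */
};

/* Time slot a packet of len bytes occupies; the stricter limit wins. */
uint64_t pacer_slot_us(const struct flow_pacer *p, uint32_t len)
{
    uint64_t slot = 0;

    if (p->pkts_per_sec != 0) {
        uint64_t t = 1000000 / p->pkts_per_sec;
        if (t > slot)
            slot = t;
    }
    if (p->bytes_per_sec != 0) {
        uint64_t t = (uint64_t)len * 1000000 / p->bytes_per_sec;
        if (t > slot)
            slot = t;
    }
    return slot;
}

/*
 * Schedule a packet that is ready at now_us. Returns its departure time
 * and stores in *held_us how long the pacer held it back.
 */
uint64_t pacer_schedule(struct flow_pacer *p, uint64_t now_us, uint32_t len,
    uint64_t *held_us)
{
    uint64_t depart;

    assert(held_us != NULL);
    depart = now_us > p->next_tx_us ? now_us : p->next_tx_us;
    *held_us = depart - now_us;
    p->next_tx_us = depart + pacer_slot_us(p, len);
    return depart;
}

/*
 * Advance a TCP TSval by the hold time, assuming a 1 ms timestamp clock.
 * The hold is truncated to whole milliseconds; the sum wraps modulo 2^32.
 */
uint32_t tsval_adjust(uint32_t tsval, uint64_t held_us)
{
    return tsval + (uint32_t)(held_us / 1000);
}
```

For example, a flow limited to 1000 packets/s that offers two 1500-byte packets at t=0 sends them at 0 and 1000 us; the second one was held for 1 ms, so under the 1 ms clock assumption its TSval would be advanced by 1.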
For per-flow throttling my implementation adds a new socket option
(SO_TX_THROTTLE) that lets an application specify a throttle rate for a
socket. The kernel allocates a "flow identifier" for each such socket,
and tcp_output (or udp_output, ...) attaches an mbuf tag containing this
identifier and the throttling parameters to each mbuf that it pushes
out. Drivers for hardware that can throttle traffic look for this tag;
the rest ignore it.

- cxgbe(4) registers itself as a "flow throttling provider" with the
  kernel when it attaches to the chip. It tells the kernel how many
  flows it can handle and the range of rates it can handle.

- setsockopt(SO_TX_THROTTLE, rate) makes the kernel allocate a unique
  identifier for the socket. This is *not* related to the RSS flowid at
  all. If a listening socket has SO_TX_THROTTLE, all its children
  inherit the rate-limiting parameters but each gets its own unique
  identifier. The setsockopt fails if there aren't any flow throttling
  providers registered.

- tcp_output (and the other proto_output routines) look for
  SO_TX_THROTTLE and attach extra metadata, in the form of a tag, to the
  outgoing frames.

- cxgbe(4) reads this metadata and acts on it.

Regards,
Navdeep
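The SO_TX_THROTTLE scheme in the list above can be modelled in userspace C to show how the pieces fit together: provider registration with a flow count and a rate range, per-socket identifier allocation that fails when no provider is registered, inheritance of the rate (but not the identifier) by children of a listening socket, and a tag filled in on output. This is a hypothetical sketch, not the actual patch: every name, the bytes/s rate unit and the error codes (ENOPROTOOPT, EINVAL, ENOSPC) are assumptions, and a plain struct stands in for the real mbuf tag, which in the kernel would presumably go through the mbuf_tags(9) framework.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the SO_TX_THROTTLE flow; not a FreeBSD API. */
struct throttle_provider {
    const char *name;        /* e.g. a driver instance */
    uint32_t max_flows;      /* how many throttled flows it can handle */
    uint64_t min_rate;       /* supported rate range (assumed bytes/s) */
    uint64_t max_rate;
    uint32_t flows_in_use;
};

struct sock_model {
    uint64_t tx_rate;        /* 0 = SO_TX_THROTTLE not set */
    uint32_t flow_id;        /* 0 = none; unrelated to the RSS flowid */
    struct throttle_provider *provider;
};

struct throttle_tag {        /* stands in for the per-mbuf tag */
    uint32_t flow_id;
    uint64_t rate;
};

#define MAX_PROVIDERS 4
static struct throttle_provider *providers[MAX_PROVIDERS];
static int nproviders;
static uint32_t next_flow_id = 1;    /* wraparound ignored in this sketch */

/* Called by a capable driver when it attaches to its hardware. */
int throttle_provider_register(struct throttle_provider *tp)
{
    assert(tp != NULL);
    if (nproviders == MAX_PROVIDERS)
        return ENOSPC;
    tp->flows_in_use = 0;
    providers[nproviders++] = tp;
    return 0;
}

/* Release a socket's identifier, e.g. when the connection goes away. */
void sock_clear_tx_throttle(struct sock_model *so)
{
    if (so->provider != NULL)
        so->provider->flows_in_use--;
    so->tx_rate = 0;
    so->flow_id = 0;
    so->provider = NULL;
}

/* setsockopt(SO_TX_THROTTLE, rate): allocate a unique flow identifier. */
int sock_set_tx_throttle(struct sock_model *so, uint64_t rate)
{
    int err = EINVAL;                /* no provider supports this rate */

    sock_clear_tx_throttle(so);      /* re-setting drops the old id */
    if (nproviders == 0)
        return ENOPROTOOPT;          /* no flow throttling provider */
    for (int i = 0; i < nproviders; i++) {
        struct throttle_provider *tp = providers[i];

        if (rate < tp->min_rate || rate > tp->max_rate)
            continue;
        if (tp->flows_in_use >= tp->max_flows) {
            err = ENOSPC;            /* rate is fine, but no free flows */
            continue;
        }
        tp->flows_in_use++;
        so->tx_rate = rate;
        so->flow_id = next_flow_id++;
        so->provider = tp;
        return 0;
    }
    return err;
}

/* A child of a listening socket inherits the rate, not the identifier. */
int sock_inherit_throttle(const struct sock_model *listener,
    struct sock_model *child)
{
    *child = (struct sock_model){0};
    if (listener->tx_rate == 0)
        return 0;
    return sock_set_tx_throttle(child, listener->tx_rate);
}

/* Output path: fill in a tag only for throttled sockets (1 = tagged). */
int output_attach_tag(const struct sock_model *so, struct throttle_tag *tag)
{
    if (so->flow_id == 0)
        return 0;                    /* untagged: drivers send as usual */
    tag->flow_id = so->flow_id;
    tag->rate = so->tx_rate;
    return 1;
}
```

Drivers that cannot throttle never look at the tag, so their output path is unchanged; the identifier counter is deliberately separate from any RSS flowid, matching the point made in the list above.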