From owner-freebsd-current@FreeBSD.ORG Wed Aug 20 21:40:20 2014 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 418F1E3B; Wed, 20 Aug 2014 21:40:20 +0000 (UTC) Received: from mail-qa0-x22d.google.com (mail-qa0-x22d.google.com [IPv6:2607:f8b0:400d:c00::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CF4923923; Wed, 20 Aug 2014 21:40:19 +0000 (UTC) Received: by mail-qa0-f45.google.com with SMTP id cm18so7413005qab.4 for ; Wed, 20 Aug 2014 14:40:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=vgQmpE2gTpgOy1yKj0ZSu2AdiXnnzDCKthQUTvZMsOA=; b=EDnqwKk1m3WmtWsG/7ary8bvMhjjVG+XtaC89DnLJjfCIHs4MCVpJ1yvkWIPx6ac91 blmneV3dac7MBWqMaIZJqeSegrUKRXVbCu9koDWqiR5ujZ0o7RB3qDduMrKU1S/22AiL y3N7hGpYbTG9BdpXfhFRrbbPAJX0B4YXR9N0O9+SLz13YbKSBpe/BoDWVLG51+LKopWC Hg3uJxuQ0910+1YQRM/NXR0AVvVRTvZdMVPksKXcMjSL2dGWM1z6ociY3Ipa1xQBFQF6 OYJt0ZikNgDk2SXWh6cmnPGdDgFm546E1HuOUDhsnWfOGBNxysTD4q/JK30SyncTlhwf QU+A== MIME-Version: 1.0 X-Received: by 10.224.75.130 with SMTP id y2mr81650108qaj.72.1408570818873; Wed, 20 Aug 2014 14:40:18 -0700 (PDT) Sender: kmacybsd@gmail.com Received: by 10.224.17.129 with HTTP; Wed, 20 Aug 2014 14:40:18 -0700 (PDT) In-Reply-To: <53F4F626.9030806@selasky.org> References: <53BC2E73.6090700@selasky.org> <53BC43AE.3040409@FreeBSD.org> <53BD5385.4090208@selasky.org> <20140709163146.GA21731@ox> <53F44F91.2060006@selasky.org> <53F4EC8A.9090804@FreeBSD.org> <53F4F626.9030806@selasky.org> Date: Wed, 20 Aug 2014 14:40:18 -0700 X-Google-Sender-Auth: U8es1S4X4Uk_24FVyfgq3Ls0hIY Message-ID: Subject: Re: [RFC] Add support for hardware transmit rate limiting queues [WAS: Add support for changing the flow ID of TCP connections] From: "K. Macy" To: Hans Petter Selasky Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: "freebsd-net@freebsd.org" , FreeBSD Current , Navdeep Parhar X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Aug 2014 21:40:20 -0000 > > > >> - uint32_t -> m_flowid_t is plain gratuitous. Now we need to include >> mbuf.h in more places just to get this definition. What's the >> advantage of this? style(9) isn't too fond of typedefs either. Also, >> drivers *do* need to know the width of the flowid. At least lagg(4) >> looks at the high bits of the flowid (see flowid_shift in lagg). How >> high it can go depends on the width of the flowid. >> > > The flowid should be typedef'ed. Else how can you know its type passing > flowid along function arguments and so on? I agree with Navdeep. It's usage should be obvious from context. This just pollutes the namespace. > > > >> - Interfaces can come and go, routes can change, and so the relationship >> between an inpcb and txringid is not stable at all. What happens when >> the outbound route for an inpcb changes? >> > > This is managed separately by a daemon or such. No it's not. Currently, unless you're using flowtables, the route and llentry are looked up for every single outbound packet. Most users are lightly enough loaded that they don't see the potential 8x reduction in pps from the added overhead. > The problem about using the "inpcb" approach which you are suggesting, is > that you limit the rate control feature to traffic which is bound by > sockets. Can your way of doing rate control be useful to non-socket based > firewall applications, for example? > > You also assume a 1:1 mapping between "inpcb" and the flowID, right. What > about M:N mappings, where multiple streams should share the same flowID, > because it makes more sense? That doesn't make any sense to me. FlowIDs are not a limited resource like 8-bit ASIDs where clever resource management was required. An M:N mapping would permit arbitrary interleaving of multiple streams which simply doesn't seem useful unless there is some critical case where it is a huge performance win. In general it is adding complexity for a gratuitous generalization. > > > >> - The in_ratectlreq structure that you propose is inadequate in its >> current form. For example, cxgbe's hardware can do rate limiting on a >> per-ring as well as per-connection basis, and it allows for pps, >> bandwidth, or min-max limits. I think this is the critical piece that >> we NIC maintainers must agree on before any code hits the core kernel: >> how to express a rate-limit policy in a standard way and allow for >> hardware assistance opportunistically. ipfw(4)'s dummynet is probably >> interested in this part too, so it's great that Luigi is paying >> attention to this thread. >> > > My "in_ratectlreq" is a work in progress. Which means that it probably makes sense to not impinge upon the core system until it is more refined. > - The RATECTL ioctls deal with in_ratectlreq so we need to standardize >> the ratectlreq structure before these ioctls can be considered generic >> ifnet ioctls. This is the reason cxgbetool (and not ifconfig) has a >> private ioctl to frob cxgbe's per-queue rate-limiters. I did not want >> to add ifnet ioctls that in reality were cxgbe only. Ditto for i2c >> ioctls. Now we have multiple drivers with i2c and melifaro@ is doing >> the right thing by promoting these private ioctls to a standard ifnet >> ioctl. Have you considered a private mlxtool as a stop gap measure? >> > > It might end that we need to create our own tool for this, having vendor > specific IOCTLs, if we cannot agree how to do this in a general way. > > > >> To summarize my take on all of this: we need a standard ratectlreq >> structure, >> > > Agree. > > > a standard way to associate an inpcb with one, >> > > Maybe. Associating it with an inpcb doesn't exclude adding a mechanism for supporting it in firewalls. -K