From owner-freebsd-current@FreeBSD.ORG Wed Aug 20 14:41:29 2014 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 61FD589F; Wed, 20 Aug 2014 14:41:29 +0000 (UTC) Received: from mail-la0-x229.google.com (mail-la0-x229.google.com [IPv6:2a00:1450:4010:c03::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 96ADD36C4; Wed, 20 Aug 2014 14:41:28 +0000 (UTC) Received: by mail-la0-f41.google.com with SMTP id s18so7390085lam.14 for ; Wed, 20 Aug 2014 07:41:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=/cEvyJg2v1esY+eJ4fpvO3kbpP3t7VTjrLqwzxVHAyQ=; b=iVQ4PHBNtiS+6uN752S4ynKUyHkm4Q8Bk0SAzD77TXFUr/E9G7u7aICWZ7gZdszeF1 w+3pi92gXn5j/uRcf9z2uqxGAbPUe7BEbKg/9/ueRijWKMbG5vrvqT6ZkE8BC9kCxxOV amzXZdRHGxaN71Zqde7JPXdiz/voM5N3MiAtI11WfccKuByMXjUsKJ+R0rdxrycLJkyj e/zOda44X0wirwTira38HFuepLVXZXufxFKpygyza8gcm1BeZCJ17gqRBAlfWCbycePR p93o2O0g6LhuU3T1MOfJ0vhohui5+7gudlHlGah/4oAvwgZOAnCELylyQ8b5wXqkcjyH qMzQ== MIME-Version: 1.0 X-Received: by 10.152.243.43 with SMTP id wv11mr43884437lac.52.1408545686343; Wed, 20 Aug 2014 07:41:26 -0700 (PDT) Sender: rizzo.unipi@gmail.com Received: by 10.114.244.2 with HTTP; Wed, 20 Aug 2014 07:41:26 -0700 (PDT) In-Reply-To: <53F4A2AF.6080102@selasky.org> References: <53BC2E73.6090700@selasky.org> <53BC43AE.3040409@FreeBSD.org> <53BD5385.4090208@selasky.org> <20140709163146.GA21731@ox> <53F44F91.2060006@selasky.org> <53F4A2AF.6080102@selasky.org> Date: Wed, 20 Aug 2014 16:41:26 +0200 X-Google-Sender-Auth: nsckhSo8zl9qPRFJY6BwzoD4CF4 Message-ID: Subject: Re: [RFC] Add support for hardware transmit rate limiting queues [WAS: Add support for changing the flow ID of TCP connections] From: Luigi Rizzo To: Hans Petter Selasky Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: "freebsd-net@freebsd.org" , FreeBSD Current X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Aug 2014 14:41:29 -0000 On Wed, Aug 20, 2014 at 3:29 PM, Hans Petter Selasky wrote: > Hi Luigi, > > > On 08/20/14 11:32, Luigi Rizzo wrote: > >> On Wed, Aug 20, 2014 at 9:34 AM, Hans Petter Selasky >> wrote: >> >> Hi, >>> >>> A month has passed since the last e-mail on this topic, and in the >>> meanwhile some new patches have been created and tested: >>> >>> Basically the approach has been changed a little bit: >>> >>> - The creation of hardware transmit rings has been made independent of >>> the >>> TCP stack. This allows firewall applications to forward traffic into >>> hardware transmit rings aswell, and not only native TCP applications. >>> This >>> should be one more reason to get the feature into the kernel. >>> =E2=80=8B... >>> >> =E2=80=8Bthe patch seems to include only part of the generic code (ie no= ioctls >> for manipulating the rates, no backend code). Do i miss something ? >> > > The IOCTLs for managing the rates are: > > SIOCARATECTL, SIOCSRATECTL, SIOCGRATECTL and SIOCDRATECTL > > And they go to the if_ioctl callback.=E2=80=8B =E2=80=8Bi really think these new 'advanced' features should go through some ethtool-like API, not more ioctls. We have a strong need to design and implement such an API also to have a uniform mechanism to manipulate rss, queues and other NIC features. =E2=80=8B...=E2=80=8B > > > >> I have a few comments/concerns: >> >> + looks like flowid and txringid are overlapped in scope, >> both will be used (in the backend) to select a specific >> tx queue. I don't have a solution but would like to know >> how do you plan to address this -- does one have priority >> over the other, etc. >> > > Not 100% . In some cases the flowID is used differently than the txringid= , > though it might be possible to join the two. Would need to investigate > current users of the flow ID. =E2=80=8Bin some 10G drivers i have seen, at the driver level the flowid is used on the tx path to assign packets to a given =E2=80=8Btx queue, generally to improve cpu affinity. Of course some applications may want a true flow classifier so they do not have to re-do the classification multiple times. But then, we have a ton of different classifiers with the same need -- e.g. ipfw dynamic rules, dummynet pipe/queue id, divert ports... Pipes are stored in mtags, which are very expensive so i do see a point in embedding them in the mbufs, it's just that going this path there is no end to the list. > > + related to the above, a (possibly unavoidable) side effect >> of this type of changes is that mbufs explode with custom fields, >> so if we could perhaps make one between flowid and txringid, >> that would be useful. >> > > Right, but ratecontrol is an in-general useful feature, especially for > high throughput networks, or do you think otherwise? of course i think =E2=80=8Bthe feature is useful, but see the previous point. We should find a way to manage it (and others) that does not pollute or require continuous changes to the struct mbuf. > > > >> + i am not particularly happy about the explosion of ioctls for >> setting and getting rates. Next we'll want to add scheduling, >> and intervals, and queue sizes and so on. >> For these commands outside the critical path it would be >> preferable a single command with an extensible structure. >> Bikeshed material i am sure. >> > > There is only one IOCTL in the critical path and that is the IOCTL to > change or update the TX ring ID. The other IOCTLs are in the non-critical > path towards the if_ioctl() callback. > > If we can merge the flowID and the txringid into one field, would it be > acceptable to add an IOCTL to read/write this value for all sockets? =E2=80=8Bsee above. i'd prefer an ethtool-like solution. cheers luigi =E2=80=8B