Date: Mon, 15 Dec 2014 10:56:38 -0700
From: Brett Glass
To: Patrick Tracanelli
Cc: John Nielsen, Luigi Rizzo, "freebsd-net@freebsd.org"
Subject: Re: Can DUMMYNET handle weighting of traffic according to firewall rules?
Message-Id: <201412151757.KAA07133@mail.lariat.net>

At 05:26 AM 12/15/2014, Patrick Tracanelli wrote:

>Yes, it would. But should only be a big deal if you have too much
>pps and low CPU to deal with the volume.

The actual system for which we're prototyping will be power-constrained and will use one of several fairly weak but energy-efficient processors.
We have to plan for possible high PPS on these because they may be carrying VoIP. The communications link will be fast, but it will be half duplex and/or shared via arbitration. The purpose of the project is to manage its bandwidth very well and very fairly so that it can accommodate many users. It's critical that it be able to take arbitration overhead and turnaround time into account.

>I don't quite agree, if you have enough CPU to pipe it together.
>I run a number of setups where a WF2Qp or QFQ setup does the
>weighting and later an extra pipe imposes other individual limits.

We have too, and have found a measurable increase in latency and jitter.

>Proper queue and HZ tuning tend to do the job while you have
>enough CPU to deal with interrupts.

We usually set HZ=1000 when we use DUMMYNET for bandwidth management. It would be nice if it could avoid incurring the full overhead of the systemwide scheduler on every clock tick, but given that there can be an arbitrarily large number of pipes and queues in the system, it's hard to avoid this.

>This is just theory. And I don't mean it's wrong. There could
>certainly be a better way to add an extra cost factor to a flow,
>but the pure fact is you don't have it today.

True. But if Luigi is correct and it's a one-line kernel hack, it could be implemented quickly and easily. It's OK if there's a bit more parsing, etc. to do in the Chapter 8 utility, because that doesn't run in real time.

>Let's be practical, how much bw are we talking about and how much CPU?

We're talking about systems where hundreds of flows would need to share the half-duplex pipe. That's why the bandwidth metering needs to be very precise and fair. We have had other network gateways, which served only 100 users, where the idle time percentage reported by top(8) dropped to 30% under this sort of load. Needless to say, the system was extremely sluggish! We want to optimize to avoid this.
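
To make the "extra cost factor" idea above concrete, here is a minimal sketch in Python. It is a toy token bucket, not dummynet's actual data structure or scheduler; the class `CostedPipe`, its fields, and the `cost` parameter are all hypothetical names chosen for illustration. It only shows one plausible reading of the idea: charging each packet its length multiplied by a per-flow cost, which need not be a whole number.

```python
class CostedPipe:
    """Toy token bucket (hypothetical; NOT dummynet's real implementation)."""

    def __init__(self, rate_bps, burst_bits):
        self.rate_bps = rate_bps      # refill rate, bits per second
        self.burst_bits = burst_bits  # bucket depth, bits
        self.tokens = burst_bits      # start with a full bucket
        self.last = 0.0               # time of the last refill, seconds

    def try_send(self, now, length_bytes, cost=1.0):
        """Return True if the packet may be sent at time `now`.

        `cost` is an assumed per-flow multiplier: a flow that incurs
        arbitration or turnaround overhead is charged more than its
        raw length.
        """
        # Refill the bucket for the time elapsed since the last call.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst_bits,
                          self.tokens + elapsed * self.rate_bps)

        # Charge the cost-weighted length instead of the raw length.
        charge = length_bytes * 8 * cost
        if charge <= self.tokens:
            self.tokens -= charge
            return True
        return False
```

For example, with a 12000-bit bucket, a 1000-byte packet at cost 1.0 leaves 4000 bits of credit; a following 500-byte packet fits at cost 1.0 but would be refused at cost 1.37. The extra work per packet is a single multiplication, which fits the spirit of a small patch, though whether the real change is that simple is exactly what remains to be checked in the code.
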
We've looked at other options such as pf and altq, but DUMMYNET is so close to what we need that it seems best to adapt it.

>Even if we are talking about a lot of bandwidth, you have many
>tuning possibilities and you have netmap-aware dummynet to deal
>with high pps rate.

We want to do as much as possible in the kernel without ever making a ring transition to userspace, so netmap wouldn't really help in this particular case. It might be possible to cobble something creative together with Netgraph, but it would be much more complicated to write a custom Netgraph node than to add a one-line patch to DUMMYNET.

> > X could only be a whole number unless you fed the pipe multiple
> > times in EACH direction.
>
>As I understand your problem you would need to feed a flow in the
>opposite direction to the same pipe anyway.
>So it's just a matter of 3 flows instead of 2.

That's assuming that X=2. We'd like to be able to tune the system for cost ratios that are not whole numbers (or the reciprocals of whole numbers), because in real-life media arbitration and polling schemes such as DOCSIS they're not.

>I insist, not the beautiful approach, but not a big deal, unless
>we are talking about 10G/40G connections or a server with 10yo computing power.

We're possibly talking more than 1 Gbps... and we ARE talking ARMs, Atoms, or similar processors.

>That's not true. Having one_pass disabled is mostly a needed
>feature if you have complex environments with a mix of filtering
>and queueing, otherwise a single match in a pipe will result in a
>pass behavior.

It doesn't just affect queueing and pipes; it also affects in-kernel NAT. It's really not an optimal implementation, by the way. I have long thought that it would have been better to have a "don't come back" option that could be applied individually to an action -- the equivalent of the "quick" option that Luigi implemented in ipfilter.
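
As a rough illustration of the whole-number limitation discussed earlier: if a flow's cost is raised by feeding it through the same pipe several times, as the thread suggests, then the cost ratio between the two directions can only be a ratio of integer pass counts. The Python sketch below finds the pass counts that best approximate a target ratio. The function `passes_for_ratio` and the parameter `max_denominator` are hypothetical names for this example, not part of ipfw or dummynet.

```python
from fractions import Fraction


def passes_for_ratio(target_ratio, max_denominator):
    """Return (a, b), the pass counts for the two directions.

    a/b is the closest fraction to target_ratio whose denominator b
    does not exceed max_denominator. The numerator a is not bounded
    separately and may be larger than max_denominator.
    """
    best = Fraction(target_ratio).limit_denominator(max_denominator)
    return best.numerator, best.denominator
```

A ratio of 2 needs passes of (2, 1) and 1.5 needs (3, 2), but a ratio such as 1.375 already needs 11 passes in one direction and 8 in the other. Since each extra pass presumably costs additional rule matching and queueing work per packet, this is one way to see why a fractional cost factor applied once would be preferable on a weak CPU.
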

>Sure it would be more desirable not just for your needs, but for
>dummynet feature set as a whole. But that's just not something
>you have today.

True. But I can patch and build my own kernels (and also the Chapter 8 utility) and then submit my patches to the core developers once I've tested them. It's starting to sound as if this would be the best thing to do. I have not analyzed the IPFW code before, so it'd require a late-night reading and coding session....

--Brett Glass