From owner-freebsd-net@FreeBSD.ORG  Mon Dec 15 17:58:48 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 0553EA93
 for <net@freebsd.org>; Mon, 15 Dec 2014 17:58:48 +0000 (UTC)
Received: from mail.lariat.net (mail.lariat.net [66.62.230.51])
 by mx1.freebsd.org (Postfix) with ESMTP id C8FE7F64
 for <net@freebsd.org>; Mon, 15 Dec 2014 17:58:47 +0000 (UTC)
Received: from Toshi.lariat.net (IDENT:ppp1000.lariat.net@localhost
 [127.0.0.1]) by mail.lariat.net (8.9.3/8.9.3) with ESMTP id KAA07133;
 Mon, 15 Dec 2014 10:57:27 -0700 (MST)
Message-Id: <201412151757.KAA07133@mail.lariat.net>
X-Mailer: QUALCOMM Windows Eudora Version 7.1.0.9
Date: Mon, 15 Dec 2014 10:56:38 -0700
To: Patrick Tracanelli <eksffa@freebsdbrasil.com.br>
From: Brett Glass <brett@lariat.net>
Subject: Re: Can DUMMYNET handle weighting of traffic according to
 firewall rules?
In-Reply-To: <59E7D981-B28B-4995-B8F4-6A2687CEF265@freebsdbrasil.com.br>
References: <CA+hQ2+gNZmMbo0-2fgS49mCNV7nTFDkBpHAzUDg8JoiUfsY5tg@mail.gmail.com>
 <028d142b3a17cd5ffd5f21c6f9b9d6daaa8e2780@webmail.freebsdbrasil.com.br>
 <201412141635.JAA27068@mail.lariat.net>
 <59E7D981-B28B-4995-B8F4-6A2687CEF265@freebsdbrasil.com.br>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"; format=flowed
Content-Transfer-Encoding: 8bit
Cc: John Nielsen <lists@jnielsen.net>, Luigi Rizzo <rizzo@iet.unipi.it>,
 "freebsd-net@freebsd.org" <net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Dec 2014 17:58:48 -0000

At 05:26 AM 12/15/2014, Patrick Tracanelli wrote:

>Yes, it would. But should only be a big deal if you have too much 
>pps and low CPU to deal with the volume.

The actual system for which we're prototyping will be 
power-constrained and will use one of several fairly weak but 
energy-efficient processors. We have to plan for possible high PPS 
on these because they may be carrying VoIP. The communications link 
will be fast, but it will be half duplex and/or shared via 
arbitration. The purpose of the project is to manage its bandwidth 
very well and very fairly so that it can accommodate many users. 
It's critical that it be able to take arbitration overhead and 
turnaround time into account.
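
A rough sketch of why per-transmission overhead matters on such a link: a shaper that only counts bytes undercharges small packets. The link rate, the 10 microsecond overhead, the packet sizes, and the function names below are all illustrative assumptions, not measurements from the system discussed here.

```python
# Hypothetical airtime model for a half-duplex, arbitrated link.
# Every number here is an illustrative assumption.

def airtime_us(payload_bytes, link_bps, overhead_us):
    """Microseconds of link time used by one packet, including a fixed
    per-transmission overhead (arbitration plus turnaround)."""
    return overhead_us + payload_bytes * 8 * 1_000_000 / link_bps

def cost_factor(payload_bytes, link_bps, overhead_us):
    """True airtime divided by the airtime a pure byte-counting
    shaper would charge for the same packet."""
    charged = airtime_us(payload_bytes, link_bps, 0)
    return airtime_us(payload_bytes, link_bps, overhead_us) / charged

LINK_BPS = 1_000_000_000   # assumed 1 Gbps link
OVERHEAD_US = 10           # assumed arbitration + turnaround cost

for size in (200, 1500):   # small VoIP-sized packet vs. full-size frame
    print(size, round(cost_factor(size, LINK_BPS, OVERHEAD_US), 2))
```

Under these assumptions the small packet costs about 7.25 times what its byte count suggests and the full-size frame about 1.83 times, so the true cost ratio depends on packet size and is generally not a whole number.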

>I don’t quite agree, if you have enough CPU to pipe it together. 
>I run a number of setups where a WF2Q+ or QFQ setup does the 
>weighting and later an extra pipe imposes other individual limits.

We have too, and have found that it adds measurable latency and 
jitter.

>Proper queue and HZ tuning tend to do the job while you have 
>enough CPU to deal with interrupts.

We usually set HZ=1000 when we use DUMMYNET for bandwidth 
management. It would be nice if it could avoid incurring the full 
overhead of the systemwide scheduler on every clock tick, but given 
that there can be an arbitrarily large number of pipes and queues 
in the system it's hard to avoid this.
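
For a sense of scale, the arithmetic below shows how much traffic corresponds to one clock tick. It assumes, as a simplification, that the shaper releases traffic once per tick; the helper names and the 1500-byte frame size are assumptions for illustration.

```python
# Back-of-the-envelope tick granularity, assuming traffic is
# released once per clock tick (a simplification).

def bytes_per_tick(link_bps, hz):
    """Bytes the link can carry during one tick of a HZ-rate clock."""
    return link_bps / 8 / hz

def frames_per_tick(link_bps, hz, frame_bytes=1500):
    """Full-size frames that fit into one tick's worth of link time."""
    return bytes_per_tick(link_bps, hz) / frame_bytes

print(bytes_per_tick(1_000_000_000, 1000))             # bytes per 1 ms tick at 1 Gbps
print(round(frames_per_tick(1_000_000_000, 1000), 2))  # 1500-byte frames per tick
```

At HZ=1000 a tick lasts 1 ms, so at 1 Gbps each tick covers 125,000 bytes, or roughly 83 full-size frames.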

>This is just theory. And I don’t mean it’s wrong. There could 
>certainly be a better way to add an extra cost factor to a flow, 
>but the pure fact is you don’t have it today.

True. But if Luigi is correct and it's a one-line kernel hack, it 
could be implemented quickly and easily. It's OK if there's a bit 
more parsing, etc. to do in the ipfw(8) utility, because that 
doesn't run in real time.

>Let’s be practical, how much bw are we talking about and how much CPU?

We're talking about systems where hundreds of flows would need to 
share the half duplex pipe. That's why the bandwidth metering needs 
to be very precise and fair. We have had other network gateways, 
which served only 100 users, where the idle time percentage 
reported by top(1) dropped to 30% under this sort of load. Needless 
to say, the system was extremely sluggish! We want to optimize to 
avoid this. We've looked at other options such as pf and altq, but 
DUMMYNET is so close to what we need that it seems best to adapt it.

>Even if we are talking about a lot of bandwidth, you have many 
>tuning possibilities and you have netmap-aware dummynet to deal 
>with high pps rate.

We want to do as much as possible in the kernel without ever making 
a ring transition to userspace, so netmap wouldn't really help in 
this particular case. It might be possible to cobble something 
creative together with Netgraph, but it would be much more 
complicated to write a custom Netgraph node than to add a one-line 
patch to DUMMYNET.

> > X could only be a whole number unless you fed the pipe multiple 
> > times in EACH direction.
>
>As I understand your problem you would need to feed a flow in the 
>opposite direction to the same pipe anyway.
>So it’s just a matter of 3 flows instead of 2.

That's assuming that X=2. We'd like to be able to tune the system 
for cost ratios that are not whole numbers (or the reciprocals of 
whole numbers), because in real life media arbitration and polling 
schemes such as DOCSIS they're not.
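
To illustrate the limitation: if one direction is fed through the pipe m times and the other n times, the achievable cost ratio is m/n, so a non-integer target can only be approximated with a small number of passes. The sketch below uses Python's standard fractions module; the function name and the example target of 1.35 are hypothetical.

```python
from fractions import Fraction

def passes_for_ratio(target, max_denominator):
    """Return (m, n): feed the costly direction through the pipe m times
    and the cheap direction n times, so that m/n is the closest ratio to
    `target` whose denominator does not exceed `max_denominator`."""
    frac = Fraction(target).limit_denominator(max_denominator)
    return frac.numerator, frac.denominator

# A whole-number ratio needs only one pass in the cheap direction.
print(passes_for_ratio("2", 4))
# A hypothetical non-integer target is only approximated.
print(passes_for_ratio("1.35", 4))
```

With at most four passes in the cheap direction, a target of 1.35 comes out as 4 passes against 3 (a ratio of about 1.33), and every extra pass is another trip through the pipe for each packet.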

>I insist, not the beautiful approach, but not a big deal, unless 
>we are talking about 10G/40G connections or a server with 
>10-year-old computing power.

We're possibly talking more than 1 Gbps... and we ARE talking ARMs, 
Atoms, or similar processors.

>That’s not true. Having one_pass disabled is mostly a needed 
>feature if you have complex environments with a mix of filtering 
>and queueing, otherwise a single match in a pipe will result in a 
>pass behavior.

It doesn't just affect queueing and pipes; it also affects 
in-kernel NAT. It's really not an optimal implementation, by the 
way. I have long thought that it would have been better to have a 
"don't come back" option that could be applied individually to an 
action -- the equivalent of the "quick" option that Darren Reed 
implemented in ipfilter.

>Sure it would be more desirable not just for your needs, but for 
>dummynet feature set as a whole. But that’s just not something 
>you have today.

True. But I can patch and build my own kernels (and also the 
ipfw(8) utility) and then submit my patches to the core 
developers once I've tested them. It's starting to sound as if this 
would be the best thing to do. I have not analyzed the IPFW code 
before, so it'd require a late-night reading and coding session....

--Brett Glass