From owner-freebsd-hackers@FreeBSD.ORG Sun Jun 29 20:17:24 2014
Date: Sun, 29 Jun 2014 22:17:21 +0200
Subject: Re: ipfw pipe config bw tun0
From: Luigi Rizzo
To: Adrian Chadd
Cc: Wojciech Puchar, "freebsd-hackers@freebsd.org"

On Sun, Jun 29, 2014 at 6:29 PM, Adrian Chadd wrote:

> We can start adding that. How should it behave for multi-queue devices?

Long reply, sorry about that:

if i remember well, this feature was implemented assuming that at most one packet was outstanding, so multiqueue was not really an issue: any time you get a completion interrupt from any queue, you push out the next packet from the pipe. The goal was to provide weighted fair queueing using the actual NIC's bandwidth to clock packets out. Remember, this was done in '99 and on hardware that did not have queues or interrupt moderation.

These days, between deep NIC queues, interrupt moderation, multiqueue and very high bandwidths, the assumption of one outstanding packet is a bad one for performance. You'd also have the option to tie a pipe to an individual queue or to the entire NIC (the user API change to do this is trivial, e.g. you can append a :queue_number to the interface name as i did in netmap).

This said:

1. if you don't mind the fact that the interface has a deep queue, you could just push packets from a pipe to the interface until if_transmit returns an error (make sure the packet is not lost, e.g. by keeping a reference to the mbuf), and then use a completion interrupt from any queue to 'clock' packets out.

2. if the NIC's queue bothers you (it might, because the data sitting in it adds an equivalent error to the nice properties of the scheduler), then the pipe could track how many bytes are queued, stop after a given threshold, and, when a completion interrupt is received, decrease the 'outstanding' counter by the actual number of bytes sent.
Essentially, this is what ALTQ does, but with the classification flexibility of ipfw/dummynet.

Surely, keeping only one outstanding packet is too expensive and would kill throughput. But a modern interface with 256..1024 buffers of 1.5K each can hold up to 3..12 Mbits of queued data (256 * 1500 * 8 is about 3 Mbit, 1024 * 1500 * 8 is about 12 Mbit), which is way too high.

If we want to (re)implement this feature, we should first introduce some way to control the outstanding traffic on an interface -- this can be done in dummynet as in #2 above, or within the NIC's driver if we eventually build something like ethtool/bql.

cheers
luigi
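
To make option 2 above a bit more concrete, here is a rough user-space sketch of the accounting only: track the bytes handed to the NIC, stop feeding it once a threshold is reached, and credit the counter when a completion reports the bytes actually sent. All names here (sketch_pipe, pipe_kick, pipe_tx_complete, PIPE_QLEN) are made up for illustration and are not dummynet or driver code; a real version would call if_transmit where noted and would need locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: this models the byte accounting only. */
#define PIPE_QLEN 64

struct sketch_pipe {
    uint32_t pkt_len[PIPE_QLEN]; /* lengths of packets waiting in the pipe (ring) */
    size_t   head;               /* index of the next packet to hand to the NIC */
    size_t   count;              /* number of packets waiting */
    uint32_t outstanding;        /* bytes handed to the NIC and not yet completed */
    uint32_t threshold;          /* stop feeding the NIC once outstanding reaches this */
};

static void pipe_init(struct sketch_pipe *p, uint32_t threshold)
{
    p->head = 0;
    p->count = 0;
    p->outstanding = 0;
    p->threshold = threshold;
}

/* Queue a packet of 'len' bytes in the pipe. Returns 0, or -1 if the pipe is full. */
static int pipe_enqueue(struct sketch_pipe *p, uint32_t len)
{
    if (p->count == PIPE_QLEN)
        return -1;
    p->pkt_len[(p->head + p->count) % PIPE_QLEN] = len;
    p->count++;
    return 0;
}

/*
 * Hand packets to the NIC while fewer than 'threshold' bytes are outstanding.
 * Returns the number of packets handed over.
 */
static size_t pipe_kick(struct sketch_pipe *p)
{
    size_t sent = 0;

    while (p->count > 0 && p->outstanding < p->threshold) {
        uint32_t len = p->pkt_len[p->head];

        p->head = (p->head + 1) % PIPE_QLEN;
        p->count--;
        /* A real implementation would call if_transmit here. */
        p->outstanding += len;
        sent++;
    }
    return sent;
}

/*
 * Completion from any queue: credit the bytes actually sent,
 * then use the event to clock more packets out.
 */
static size_t pipe_tx_complete(struct sketch_pipe *p, uint32_t bytes)
{
    assert(bytes <= p->outstanding);
    p->outstanding -= bytes;
    return pipe_kick(p);
}
```

With a threshold of 3000 bytes and 1500-byte packets, pipe_kick hands over two packets and stops; each completion of 1500 bytes then releases one more. The threshold is the knob that trades throughput against the error added to the scheduler.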