From owner-freebsd-net@FreeBSD.ORG Wed Oct 30 11:44:25 2013
Return-Path:
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 744B33F3
 for ; Wed, 30 Oct 2013 11:44:25 +0000 (UTC)
 (envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id BDF5129AA
 for ; Wed, 30 Oct 2013 11:44:24 +0000 (UTC)
Received: (qmail 61448 invoked from network); 30 Oct 2013 12:14:47 -0000
Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2])
 (envelope-sender )
 by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
 for ; 30 Oct 2013 12:14:47 -0000
Message-ID: <5270F101.6020701@freebsd.org>
Date: Wed, 30 Oct 2013 12:44:01 +0100
From: Andre Oppermann
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.0.1
MIME-Version: 1.0
To: Adrian Chadd
Subject: Re: MQ Patch.
References: <40948D79-E890-4360-A3F2-BEC34A389C7E@lakerest.net>
 <526FFED9.1070704@freebsd.org> <52701D8B.8050907@freebsd.org>
 <527022AC.4030502@FreeBSD.org> <527027CE.5040806@freebsd.org>
 <5270309E.5090403@FreeBSD.org> <5270462B.8050305@freebsd.org>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "freebsd-net@freebsd.org", Luigi Rizzo, Navdeep Parhar, Randall Stewart
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 30 Oct 2013 11:44:25 -0000

On 30.10.2013 02:43, Adrian Chadd wrote:
> Hi,

[Meta: following your replies is often difficult because you're omitting
context and citations]

> We can't assume the hardware has deep queues _and_ we can't just hand
> packets to the DMA engine.
>
> Why?
>
> Because once you've pushed it into the transmit ring, you can't
> guarantee / impose any ordering on things. You can't guarantee that
> you can abort a frame that has been queued because it now breaks the
> queue rules.
>
> That's why we don't want to just have a light wrapper around hardware
> transmit queues. We give up way too much useful control.

The stack can't possibly know about all these differences in current and
future technologies and requirements. That's why this decision should be
pushed into the L3/L2 mapping/encapsulation and driver layer. Only those
actually know about the requirements and constraints of any given
technology.

For wired Ethernet there isn't any control over a packet once it has been
inserted into the DMA ring and the packets are going to be processed
sequentially. In that case the driver likely will choose a rather light
wrapper to protect concurrent access to the DMA ring. An optimized version
of such a wrapper will be provided by the kernel for the driver to link to.

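To make the "light wrapper" idea concrete, here is a minimal sketch of a
lock-protected FIFO ring, written as plain userland C rather than kernel
code. Everything in it is an assumption made for illustration: the names
txring, txring_init, txring_enqueue, txring_complete and TXRING_SLOTS are
hypothetical and are not part of any FreeBSD API, and a C11 atomic_flag
spinlock merely stands in for whatever lock a real driver would use.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TXRING_SLOTS 8	/* hypothetical ring size; must be a power of two */

/* Hypothetical stand-in for a hardware transmit (DMA) ring. */
struct txring {
	atomic_flag	lock;			/* serializes access to the ring */
	void		*slot[TXRING_SLOTS];	/* stand-in for DMA descriptors */
	unsigned int	head;			/* next slot the "hardware" completes */
	unsigned int	tail;			/* next slot a producer fills */
};

static void
txring_lock(struct txring *r)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		;
}

static void
txring_unlock(struct txring *r)
{
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

static void
txring_init(struct txring *r)
{
	atomic_flag_clear(&r->lock);
	r->head = r->tail = 0;
}

/*
 * Hand one packet to the ring. Returns false when the ring is full;
 * the caller decides whether to drop or retry. Once queued, a packet
 * cannot be reordered or aborted, which is the wired Ethernet case.
 */
static bool
txring_enqueue(struct txring *r, void *pkt)
{
	bool ok = false;

	txring_lock(r);
	if (r->tail - r->head < TXRING_SLOTS) {
		r->slot[r->tail % TXRING_SLOTS] = pkt;
		r->tail++;
		ok = true;
	}
	txring_unlock(r);
	return (ok);
}

/* Model the hardware finishing the oldest packet; strictly FIFO. */
static void *
txring_complete(struct txring *r)
{
	void *pkt = NULL;

	txring_lock(r);
	if (r->head != r->tail) {
		pkt = r->slot[r->head % TXRING_SLOTS];
		r->head++;
	}
	txring_unlock(r);
	return (pkt);
}
```

The wrapper only serializes access and preserves arrival order. It makes
no scheduling decisions, which is why it suits hardware that processes its
ring sequentially and would be too little for an interface that needs
per-destination or per-priority queues ahead of the hardware.
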
For other kinds of interfaces a very different strategy may be chosen. In
your case with ieee80211 a more elaborate transmit scheme can be
implemented without having to hack the kernel. In fact that's what you
already mostly do there with the frame fragmentation, priority and
retransmission code if I'm reading it correctly. The only difference in
future being that the upper stack won't enforce any of the old IFQ,
bufring or drbr handoff on you. You can choose one of the stock models or
develop your own specially optimized version.

> I've seen this both when doing wifi (where I absolutely have to have
> per-node, per-TID queues, far before it hits the hardware) and doing
> WAN style optimisation, where I want to ensure I only queue a handful
> of milliseconds of frames to the hardware so I can ensure I can hit
> QoS requirements (eg there being a large amount of bulk data, then I
> want to inject some voice traffic that should go out sooner..)

Sure. The idea is to make it even easier for you to implement that
without having to work around anything above ifnet.

-- 
Andre