From: Andre Oppermann
Date: Thu, 08 Nov 2012 12:13:14 +0100
To: pyunyh@gmail.com
Cc: FreeBSD Net, Adrian Chadd, Pyun YongHyeon
Subject: Re: svn commit: r242739 - stable/9/sys/dev/ti
Message-ID: <509B93CA.90609@freebsd.org>
In-Reply-To: <20121108023858.GA3127@michelle.cdnetworks.com>
References: <201211080206.qA826RiN054539@svn.freebsd.org>
 <20121108023858.GA3127@michelle.cdnetworks.com>
List-Id: Networking and TCP/IP with FreeBSD

On 08.11.2012 03:38, YongHyeon PYUN wrote:
> On Wed, Nov 07, 2012 at 06:15:30PM -0800, Adrian Chadd wrote:
>> If so, may I suggest we perhaps accelerate discussing if_transmit() of
>> multiple frames per call?
>
> Hmm, actually I'm still not a fan of if_transmit() at this moment.
> Honestly I don't have good queuing code in driver to handle queue
> full condition.
> Interactions with altq(9) is also one of my
> concern as well as packet reordering issue of drbr(9) interface.

The whole interface packet handoff needs some serious reconsideration.

These days we have two queues/buffers at the interfaces. One is the
DMA ring, which can take a considerable number of packets, and the
second one is the ifq (if enabled). The DMA ring already adds
significant depth and latency, so that a packet scheduler like altq(9)
becomes almost useless. Also, modern queue management algorithms like
CoDel don't work with the current framework. Bufferbloat is a major
concern as well. See the ACM Queue article by Jim Gettys.

What we need to make this functionality available again is a well
specified and reasonably simple interface handoff. It should include
information on the maximum tx DMA ring depth and the current depth.
There should also be a function to limit the current depth to a
certain value.

What I'd like to see is this (names are not fixed):

if_send() as the main entry point for the stack. It's a function
pointer within struct ifnet. In normal operation it is the same as
if_transmit() and directly adds a packet to the tx DMA ring. Locking
of the DMA ring is done in this function and is a property of the
driver. The stack always calls it unlocked. Obviously the tx DMA ring
lock must not be a sleep lock. When altq(9) or an equivalent is active,
this function pointer is replaced with a call to the alternative
queuing function that does its magic. Again, locking of the queuing
mechanism is the property of that mechanism. When a NIC has multiple
queues that it can bind to CPUs, locking may not be necessary. We gain
this flexibility in the driver to do that.

if_transmit() is a function pointer for a function that directly adds
a packet to the tx DMA ring (if a free slot is available). It is never
called by the stack directly except in special circumstances. altq(9),
if active, uses if_transmit() to add packets to the tx DMA ring.
If altq(9) is not active, if_send() points to the same function as
if_transmit().

if_txeof() is a function pointer for a callback from the driver to an
altq(9) dequeue function, if active. It is called when new free slots
on the tx DMA ring are available.

When a driver needs a software interface queue because the tx DMA ring
is too small, then the stack should provide a generic queuing
implementation the driver can use.

I've begun to explore this area while hacking on bge(4) and em(4) in
the tcp_workqueue branch. It's a very interesting path and we are
going to have a couple more discussions before we arrive at the
optimal solution. :)

--
Andre
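
[Illustration, not part of the original message.] The handoff described
above could be sketched roughly as follows. This is a user-space toy
under stated assumptions, not FreeBSD kernel code: struct sk_ifnet,
struct pkt, struct txring and every function except the three pointer
names if_send, if_transmit and if_txeof are hypothetical stand-ins
invented for the example. A plain FIFO takes the place of altq(9), and
all locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf. */
struct pkt {
    struct pkt *next;
};

/* Tx DMA ring accounting: maximum depth, adjustable limit, current depth. */
struct txring {
    int max_depth;
    int limit;      /* software cap, always 1..max_depth */
    int cur_depth;
};

struct sk_ifnet;
typedef int (*sk_xmit_t)(struct sk_ifnet *, struct pkt *);

/* Hypothetical stand-in for struct ifnet. */
struct sk_ifnet {
    struct txring ring;
    sk_xmit_t if_send;                    /* entry point used by the stack */
    sk_xmit_t if_transmit;                /* adds directly to the tx DMA ring */
    void (*if_txeof)(struct sk_ifnet *);  /* called when ring slots free up */
    struct pkt *q_head, *q_tail;          /* toy software queue */
    int q_len;
};

/* Driver: put a packet on the ring if a slot below the limit is free. */
int ring_transmit(struct sk_ifnet *ifp, struct pkt *p)
{
    (void)p; /* a real driver would fill a DMA descriptor here */
    if (ifp->ring.cur_depth >= ifp->ring.limit)
        return ENOBUFS;
    ifp->ring.cur_depth++;
    return 0;
}

/* Limit the usable ring depth, clamped to 1..max_depth. */
void ring_set_limit(struct sk_ifnet *ifp, int n)
{
    if (n < 1)
        n = 1;
    if (n > ifp->ring.max_depth)
        n = ifp->ring.max_depth;
    ifp->ring.limit = n;
}

/* Queue discipline: send directly if nothing is queued, else enqueue. */
int fifo_send(struct sk_ifnet *ifp, struct pkt *p)
{
    if (ifp->q_head == NULL && ifp->if_transmit(ifp, p) == 0)
        return 0;
    p->next = NULL;
    if (ifp->q_tail != NULL)
        ifp->q_tail->next = p;
    else
        ifp->q_head = p;
    ifp->q_tail = p;
    ifp->q_len++;
    return 0;
}

/* Queue discipline: refill the ring from the queue while slots are free. */
void fifo_txeof(struct sk_ifnet *ifp)
{
    while (ifp->q_head != NULL && ifp->ring.cur_depth < ifp->ring.limit) {
        struct pkt *p = ifp->q_head;
        ifp->q_head = p->next;
        if (ifp->q_head == NULL)
            ifp->q_tail = NULL;
        ifp->q_len--;
        (void)ifp->if_transmit(ifp, p);
    }
}

/* Normal operation: if_send is the same function as if_transmit. */
void sk_init(struct sk_ifnet *ifp, int max_depth)
{
    ifp->ring.max_depth = ifp->ring.limit = max_depth;
    ifp->ring.cur_depth = 0;
    ifp->if_send = ifp->if_transmit = ring_transmit;
    ifp->if_txeof = NULL;
    ifp->q_head = ifp->q_tail = NULL;
    ifp->q_len = 0;
}

/* Activate the queue discipline by swapping the function pointers. */
void fifo_attach(struct sk_ifnet *ifp)
{
    ifp->if_send = fifo_send;
    ifp->if_txeof = fifo_txeof;
}

/* Driver: n transmissions completed, so notify the queue discipline. */
void ring_complete(struct sk_ifnet *ifp, int n)
{
    assert(n >= 0 && n <= ifp->ring.cur_depth);
    ifp->ring.cur_depth -= n;
    if (ifp->if_txeof != NULL)
        ifp->if_txeof(ifp);
}
```

In this sketch the stack only ever calls ifp->if_send(ifp, p). Without
a queue discipline a full ring returns ENOBUFS to the caller. After
fifo_attach() the packet is held in software instead and moves to the
ring when the driver calls ring_complete().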