From: Robert Watson <rwatson@FreeBSD.org>
Date: Thu, 6 Dec 2012 09:39:47 +0000 (GMT)
To: Andre Oppermann
Cc: Barney Cordoba, Adrian Chadd, John Baldwin, freebsd-net@freebsd.org
Subject: Re: Latency issues with buf_ring

On Tue, 4 Dec 2012, Andre Oppermann wrote:

> For most if not all ethernet drivers from 100Mbit/s upward, the TX DMA
> rings are so large that buffering at the IFQ level doesn't make sense
> anymore and only adds latency.  The driver could simply put everything
> directly into the TX DMA ring and not even try to soft-queue.  If the
> TX DMA ring is full, ENOBUFS is returned instead of filling yet another
> queue.  However, there are ALTQ interactions and other mechanisms which
> have to be considered too, making it a bit more involved.

I asserted for many years that software-side queueing would be subsumed by
increasingly large DMA descriptor rings for the majority of devices and
configurations.  However, this turns out not to have happened in a number
of scenarios, and so I've revised my conclusions.  I think we will continue
to need to support transmit-side buffering, ideally in the form of a set of
"libraries" that device drivers can use to avoid code replication and to
integrate queue-management features fairly transparently.

I'm a bit worried by the level of copy-and-paste between 10gbps device
drivers right now.  For 10/100/1000 drivers, the network stack contains the
majority of the code, and the responsibility of the device driver is to
advertise hardware features and manage interactions with rings, interrupts,
etc.  On the 10gbps side, we see lots of code replication, especially in
queue management, and it suggests to me (as discussed for several years in
a row at BSDCan and elsewhere) that it's time to revisit ifnet and pull
more code back into the central stack and out of device drivers.  That
doesn't necessarily mean changing notions of ownership or event models;
rather, it means centralising code in libraries instead of replicating it
all over the place.  This is something to do with some care, of course.
Two illustrative sketches of these ideas follow below.

Robert
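
A minimal sketch of the direct-to-ring transmit policy Andre describes,
with no soft queue at all: the transmit routine reserves hardware
descriptors directly and fails with ENOBUFS when the ring is full.  The
xx_* names, locking macros, and helpers below are invented for
illustration and do not correspond to any in-tree driver; only struct
ifnet, struct mbuf, m_freem(), and ENOBUFS are real kernel interfaces.

/*
 * Hypothetical if_transmit method with no software queue.  If the TX
 * DMA ring has no free descriptors, consume the mbuf and return
 * ENOBUFS, pushing the buffering decision back up to the stack.
 */
static int
xx_transmit(struct ifnet *ifp, struct mbuf *m)
{
        struct xx_softc *sc = ifp->if_softc;
        int error;

        XX_TX_LOCK(sc);
        if (xx_tx_ring_space(sc) == 0) {
                XX_TX_UNLOCK(sc);
                /* By convention if_transmit consumes the mbuf even on
                 * failure. */
                m_freem(m);
                return (ENOBUFS);
        }
        /*
         * Encapsulate the mbuf chain into descriptors and post them;
         * xx_encap() is assumed to free the mbuf itself on failure.
         */
        error = xx_encap(sc, &m);
        XX_TX_UNLOCK(sc);
        return (error);
}

The trade-off is the one discussed above: with a large ring this avoids a
second queue and its latency, but ALTQ and other soft-queueing consumers
have nowhere to hang their policy.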
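
As a thought experiment on the "libraries" idea, a stack-provided
transmit-queue helper might have roughly this shape: the stack owns the
soft queue and its locking, and the driver supplies only a callback that
moves one packet into its hardware ring.  Every netq_* name and the
callback signature are invented here to illustrate the shape of such an
API, not to propose a concrete one.

/*
 * Hypothetical stack-side queue "library".  The driver registers an
 * encapsulation callback; the library decides when to queue and when
 * to drain, so queue-management code lives in one place instead of
 * being copied into every 10gbps driver.
 */
struct netq;                            /* opaque soft queue */

/*
 * Driver callback: place one mbuf in the TX DMA ring.  Returns 0 on
 * success or ENOBUFS if the ring is full; error handling beyond
 * ring-full is omitted for brevity.
 */
typedef int netq_encap_t(void *drv_softc, struct mbuf *m);

static struct mbuf *netq_peek(struct netq *nq);  /* next queued mbuf */
static void netq_dequeue(struct netq *nq);       /* remove it */

/* Drain as many queued packets as the hardware ring will accept. */
static void
netq_drain(struct netq *nq, netq_encap_t *encap, void *drv_softc)
{
        struct mbuf *m;

        while ((m = netq_peek(nq)) != NULL) {
                if (encap(drv_softc, m) != 0)
                        break;          /* Ring full; retry on TX intr. */
                netq_dequeue(nq);       /* Committed to hardware. */
        }
}

A driver would then call netq_drain() from its transmit path and again
from its TX-completion interrupt, and features such as ALTQ
classification or drop policy could be implemented once, inside the
library, rather than replicated per driver.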