Date: Thu, 6 Dec 2012 09:35:16 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: John Baldwin <jhb@freebsd.org>
Cc: Barney Cordoba <barney_cordoba@yahoo.com>, freebsd-net@freebsd.org
Subject: Re: Latency issues with buf_ring
Message-ID: <alpine.BSF.2.00.1212060929430.78351@fledge.watson.org>
In-Reply-To: <201212041108.17645.jhb@freebsd.org>
References: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> <201212041108.17645.jhb@freebsd.org>
On Tue, 4 Dec 2012, John Baldwin wrote:

>> Q2: Are there any case studies or benchmarks for buf_ring, or it is just
>> blindly being used because someone claimed it was better and offered it for
>> free? One of the points of locking is to avoid race conditions, so the
>> fact that you have races in a supposed lock-less scheme seems more than just
>> ironic.
>
> The buf_ring author claims it has benefits in high pps workloads. I am not
> aware of any benchmarks, etc.

... joining this conversation a bit late -- still about two years behind on net@ :-) ...

There are several places where having a good buf_ring primitive should offer significant benefits over blocking locks around queues:

- ifnet transmit enqueue path, whether owned by the general stack (ifqueue) or the driver (as is often the case with if_transmit).

- netisr queues used in deferred input dispatch, including loopback.

- A future lockless hand-off of inbound TCP segments from the ithread/netisr to an already running user thread a la Van Jacobson's proposal to the Linux community (now implemented), which would significantly reduce contention on inpcb locks in many workloads.

I've measured significant lock contention in all those places in the past, and I believe buf_ring was intended to address at least the first case. This isn't the same as having benchmarks showing that the current code is "better", but the right primitive used in the right way should almost certainly help all of those cases substantially. I know that when Philip Paeps was working with the Solarflare driver, switching to lockless dispatch in the outbound path made a significant difference.

One thing we do need to make sure is handled well is bounds on queue length, since we don't want infinitely long queues when a backlog begins to form -- there's no reason this can't be done, although the specifics depend on what one wants to accomplish and how.
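To make the point about bounded queues concrete, here is a minimal sketch of a fixed-capacity lockless ring, written with C11 atomics. It is not the buf_ring implementation: it assumes exactly one producer and one consumer, and the names spsc_ring, ring_init, ring_enqueue, ring_dequeue and RING_SIZE are hypothetical, chosen only for illustration. The bound comes from the enqueue side refusing new items when the ring is full, which leaves the caller to decide whether to drop or back off. The acquire/release pairs are what a weakly ordered architecture would need so that the slot contents become visible before the index that publishes them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity; a power of two so indices can be masked. */
#define RING_SIZE 8u
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
    "RING_SIZE must be a power of two");

struct spsc_ring {
	_Atomic unsigned head;	/* next slot to write; producer only */
	_Atomic unsigned tail;	/* next slot to read; consumer only */
	void *slot[RING_SIZE];
};

static void
ring_init(struct spsc_ring *r)
{
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
}

/*
 * Producer side.  Returns false when the ring is full, so the queue
 * length never exceeds RING_SIZE.
 */
static bool
ring_enqueue(struct spsc_ring *r, void *item)
{
	unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire pairs with the consumer's release store to tail. */
	unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return (false);		/* full: caller drops or retries */
	r->slot[head & (RING_SIZE - 1)] = item;
	/* Release publishes the slot write before the new head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return (true);
}

/* Consumer side.  Returns NULL when the ring is empty. */
static void *
ring_dequeue(struct spsc_ring *r)
{
	unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire pairs with the producer's release store to head. */
	unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
	void *item;

	if (head == tail)
		return (NULL);		/* empty */
	item = r->slot[tail & (RING_SIZE - 1)];
	/* Release hands the slot back to the producer after the read. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return (item);
}
```

A transmit path using something like this would treat a false return from ring_enqueue as the backlog signal. Supporting several concurrent producers, as the ifnet transmit case requires, needs a compare-and-swap on the producer index and is considerably harder to get right.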
I would like to see us making use of lockless queue primitives in these kinds of scenarios, motivated by benchmarking, and ideally addressing architectures with weaker memory consistency properly. We should definitely minimise the number of different implementations of those primitives as much as possible, since (as with locks themselves) they are very hard to get right, and debugging problems with them can be quite problematic.

Robert