Date: Tue, 2 Jul 2002 13:46:53 -0400
From: Bosko Milekic <bmilekic@unixdaemons.com>
To: Luigi Rizzo <rizzo@icir.org>
Cc: net@FreeBSD.ORG
Subject: Re: Mbuf allocator performance (was Should we keep a cache of mbuf+cluster ready for use ?)

On Tue, Jul 02, 2002 at 10:12:22AM -0700, Luigi Rizzo wrote:
> On Tue, Jul 02, 2002 at 01:01:35PM -0400, Bosko Milekic wrote:
> ...
> >   The reason I said "seemingly" is because of the fact that I don't
> >   think that it is your "grouping" of both clusters and mbufs that
> >   specifically causes the perf. increase to the extent you showed with
> >   your tests. What you're doing in the code you posted is also avoiding
>
> could be, i don't know... but this is exactly why we both agree
> that we better provide an interface so that we can work out in
> parallel improvements in the driver and improvements in the mbuf
> code.
>
> >   1) Remove the malloc() for external ref. count allocations; instead,
> >   store the ref. count at the end of the cluster like I did when I took
>
> this seems to violate one of your points about cache pollution!
> but i am totally neutral on this one.

  The cache line that holds the reference count will be invalidated
either way (the malloc() case is actually worse, since the count sits
in a separate allocation), so the invalidation itself cannot be
avoided: it is a per-cluster reference count, after all, and there is
not much I can do about that.

> >   2) Introduce an interface that will allocate and free:
> >     (i) an mbuf with a cluster attached to it;
> >     (ii) an M_PKTHDR mbuf with a cluster attached to it;
> >   However, this interface would wrap the appropriate alloc/free
> >   routines, although it would take care to not drop the allocation
> >   lock between the allocations. I don't suspect this to be too
> >   difficult to do.
>
> fine with me.

  OK, so we agree on the interface, and you've agreed to compromise on
the solution.  Now I just have to deliver. :-)  I'll send you a patch
by the end of the week.  I've decided to haul one of my -CURRENT test
boxes in to work, so hopefully I can get some of this done here.
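  As a rough illustration of what I mean by not dropping the
allocation lock between the two allocations, here is a minimal
user-space sketch; the structure and function names (mcache,
mb_alloc_pack, etc.) are invented for the example and are not the real
mbuf code:

/*
 * Sketch only: a combined mbuf+cluster allocation that takes the
 * cache lock once for both objects, instead of once per object.
 * All types and names here are illustrative.
 */
#include <pthread.h>
#include <stdlib.h>

struct mbuf    { struct mbuf *m_next; void *m_ext_buf; };
struct cluster { struct cluster *next; char data[2048]; };

struct mcache {
        pthread_mutex_t lock;       /* models the allocation lock */
        struct mbuf    *mbufs;      /* free list of mbufs */
        struct cluster *clusters;   /* free list of clusters */
};

/* Allocate an mbuf with a cluster already attached to it. */
struct mbuf *
mb_alloc_pack(struct mcache *mc)
{
        struct mbuf *m;
        struct cluster *cl;

        pthread_mutex_lock(&mc->lock);
        m = mc->mbufs;
        cl = mc->clusters;
        if (m == NULL || cl == NULL) {
                /* Caller falls back to the slower, separate path. */
                pthread_mutex_unlock(&mc->lock);
                return (NULL);
        }
        mc->mbufs = m->m_next;
        mc->clusters = cl->next;
        pthread_mutex_unlock(&mc->lock);

        m->m_next = NULL;
        m->m_ext_buf = cl->data;    /* "attach" the cluster */
        return (m);
}

  An M_PKTHDR variant would look the same, with the packet header
initialized before returning; the point is only that the lock is taken
and dropped once per mbuf+cluster pair.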
> >   It is not safe if you use it too often.  The allocator was designed
> >   to allocate, it HAS caches, it doesn't need for everyone under the
> >   Sun to start keeping their own caches on top of that.
>
> which is what happens when they realise they can do things faster :)

  This is really bad philosophy.  My view is that to Do The Right
Thing we have to make certain compromises in the design.  We need to
allow for certain functionality to exist, and we cannot do that with
per-driver hacked-up "solutions."  Sure, I can hack up a small
per-softc list from which I may even be able to allocate WITHOUT a
lock, but that would introduce a whole new level of complexity (think
about mbufs getting misplaced, etc.).  With the huge SMP-related
changes under way, we cannot afford any more of these spaghetti-like
changes.

> >   Here's what happens:
> >
> >   Consumers A and B each keep their own "pools" of mbufs+clusters.
> ...
>
> look, this is exactly what happens with network interfaces. If
> they fail to allocate a new mbuf, they keep recycling
> the one they have from the receive queue instead of freeing it.

  Yes, but that's because they _need_ an mbuf, they can't get one, so
they re-use the one they already have (roughly the pattern sketched at
the end of this message).  If you build a local pool in which you
store UNUSED mbufs, with no real idea of when they'll be used, only
the assumption/hope that you'll use them "soon," and you do this in
several different places in the kernel, you are bound to hit breakage
under heavy load: other places in the kernel will need network buffers
and will not be able to get them, because someone else is "assuming"
they'll need a bunch "sometime soon."  This is why we have per-CPU
caches and a global cache.  The per-CPU caches load themselves
accordingly, and they'll give you what you're looking for when you
need it, from a cache.  Sure, the caches are a tad more expensive to
twiddle, but that is the compromise we make to ensure that the system
knows about our caches and can cope even under heavy load.

> cheers
> luigi

-- 
Bosko Milekic
bmilekic@unixdaemons.com
bmilekic@FreeBSD.org
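  For reference, the receive-queue recycling that Luigi describes
usually amounts to something like the following user-space sketch; the
names and structures (rxring, rx_one, etc.) are invented for the
example and do not correspond to any particular driver:

/*
 * Sketch only: if a replacement buffer cannot be allocated, the
 * driver leaves the old buffer in the receive ring and drops the
 * incoming frame, rather than handing the buffer up the stack and
 * leaving the ring slot empty.
 */
#include <stdlib.h>

struct rxbuf {
        char   *data;   /* receive buffer (stands in for mbuf+cluster) */
        size_t  len;    /* length of the frame the hardware wrote here */
};

struct rxring {
        struct rxbuf slots[256];
        unsigned     head;
};

/* Process one received frame; returns 1 if delivered, 0 if dropped. */
static int
rx_one(struct rxring *ring, void (*deliver)(char *buf, size_t len))
{
        struct rxbuf *slot = &ring->slots[ring->head++ % 256];
        char *fresh = malloc(2048);

        if (fresh == NULL) {
                /* No replacement buffer: recycle the old one, drop the frame. */
                return (0);
        }
        /* Hand the filled buffer up the stack (deliver() takes ownership). */
        deliver(slot->data, slot->len);
        slot->data = fresh;         /* re-arm the ring slot */
        return (1);
}

  The driver never frees the old buffer on failure; it simply keeps it
in the ring, which is why it needs exactly one buffer per ring slot
rather than a private pool of spares.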