Date: Tue, 2 Jul 2002 13:46:53 -0400
From: Bosko Milekic <bmilekic@unixdaemons.com>
To: Luigi Rizzo <rizzo@icir.org>
Cc: net@FreeBSD.ORG
Subject: Re: Mbuf allocator performance (was Should we keep a cache of mbuf+cluster ready for use ?)

On Tue, Jul 02, 2002 at 10:12:22AM -0700, Luigi Rizzo wrote:
> On Tue, Jul 02, 2002 at 01:01:35PM -0400, Bosko Milekic wrote:
> ...
> >   The reason I said "seemingly" is because of the fact that I don't
> >   think that it is your "grouping" of both clusters and mbufs that
> >   specifically causes the perf. increase to the extent you showed with
> >   your tests. What you're doing in the code you posted is also avoiding
>
> could be, i don't know... but this is exactly why we both agree
> that we better provide an interface so that we can work out in
> parallel improvements in the driver and improvements in the mbuf
> code.
>
> >   1) Remove the malloc() for external ref. count allocations; instead,
> >   store the ref. count at the end of the cluster like I did when I took
>
> this seems to violate one of your points about cache pollution!
> but i am totally neutral on this one.

  The cache line that holds the reference count will be invalidated
either way (the malloc() case is actually worse, since the count sits
in a separate allocation), so the invalidation itself cannot be
avoided: it is a per-cluster reference count, after all, and there is
not much I can do about that.

> >   2) Introduce an interface that will allocate and free:
> >     (i) an mbuf with a cluster attached to it;
> >     (ii) an M_PKTHDR mbuf with a cluster attached to it;
> >   However, this interface would wrap the appropriate alloc/free
> >   routines, although it would take care to not drop the allocation
> >   lock between the allocations. I don't suspect this to be too
> >   difficult to do.
>
> fine with me.

  OK, so we agree on the interface, and you've agreed to compromise on
the solution.  Now I just have to deliver. :-)  I'll send you a patch
by the end of the week.  I've decided to haul one of my -CURRENT test
boxes in to work, so hopefully I can get some of this done here.
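  As a rough illustration of what I mean by not dropping the
allocation lock between the two allocations, here is a minimal
user-space sketch; the structure and function names (mcache,
mb_alloc_pack, etc.) are invented for the example and are not the real
mbuf code:

/*
 * Sketch only: a combined mbuf+cluster allocation that takes the
 * cache lock once for both objects, instead of once per object.
 * All types and names here are illustrative.
 */
#include <pthread.h>
#include <stdlib.h>

struct mbuf    { struct mbuf *m_next; void *m_ext_buf; };
struct cluster { struct cluster *next; char data[2048]; };

struct mcache {
        pthread_mutex_t lock;       /* models the allocation lock */
        struct mbuf    *mbufs;      /* free list of mbufs */
        struct cluster *clusters;   /* free list of clusters */
};

/* Allocate an mbuf with a cluster already attached to it. */
struct mbuf *
mb_alloc_pack(struct mcache *mc)
{
        struct mbuf *m;
        struct cluster *cl;

        pthread_mutex_lock(&mc->lock);
        m = mc->mbufs;
        cl = mc->clusters;
        if (m == NULL || cl == NULL) {
                /* Caller falls back to the slower, separate path. */
                pthread_mutex_unlock(&mc->lock);
                return (NULL);
        }
        mc->mbufs = m->m_next;
        mc->clusters = cl->next;
        pthread_mutex_unlock(&mc->lock);

        m->m_next = NULL;
        m->m_ext_buf = cl->data;    /* "attach" the cluster */
        return (m);
}

  An M_PKTHDR variant would look the same, with the packet header
initialized before returning; the point is only that the lock is taken
and dropped once per mbuf+cluster pair.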
> >   It is not safe if you use it too often.  The allocator was designed
> >   to allocate, it HAS caches, it doesn't need for everyone under the
> >   Sun to start keeping their own caches on top of that.
>
> which is what happens when they realise they can do things faster :)

  This is really bad philosophy.  My view is that to Do The Right
Thing we have to make certain compromises in the design.  We need to
allow for certain functionality to exist, and we cannot do that with
per-driver hacked-up "solutions."  Sure, I can hack up a small
per-softc list from which I may even be able to allocate WITHOUT a
lock, but that would introduce a whole new level of complexity (think
about mbufs getting misplaced, etc.).  With the huge SMP-related
changes under way, we cannot afford any more of these spaghetti-like
changes.

> >   Here's what happens:
> >
> >   Consumers A and B each keep their own "pools" of mbufs+clusters.
> ...
>
> look, this is exactly what happens with network interfaces. If
> they fail to allocate a new mbuf, they keep recycling
> the one they have from the receive queue instead of freeing it.

  Yes, but that's because they _need_ an mbuf, they can't get one, so
they re-use the one they already have (roughly the pattern sketched at
the end of this message).  If you build a local pool in which you
store UNUSED mbufs, with no real idea of when they'll be used, only
the assumption/hope that you'll use them "soon," and you do this in
several different places in the kernel, you are bound to hit breakage
under heavy load: other places in the kernel will need network buffers
and will not be able to get them, because someone else is "assuming"
they'll need a bunch "sometime soon."  This is why we have per-CPU
caches and a global cache.  The per-CPU caches load themselves
accordingly, and they'll give you what you're looking for when you
need it, from a cache.  Sure, the caches are a tad more expensive to
twiddle, but that is the compromise we make to ensure that the system
knows about our caches and can cope even under heavy load.

> cheers
> luigi

-- 
Bosko Milekic
bmilekic@unixdaemons.com
bmilekic@FreeBSD.org
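  For reference, the receive-queue recycling that Luigi describes
usually amounts to something like the following user-space sketch; the
names and structures (rxring, rx_one, etc.) are invented for the
example and do not correspond to any particular driver:

/*
 * Sketch only: if a replacement buffer cannot be allocated, the
 * driver leaves the old buffer in the receive ring and drops the
 * incoming frame, rather than handing the buffer up the stack and
 * leaving the ring slot empty.
 */
#include <stdlib.h>

struct rxbuf {
        char   *data;   /* receive buffer (stands in for mbuf+cluster) */
        size_t  len;    /* length of the frame the hardware wrote here */
};

struct rxring {
        struct rxbuf slots[256];
        unsigned     head;
};

/* Process one received frame; returns 1 if delivered, 0 if dropped. */
static int
rx_one(struct rxring *ring, void (*deliver)(char *buf, size_t len))
{
        struct rxbuf *slot = &ring->slots[ring->head++ % 256];
        char *fresh = malloc(2048);

        if (fresh == NULL) {
                /* No replacement buffer: recycle the old one, drop the frame. */
                return (0);
        }
        /* Hand the filled buffer up the stack (deliver() takes ownership). */
        deliver(slot->data, slot->len);
        slot->data = fresh;         /* re-arm the ring slot */
        return (1);
}

  The driver never frees the old buffer on failure; it simply keeps it
in the ring, which is why it needs exactly one buffer per ring slot
rather than a private pool of spares.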