Date: Tue, 31 Oct 2000 12:41:46 -0500 (EST)
From: Bosko Milekic
Subject: Re: MP: per-CPU mbuf allocation lists
In-reply-to: <20001031002953.O22110@fw.wintelcom.net>
To: Alfred Perlstein
Cc: freebsd-net@FreeBSD.ORG

On Tue, 31 Oct 2000, Alfred Perlstein wrote:

> Actually this might work. :)
>
> Consider: you're at 0 buffers on your freelist, so you allocate a
> chunk from the general pool; if it's bigger than the tunable 'lw',
> the whole chain goes onto your 'lw' list.
>
> As you allocate and free from lw, it will 'spill' onto your 'hw'
> list, because that's where you free to when 'lw' is full.
>
> If you happen to grab something off the freelist smaller than 'lw',
> you just live with that, or possibly grab another chunk; either way
> this algorithm heals the chunks by spilling frees from lw to hw to
> make the cutoff.
>
> Eventually you'll overflow your hw list and free a correctly sized
> chunk (hw - lw buffers) onto the freelist.
>
> If you're just consuming mbufs, well then, someone else will be
> doing the above-described freeing, balancing out the main pool.
>
> Remember, accuracy in balancing the chunks isn't nearly as important
> as speed.

I never said that it wouldn't work.

I just said that it would, if anything, add far too much hysteresis,
and also make certain situations MUCH slower (such as the very
situation in which this sort of move from the global list to the fast
list needs to be done). The purpose of doing this sort of thing, as you
suggest it, would be to supposedly make future allocations from the
same CPU faster (because they won't have to go dip into the global pool
until the next time the fast list runs out). Well, I'm telling you that
it's really not worth the trouble.

First of all, if it were to be done, you'd have to scratch my idea of
how to wake up sleepers, because otherwise you would end up freeing
some INDIVIDUAL mbufs (not belonging to any "chunk") onto mmbfree,
which would further fragment it. The alternative solution would be to
make yet another separate list to be used solely by the waiting
routines, which would involve yet another mutex, etc., etc.

Second of all, dealing with variously sized chunks is enough trouble as
it is. It sounds very nice in theory, but in practice it's not. It
would mean that either you keep track of the last mbuf on your F2 fast
list, which would involve the following every time you free to F2
(sketched below):

  * you're freeing to the head of the list, so you have to make your
    top mbuf (the one you're adding) have its nextpkt point to the
    last mbuf, which you'll probably get from the previously first
    mbuf on F2, after which point you'll set that previously first
    mbuf's nextpkt to NULL in order to avoid confusion on the mmbfree
    list;

or you "find" your last mbuf when it comes time to create the chunk on
mmbfree, by walking the mbufs that are part of the "chunk."
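To make the first option concrete: every free to F2 would have to do
roughly the following just to maintain the tail pointer. This is only
a sketch of the bookkeeping, not actual code; "f2_head" is a made-up
per-CPU list head, and I've left the locking out.

    /*
     * Sketch only: assumes <sys/param.h> and <sys/mbuf.h>.
     * Free one mbuf to the head of the F2 fast list, keeping the
     * head's m_nextpkt pointing at the last mbuf of the "chunk" so
     * a later bulk free to mmbfree can find the tail without
     * walking the list.
     */
    static struct mbuf *f2_head;    /* hypothetical per-CPU fast list */

    static void
    f2_free(struct mbuf *m)
    {
        struct mbuf *old = f2_head;

        m->m_next = old;            /* link into the fast list */
        if (old != NULL) {
            /* Inherit the tail pointer from the old head... */
            m->m_nextpkt = old->m_nextpkt;
            /* ...and clear it there, or mmbfree gets confused later. */
            old->m_nextpkt = NULL;
        } else {
            m->m_nextpkt = m;       /* a lone mbuf is its own tail */
        }
        f2_head = m;
    }

That's extra pointer bookkeeping and a branch on what ought to be the
cheapest, most common path in the whole allocator.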
or that you "find" your last mbuf when it comes time to create the chunk on mmbfree by walking the mbufs that are part of the "chunk." =09Third of all, consider the mayhem when the low watermark is changed several times; the mmbfree list gets so damn fragmented that it becomes complete HELL to combine/split chunks. Speaking of which, splitting chunks, if they are unreasonably large, is very costly, because it would mean walking the present "chunk" list until you get to the middle. =09Fourth=B8 if your mmbfree general list is empty, you have to allocate = a page and split it into PAGE_SIZE / MSIZE mbufs, which may or may not be larger than the chunk you happen to be looking for at the moment. Say it's 50% larger (which is unlikely, but possible), you'll be walking the newly allocated mbufs splitting them into two chunks. Say it's smaller (very likely), you'll either be trying to allocate more pages (more overhead) -- more overhead -- or dealing with the fact that you couldn't grab as many mbufs as you wanted, and re-arranging chunks. =09Finally, think of the MAYHEM involved in possibly invalidating nextpkt pointers when changing chunk sizes. =09On the other hand, simply dropping this chunk idea will likely have you grabbing the mmbfree mutex quite often when you=B4ve got nothing on your fast list(s), but that's only until mbufs are again freed back to your fast lists... In other words, just because you have 0 on your fast list AT THIS VERY INSTANT, doesn't mean that you'll still have 0 in THIS VERY INSTANT + some DELTA_TIME. =09I don't think that anybody is prepared to introduce the hysterics and complexity of this "chunk" system in return for per CPU lists, when just eliminating the "chunk" idea would leave you with a system at WORST 5-10% less performant, but at least 80% less complex. > --=20 > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] > "I have the heart of a child; I keep it in a jar on my desk." =09I really liked ALL of the suggestions you made but this one, for more than obvious reasons. If you agree, I can go about implementing the revised design as I had presented it in my last Email, which would leave space for adding this "chunk" idea if for whatever reason you want to see for yourself. Then either you or me or someone else can eventually implement it and profile it before introducing it. Agreed? Cheers, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message