From: Matthew Dillon <dillon@apollo.backplane.com>
Date: Wed, 27 Feb 2002 11:26:12 -0800 (PST)
Message-Id: <200202271926.g1RJQCm29905@apollo.backplane.com>
To: Jeff Roberson
Cc: arch@FreeBSD.ORG
Subject: Re: Slab allocator
References: <20020227005915.C17591-100000@mail.chesapeake.net>

:...
:
:There are also per-cpu queues of items, with a per-cpu lock.  This
:allows for very efficient allocation, and it also provides near-linear
:performance as the number of cpus increases.  I do still depend on
:Giant to talk to the back-end page supplier (kmem_alloc, etc.).  Once
:the VM is locked the allocator will not require Giant at all.
:...
:
:Since you've read this far, I'll let you know where the patch is. ;-)
:
:http://www.chesapeake.net/~jroberson/uma.tar
:...
:Any feedback is appreciated.  I'd like to know what people expect from
:this before it is committable.
:
:Jeff
:
:PS Sorry for the long-winded email. :-)

    Well, one thing I've noticed right off the bat is that the code is
    trying to take advantage of per-cpu queues but is still having to
    obtain a per-cpu mutex to lock the per-cpu queue.

    Another thing I noticed is that the code appears to assume that
    PCPU_GET(cpuid) is stable in certain places, and I don't think that
    condition necessarily holds unless you explicitly enter a critical
    section (critical_enter() and critical_exit()).  There are some
    cases where you obtain the per-cpu cache and lock it, which would be
    safe even if the cpu changed out from under you, and other cases,
    such as in uma_zalloc_internal(), where you assume that the cpuid is
    stable when it isn't.

    I also noticed that cache_drain() appears to be the only place where
    you iterate through the per-cpu mutexes.  All the other places
    appear to use the current cpu's mutex.

    I would recommend the following:

    * That you review your code with special attention to the lack of
      stability of PCPU_GET(cpuid) when you are not in a critical
      section.

    * That you consider getting rid of the per-cpu locks and instead use
      critical_enter() and critical_exit() to obtain a stable cpuid, in
      order to allocate or free from the current cpu's cache without
      having to obtain any mutexes whatsoever (see the first sketch
      below).  Theoretically this would allow most calls that allocate
      and free small amounts of memory to run as fast as a simple
      procedure call would run (akin to what the kernel malloc() in
      -stable is able to accomplish).

    * That you consider an alternative method for draining the per-cpu
      caches.  For example, have the per-cpu code use a global, shared
      SX lock along with the critical section to access its per-cpu
      cache, and then have the cache_drain code obtain an exclusive SX
      lock in order to get full access to all of the per-cpu caches (see
      the second sketch below).

    * Documentation.  I.e., comment the code more, especially areas
      where you have to special-case things, for example when you unlock
      a cpu cache in order to call uma_zfree_internal().
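    To illustrate the second recommendation, here is roughly the fast
    path I have in mind.  This is only a sketch, not tested code: the
    structure layout and the names UMA_CACHE_SIZE, uz_cpu[], uc_free[],
    uc_freecnt, and uma_zalloc_slow() are made up for illustration, not
    taken from the patch.

	/* needs sys/param.h, sys/systm.h, sys/pcpu.h for PCPU_GET() */

	struct uma_cache {
		void	*uc_free[UMA_CACHE_SIZE];   /* cached free items */
		int	 uc_freecnt;		    /* number of items */
	};

	void *
	uma_zalloc_sketch(uma_zone_t zone, int flags)
	{
		struct uma_cache *cache;
		void *item;

		critical_enter();	/* no preemption or migration now */
		cache = &zone->uz_cpu[PCPU_GET(cpuid)]; /* cpuid stable */
		if (cache->uc_freecnt > 0) {
			/* the cache is private to us here, so no mutex */
			item = cache->uc_free[--cache->uc_freecnt];
			critical_exit();
			return (item);
		}
		critical_exit();
		/* slow path: refill from the zone under the zone lock */
		return (uma_zalloc_slow(zone, flags));
	}

    The common case then costs a critical section and an array index,
    which is about as close to a plain procedure call as you can get.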
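    And here is the drain scheme from the third recommendation, again
    with made-up names (uma_drain_lock, cache_drain_sketch(); the
    uma_zfree_internal() signature is a guess).  The allocation and free
    paths take the global sx lock shared, so they do not block one
    another; cache_drain() takes it exclusive, which keeps every fast
    path out while it walks all of the per-cpu caches:

	/* needs sys/lock.h, sys/sx.h for sx(9), sys/smp.h for mp_ncpus */

	struct sx uma_drain_lock; /* sx_init(&uma_drain_lock, "umadrain") */

	void *
	uma_zalloc_drainsafe(uma_zone_t zone, int flags)
	{
		void *item;

		sx_slock(&uma_drain_lock);	/* shared: allocators on
						   different cpus coexist */
		item = uma_zalloc_sketch(zone, flags); /* fast path above */
		sx_sunlock(&uma_drain_lock);
		return (item);
	}

	void
	cache_drain_sketch(uma_zone_t zone)
	{
		struct uma_cache *cache;
		int cpu;

		sx_xlock(&uma_drain_lock);  /* exclusive: fast paths idle */
		for (cpu = 0; cpu < mp_ncpus; cpu++) {
			cache = &zone->uz_cpu[cpu];
			while (cache->uc_freecnt > 0)
				uma_zfree_internal(zone,
				    cache->uc_free[--cache->uc_freecnt]);
		}
		sx_xunlock(&uma_drain_lock);
	}

    The shared sx acquisition does add a little cost to the fast path,
    but an uncontested shared lock is cheap, and the drain path no
    longer needs any per-cpu mutexes at all.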
						-Matt
						Matthew Dillon