FreeBSD Mail Archives

Date:      Thu, 5 Jul 2001 19:48:09 -0700 (PDT)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        Benno Rice <benno@FreeBSD.org>
Cc:        freebsd-smp@FreeBSD.org
Subject:   Re: VM Commits / GIANT_ macros
Message-ID:  <200107060248.f662m9w62000@earth.backplane.com>
References:  <200107041638.f64GcH844850@earth.backplane.com> <20010705102235.B71563@rafe.jeamland.net>


:On Wed, Jul 04, 2001 at 09:38:17AM -0700, Matt Dillon wrote:
:>     Hello everyone!  Ok, after talking with John and others at USENIX
:>     and doing a couple of back and forths with Alfred, I am officially
:>     taking over the main-line machine-independent VM system in -current.
:>
:>     I will also be working on i386 pmap, vm_object, vm_map, and the buffer
:>     cache (in regards to mutexes & Giant).
:
:Could you keep me posted wrt what locking is needed in pmap?  This should
:allow me to keep PowerPC in sync.  Don't have any plans to hit SMP on powerpc
:any time soon, but it'd be nice to have the infrastructure in place. =)
:
:-- 
:Benno Rice
:benno@FreeBSD.org

    Sure, I'll post updates to freebsd-smp.  Here's the first update:

    I spent a good deal of Wednesday cleaning up the VM source files,
    breaking them up into manageable pieces and moving vm_page_zero_idle()
    from MD files to a new MI file.

    I spent about four hours experimenting with various fine-grained VM
    mutex models, i.e. simply by starting to code it and noting where I
    would bog down.  I believe I have come up with one that is usable
    for vm_page_t manipulation.

    The issue we have with vm_page_t is that various entities currently 
    depend on the atomic_* ops or Giant to do things like look up a page
    and then busy it.  This previously occurred under splvm() in order
    to guarantee that nobody else would be able to busy the page while
    we were trying to.  Now it occurs under Giant.  The goal is to be able
    to do these sorts of operations without Giant.

    This same dependence is used to do things like add or remove a vm_page_t
    from its page queue, and add or remove a vm_page_t from the (object, index)
    hash table, and move vm_page_t's between page queues.

    This is the solution as I envision it.  It is a considerable amount of
    work, which I will be doing in stages.

	* We will have a mutex for each (PQ_XXX) page queue.  The appropriate
	  page queue mutex will be obtained to add or remove a vm_page_t to
	  that page queue (happens a lot), and to scan the queue 
	  (contigmalloc and the pageout daemon scan the page queues).

	* We will have a small shared array of mutexes to lock the 
	  (object, index) hash chains.  For example, let's say you are in
	  vm_fault and do a vm_page_lookup() to look up a page, and not
	  finding it you decide to vm_page_alloc() a new page.  In order
	  to protect this sequence of events vm_page_lookup() will obtain
	  the appropriate hash chain mutex and leave it held on return
	  (whether or not the page is found).  The caller will do whatever
	  it needs to do (non-blocking), and then release the hash chain
	  mutex. 

	  This allows callers to safely add or remove pages from hash
	  chains.

	* Many routines now look up a page, then busy it, then release it
	  back onto a page queue (e.g. deactivate it, free it, activate it,
	  cache it).  e.g. vm_fault, pageout daemon, and many other 
	  interactions with the system.  These interactions currently operate
	  under Giant (used to operate under spl) and do not bother to 
	  'own' the page to execute the action.  These interactions, however,
	  do check that the page is not owned by someone else (aka that the
	  page is not PG_BUSY or PG_BUSY/vm_page->busy).

	  To allow callers to safely look up and then manipulate pages, for
	  example to manipulate vm_page->flags, I intend to change the API
	  such that when you nominally get a page, it will be BUSY'd for you.
	  For example, when selecting a free or cache page from the page
	  queues, the page would be returned already BUSY'd, allowing you to
	  manipulate the page and then release it back to a queue (or
	  initiate I/O, or whatever).   In many cases the caller intends to
	  busy the page anyway, so this is not much of a leap.

	  This only works if the page is not already busy, of course, but
	  nearly all users of the existing API skip or sleep/loop if the
	  returned page is busy, so we can fail gracefully and allow the
	  caller to do whatever needs to be done there.
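
    The hash chain locking in the second bullet can be sketched roughly
    as below.  This is a userland illustration using pthreads, not kernel
    code: every hc_* name, the table sizes, and the hash function are
    hypothetical, and pthread mutexes merely stand in for kernel mutexes.
    The point it shows is the calling convention: the lookup returns with
    the chain mutex held whether or not the page was found, so the caller
    can insert a page before anyone else can race on the same chain.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define HC_BUCKETS 64	/* hypothetical hash table size */
#define HC_LOCKS    8	/* hypothetical; small array shared by all chains */

struct hc_page {
	struct hc_page *hnext;	/* next page on the same hash chain */
	const void *object;
	unsigned long index;
};

static struct hc_page *hc_bucket[HC_BUCKETS];
static pthread_mutex_t hc_lock[HC_LOCKS];

static void
hc_init(void)
{
	for (int i = 0; i < HC_LOCKS; i++)
		pthread_mutex_init(&hc_lock[i], NULL);
}

/* Toy hash over (object, index); the real hash is not shown in the post. */
static unsigned
hc_hash(const void *object, unsigned long index)
{
	return (unsigned)((((uintptr_t)object >> 4) + index) % HC_BUCKETS);
}

/*
 * Look up (object, index).  The chain mutex is acquired here and is
 * still held on return, found or not; *heldp tells the caller which
 * mutex to release when it is done.
 */
static struct hc_page *
hc_lookup(const void *object, unsigned long index, pthread_mutex_t **heldp)
{
	unsigned b = hc_hash(object, index);
	struct hc_page *p;

	*heldp = &hc_lock[b % HC_LOCKS];
	pthread_mutex_lock(*heldp);
	for (p = hc_bucket[b]; p != NULL; p = p->hnext)
		if (p->object == object && p->index == index)
			break;
	return p;
}

/* Insert a page; the caller must still hold the mutex from hc_lookup(). */
static void
hc_insert_locked(struct hc_page *p, const void *object, unsigned long index)
{
	unsigned b = hc_hash(object, index);

	assert(p != NULL);
	p->object = object;
	p->index = index;
	p->hnext = hc_bucket[b];
	hc_bucket[b] = p;
}
```

    A caller in the vm_fault position would call hc_lookup(), and on a
    NULL return do its (non-blocking) allocation and hc_insert_locked()
    before unlocking the mutex it was handed back.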

    Finally we have issues with how to set PG_BUSY in the first place. 
    Currently setting PG_BUSY uses atomic_*.  It turns out that the solution
    is easy and does not require the use of any additional mutex operations.

	* When we are looking up a page that is on the free queue, aka
	  in vm_page_alloc(), simply holding the appropriate page queue mutex 
	  (which we *ALREADY* hold in most cases) is sufficient to allow
	  us to manipulate the free pages in that queue without worrying about
	  other threads messing with those pages.  Thus we can set PG_BUSY,
	  remove the page from the free queue, and then release the page
	  queue mutex before returning the newly allocated page.

	* When we are looking up a page that is on the cache queue, or 
	  is not associated with a queue, we simply acquire (or already hold
	  in most cases) the appropriate hash chain mutex.  Then if the page
	  is not already PG_BUSY, we know we can safely manipulate its flags
	  (set PG_BUSY).  If the page is already PG_BUSY
	  we need to sleep/loop anyway, so we can fail gracefully and let
	  the parent sleep/loop/do-whatever.

	* We will need to find a better way to sleep/wait for a busy page
	  to become available.  The current mechanism sets a PG_WANTED
	  flag in vm_page->flags, which doesn't work under the new scheme.
	  I expect I will transfer this sleep/wakeup mechanism to an array
	  of wanted flags in parallel with the VM hash chain mutex array.
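
    A rough userland sketch of the last two bullets follows, again with
    pthreads and with hypothetical names throughout (bz_*, BZ_PG_BUSY,
    the slot field).  The busy bit is only tested and set while the
    page's chain mutex is held, and the "wanted" state lives in an array
    parallel to the mutex array rather than in the page's flags.  A
    condition variable stands in for the kernel's sleep/wakeup; that
    substitution is an assumption made for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define BZ_LOCKS   8		/* hypothetical size of the mutex array */
#define BZ_PG_BUSY 0x0001u	/* hypothetical flag value */

struct bz_page {
	unsigned flags;
	unsigned slot;		/* which chain mutex covers this page */
};

static pthread_mutex_t bz_lock[BZ_LOCKS];
static pthread_cond_t bz_cv[BZ_LOCKS];
static bool bz_wanted[BZ_LOCKS];	/* parallel to bz_lock[] */

static void
bz_init(void)
{
	for (int i = 0; i < BZ_LOCKS; i++) {
		pthread_mutex_init(&bz_lock[i], NULL);
		pthread_cond_init(&bz_cv[i], NULL);
	}
}

/* Busy the page if nobody owns it; fail without sleeping otherwise. */
static bool
bz_try_busy(struct bz_page *p)
{
	bool ok = false;

	pthread_mutex_lock(&bz_lock[p->slot]);
	if ((p->flags & BZ_PG_BUSY) == 0) {
		p->flags |= BZ_PG_BUSY;
		ok = true;
	}
	pthread_mutex_unlock(&bz_lock[p->slot]);
	return ok;
}

/* Sleep until the page can be busied, recording interest per slot. */
static void
bz_busy_wait(struct bz_page *p)
{
	pthread_mutex_lock(&bz_lock[p->slot]);
	while (p->flags & BZ_PG_BUSY) {
		bz_wanted[p->slot] = true;
		pthread_cond_wait(&bz_cv[p->slot], &bz_lock[p->slot]);
	}
	p->flags |= BZ_PG_BUSY;
	pthread_mutex_unlock(&bz_lock[p->slot]);
}

/* Release ownership and wake anyone waiting on this slot. */
static void
bz_unbusy(struct bz_page *p)
{
	pthread_mutex_lock(&bz_lock[p->slot]);
	p->flags &= ~BZ_PG_BUSY;
	if (bz_wanted[p->slot]) {
		bz_wanted[p->slot] = false;
		pthread_cond_broadcast(&bz_cv[p->slot]);
	}
	pthread_mutex_unlock(&bz_lock[p->slot]);
}
```

    Because one wanted flag covers every page sharing a slot, a wakeup
    may be for a different page; waiters therefore recheck the busy bit
    in a loop before taking ownership.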

    Ok, so what am I going to start with?  Well, I'm actually going to
    start with #3 ... changing the VM API to return pages that are PG_BUSY'd
    rather than making the caller busy them, and changing the various
    page queue ops (e.g. vm_page_cache(), vm_page_deactivate(), etc...) to
    unbusy the page automatically (some like vm_page_free() already work this
    way).  In most cases this allows existing code to operate as it used
    to with only minimal changes... for example, if the existing code 
    assumes protection by Giant (original code by splvm() and Giant) in
    order to retrieve, manipulate, and put back a page, the new code will
    be able to assume protection by the fact that it will be given a PG_BUSY'd
    page, which it can manipulate and put back.
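
    The calling convention described above might look like the following
    userland sketch.  All names (pq_*, SK_PQ_*, SK_PG_BUSY) are
    hypothetical and the queues are simple LIFO lists; the sketch only
    illustrates that allocation hands back a page that is already busy,
    and that the queue ops clear the busy bit as they put the page back.
    Waking up waiters and the hash chain locking are deliberately left
    out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SK_PQ_NONE     0
#define SK_PQ_FREE     1
#define SK_PQ_INACTIVE 2
#define SK_PQ_COUNT    3
#define SK_PG_BUSY     0x0001u	/* hypothetical flag value */

struct pq_page {
	struct pq_page *next;
	unsigned flags;
	int queue;
};

struct pq_queue {
	pthread_mutex_t lock;	/* one mutex per page queue */
	struct pq_page *head;
};

static struct pq_queue pq_queues[SK_PQ_COUNT];

static void
pq_init(void)
{
	for (int i = 0; i < SK_PQ_COUNT; i++) {
		pthread_mutex_init(&pq_queues[i].lock, NULL);
		pq_queues[i].head = NULL;
	}
}

/* Take a page off the free queue; it is returned already busy. */
static struct pq_page *
pq_alloc(void)
{
	struct pq_queue *q = &pq_queues[SK_PQ_FREE];
	struct pq_page *p;

	pthread_mutex_lock(&q->lock);
	p = q->head;
	if (p != NULL) {
		q->head = p->next;
		p->next = NULL;
		p->queue = SK_PQ_NONE;
		p->flags |= SK_PG_BUSY;	/* safe: we hold the queue mutex */
	}
	pthread_mutex_unlock(&q->lock);
	return p;
}

/* Put a page the caller owns onto a queue, unbusying it automatically. */
static void
pq_release(struct pq_page *p, int queue)
{
	struct pq_queue *q = &pq_queues[queue];

	assert(p->flags & SK_PG_BUSY);	/* caller must own the page */
	pthread_mutex_lock(&q->lock);
	p->next = q->head;
	q->head = p;
	p->queue = queue;
	p->flags &= ~SK_PG_BUSY;
	pthread_mutex_unlock(&q->lock);
}

/* Counterpart of a deactivate-style queue op in this sketch. */
static void
pq_deactivate(struct pq_page *p)
{
	pq_release(p, SK_PQ_INACTIVE);
}
```

    Code that previously relied on Giant between fetching a page and
    putting it back would, under this convention, rely on owning the
    busy bit for that whole window instead.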

    This preliminary work can be done without introducing VM mutexes just yet
    (i.e. I will do this work under Giant).  But once complete, this 
    preliminary work will allow me to then add the VM mutexes described above
    with very little effort and take a good chunk of the VM interface out from
    under Giant.

   --

   I am not going to start work on the other major interfaces... pmap,
   vm_object's, buffer cache, and so forth, until I complete the work on
   the vm_page interface.  These other interfaces work on a much more
   granular level that will allow us to, for example, give each vm_object
   a mutex (something we cannot and do not want to do for each vm_page).

   And that is where I stand at the moment.

						-Matt


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200107060248.f662m9w62000>
