Date: Thu, 5 Jul 2001 19:48:09 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Benno Rice <benno@FreeBSD.org>
Cc: freebsd-smp@FreeBSD.org
Subject: Re: VM Commits / GIANT_ macros
Message-ID: <200107060248.f662m9w62000@earth.backplane.com>
References: <200107041638.f64GcH844850@earth.backplane.com> <20010705102235.B71563@rafe.jeamland.net>

:On Wed, Jul 04, 2001 at 09:38:17AM -0700, Matt Dillon wrote:
:> Hello everyone! Ok, after talking with John and others at USENIX
:> and doing a couple of back and forths with Alfred, I am officially
:> taking over the main-line machine-independent VM system in -current.
:>
:> I will also be working on i386 pmap, vm_object, vm_map, and the buffer
:> cache (in regards to mutexes & Giant).
:
:Could you keep me posted wrt what locking is needed in pmap? This should
:allow me to keep PowerPC in sync. Don't have any plans to hit SMP on powerpc
:any time soon, but it'd be nice to have the infrastructure in place. =)
:
:--
:Benno Rice
:benno@FreeBSD.org

Sure, I'll post updates to freebsd-smp. Here's the first update:

I spent a good deal of Wednesday cleaning up the VM source files,
breaking them up into manageable pieces and moving vm_page_zero_idle()
from MD files to a new MI file.

I spent about four hours experimenting with various fine-grained VM
mutex models, simply by starting to code each one and noting where I
would bog down. I believe I have come up with one that is usable for
vm_page_t manipulation.

The issue we have with vm_page_t is that various entities currently
depend on the atomic_* ops or Giant to do things like look up a page and
then busy it. This previously occurred under splvm() in order to
guarantee that nobody else would be able to busy the page while we were
trying to. Now it occurs under Giant. The goal is to be able to do
these sorts of operations without Giant.

The same dependence covers adding or removing a vm_page_t from its page
queue, adding or removing a vm_page_t from the (object, index) hash
table, and moving vm_page_t's between page queues.

This is the solution as I envision it. It is a considerable amount of
work, which I will be doing in stages.

* We will have a mutex for each (PQ_XXX) page queue. The appropriate
  page queue mutex will be obtained to add a vm_page_t to or remove it
  from that page queue (happens a lot), and to scan the queue
  (contigmalloc and the pageout daemon scan the page queues).

* We will have a small shared array of mutexes to lock the
  (object, index) hash chains.

  For example, let's say you are in vm_fault and do a vm_page_lookup()
  to look up a page, and not finding it you decide to vm_page_alloc() a
  new page. In order to protect this sequence of events,
  vm_page_lookup() will obtain the appropriate hash chain mutex and
  leave it held on return (whether or not the page is found). The
  caller will do whatever it needs to do (non-blocking), and then
  release the hash chain mutex.

  This allows callers to safely add or remove pages from hash chains.

* Many routines now look up a page, then busy it, then release it back
  onto a page queue (e.g. deactivate it, free it, activate it, cache
  it). This happens in vm_fault, the pageout daemon, and many other
  interactions with the system. These interactions currently operate
  under Giant (they used to operate under spl) and do not bother to
  'own' the page to execute the action. These interactions do, however,
  check that the page is not owned by someone else (aka that the page
  is not PG_BUSY or PG_BUSY/vm_page->busy).

  To allow callers to safely look up and then manipulate pages, for
  example to manipulate vm_page->flags, I intend to change the API such
  that when you nominally get a page, it will be BUSY'd for you. For
  example, when selecting a free or cache page from the page queues,
  the page would be returned already BUSY'd, allowing you to manipulate
  the page and then release it back to a queue (or initiate I/O, or
  whatever). In many cases the caller intends to busy the page anyway,
  so this is not much of a leap.

  This only works if the page is not already busy, of course, but
  nearly all users of the existing API skip or sleep/loop if the
  returned page is busy, so we can fail gracefully and allow the caller
  to do whatever needs to be done there.

Finally we have issues with how to set PG_BUSY in the first place.
Currently setting PG_BUSY uses atomic_*. It turns out that the solution
is easy and does not require the use of any additional mutex operations.

* When we are looking up a page that is on the free queue, aka in
  vm_page_alloc(), simply holding the appropriate page queue mutex
  (which we *ALREADY* hold in most cases) is sufficient to allow us to
  manipulate the free pages in that queue without worrying about other
  threads messing with those pages. Thus we can set PG_BUSY, remove
  the page from the free queue, and then release the page queue mutex
  before returning the newly allocated page.

* When we are looking up a page that is on the cache queue, or is not
  associated with a queue, we simply acquire (or already hold in most
  cases) the appropriate hash chain mutex. Then if the page is not
  already PG_BUSY, we know we can safely manipulate its flags (set
  PG_BUSY). If the page is already PG_BUSY we need to sleep/loop
  anyway, so we can fail gracefully and let the parent
  sleep/loop/do-whatever.

* We will need to find a better way to sleep/wait for a busy page to
  become available. The current mechanism sets a PG_WANTED flag in
  vm_page->flags, which doesn't work under the new scheme. I expect I
  will transfer this sleep/wakeup mechanism to an array of wanted flags
  in parallel with the VM hash chain mutex array.

Ok, so what am I going to start with? Well, I'm actually going to
start with #3 ... changing the VM API to return pages that are
PG_BUSY'd rather than making the caller busy them, and changing the
various page queue ops (e.g. vm_page_cache(), vm_page_deactivate(),
etc...) to unbusy the page automatically (some like vm_page_free()
already work this way).

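The hash chain mutex array, the "return the page already busied" API,
and the parallel wanted flags described above can be modeled roughly as
follows. This is only a userland sketch under stated assumptions, not
the planned kernel code: pthread mutexes stand in for kernel mutexes,
the array sizes are arbitrary, and every name in it (page_grab,
page_lookup_locked, page_unbusy, chain_wanted, and so on) is
hypothetical rather than part of the FreeBSD VM API.

```c
/* Userland model: pthread mutexes stand in for kernel mutexes.
 * All names below are hypothetical, not the FreeBSD VM API. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64     /* number of hash chains (assumed size) */
#define NLOCKS   8      /* small shared mutex array (assumed size) */
#define PG_BUSY  0x1u

struct page {
    struct page *hnext;   /* next page on the same hash chain */
    const void  *object;
    size_t       index;
    unsigned     flags;
};

enum grab_result { GRAB_OK, GRAB_BUSY, GRAB_MISSING };

static struct page    *bucket[NBUCKETS];
static pthread_mutex_t chain_lock[NLOCKS];
static bool            chain_wanted[NLOCKS];  /* parallel "wanted" flags */

static size_t bucket_of(const void *object, size_t index)
{
    return (size_t)((((uintptr_t)object >> 4) + index) % NBUCKETS);
}

/* Several hash chains share one mutex from the small array. */
static size_t lock_of(const void *object, size_t index)
{
    return bucket_of(object, index) % NLOCKS;
}

void page_hash_init(void)
{
    for (size_t i = 0; i < NLOCKS; i++)
        pthread_mutex_init(&chain_lock[i], NULL);
}

/* Look up (object, index). Returns with the chain mutex HELD whether or
 * not a page was found; the caller must call page_hash_unlock(). */
struct page *page_lookup_locked(const void *object, size_t index)
{
    struct page *p;

    pthread_mutex_lock(&chain_lock[lock_of(object, index)]);
    for (p = bucket[bucket_of(object, index)]; p != NULL; p = p->hnext)
        if (p->object == object && p->index == index)
            break;
    return p;
}

void page_hash_unlock(const void *object, size_t index)
{
    pthread_mutex_unlock(&chain_lock[lock_of(object, index)]);
}

/* Insert a page; the caller must already hold the chain mutex. */
void page_insert_locked(struct page *p)
{
    size_t b = bucket_of(p->object, p->index);

    p->hnext = bucket[b];
    bucket[b] = p;
}

/* Look up a page and hand it back already busied, or fail gracefully. */
enum grab_result page_grab(const void *object, size_t index, struct page **out)
{
    enum grab_result r = GRAB_MISSING;
    struct page *p = page_lookup_locked(object, index);

    if (p != NULL && (p->flags & PG_BUSY)) {
        chain_wanted[lock_of(object, index)] = true; /* caller sleeps/loops */
        r = GRAB_BUSY;
    } else if (p != NULL) {
        p->flags |= PG_BUSY;   /* safe: chain mutex held, page was not busy */
        *out = p;
        r = GRAB_OK;
    }
    page_hash_unlock(object, index);
    return r;
}

/* Clear PG_BUSY; returns true if a waiter had asked to be woken. */
bool page_unbusy(struct page *p)
{
    size_t l = lock_of(p->object, p->index);
    bool wanted;

    pthread_mutex_lock(&chain_lock[l]);
    assert(p->flags & PG_BUSY);
    p->flags &= ~PG_BUSY;
    wanted = chain_wanted[l];
    chain_wanted[l] = false;
    pthread_mutex_unlock(&chain_lock[l]);
    return wanted;
}
```

In this model a caller that gets GRAB_BUSY would sleep or loop, and
whoever calls page_unbusy() would issue the wakeup when it returns
true. A real implementation would pair the wanted flag with an actual
sleep/wakeup primitive, which the sketch leaves out.
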
In most cases this allows existing code to operate as it used to with
only minimal changes... for example, if the existing code assumes
protection by Giant (original code by splvm() and Giant) in order to
retrieve, manipulate, and put back a page, the new code will be able to
assume protection by the fact that it will be given a PG_BUSY'd page,
which it can manipulate and put back.

This preliminary work can be done without introducing VM mutexes just
yet (i.e. I will do this work under Giant). But once complete, this
preliminary work will allow me to then add the VM mutexes described
above with very little effort and take a good chunk of the VM interface
out from under Giant.

--

I am not going to start work on the other major interfaces... pmap,
vm_objects, buffer cache, and so forth, until I complete the work on
the vm_page interface. These other interfaces work on a much more
granular level that will allow us to, for example, give each vm_object
a mutex (something we cannot and do not want to do for each vm_page).

And that is where I stand at the moment.

-Matt