Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 28 Apr 2004 23:05:12 -0400
From:      Brian Fundakowski Feldman <green@FreeBSD.org>
To:        Alan Cox <alc@cs.rice.edu>
Cc:        current@FreeBSD.org
Subject:   Re: VM wiring fixed 
Message-ID:  <200404290305.i3T35CW6001661@green.homeunix.org>
In-Reply-To: Message from Alan Cox <alc@cs.rice.edu>  of "Wed, 28 Apr 2004 21:24:01 CDT." <20040429022401.GJ5199@cs.rice.edu> 

next in thread | previous in thread | raw e-mail | index | archive | help
Alan Cox <alc@cs.rice.edu> wrote:
> On Wed, Apr 28, 2004 at 08:08:17PM -0400, Brian Fundakowski Feldman wrote:
> > There are several severe wiring bugs in -CURRENT that I believe I have fixed.
> > Please test/review as appropriate if you're affected by any of them.  This 
> > is a superset of the previous patch which just mostly-fixed mlockall(2).
> > 
> > * MAP_FUTUREWIRE was not unset in vmspace_dofree(), causing the next process 
> >   to use that vmspace to wire all of its memory.
> 
> Would setting flags to 0 in vmspace_alloc() accomplish the same?

Yeah, same deal, except since we're optimizing by only clearing/
reinitializing some fields for the vmspace-zone objects, it makes a little 
more sense IMO to have it in the vmspace_dofree().

> > * kmem_*() calls either called vm_map_wire() or set MAP_ENTRY_NOFAULT, but
> >   kmem_free() did not undo the vm_map_wire() calls.
> 
> I don't understand this comment.  kmem_free() calls vm_map_remove(), which
> calls vm_map_delete(), which calls vm_map_entry_unwire(), etc.
> 
> However, I don't see any change corresponding to this statement in your
> patch.

I changed kmem_free() to do an unwire since it is the interface for 
unmapping wired memory, as opposed to kmem_free_wakeup().

> > * vm_fault_unlock() could not unwire pages that were not in the pmap already,
> >   leaking them permanently.
> 
> By definition a wired mapping is in the pmap.  A wired page can, however,
> be mapped by a non-wired mapping.  Thus, a non-wired mapping of a wired
> page could be absent from the pmap.
> 
> Problems such as PR/29915 are a result of vm_fault_unwire() not understanding
> fake pages created by the device pager.   Pmap_extract() followed by
> PHYS_TO_VM_PAGE() does not result in a pointer to the fake page that was
> wired.  Thus, the panic on unexpected wiring count.
> 
> Your patch to vm_fault_unwire() does look like a correct fix to this issue.

I didn't realize the problem was so old!

> > * vm_map_{un,}wire() did not keep track of wirings as they should.  User
> >   wirings are separate from system wirings, and there can be exactly one.
> >   There can be unlimited system wirings, but wired_count will remain at
> >   zero for map entries that are allocated as MAP_ENTRY_NOFAULT.
> 
> MAP_ENTRY_NOFAULT means that the vm_map_entry has no backing vm_object and
> that you should never fault on it, which would create a backing object.
> vm_map_{un,}wire(), however, deal with map entries that do have backing
> objects.  I'm not quite sure why you chose to introduce MAP_ENTRY_NOFAULT
> into the implementation of vm_map_{un,}wire().
> 
> I'll have to look at this more carefully.

I've done it the way I explained to preserve layering of vmspace -> map -> 
entry -> page as best I knew how.  The kmem_free() implementation doesn't 
know if the entries are MAP_ENTRY_NOFAULT or not, so I pushed the 
intelligence into the wiring routines as to what that means.

> > * vslock()/vsunlock() did not both use VM_MAP_WIRE_SYSTEM as they must;
> >   I believe this resulted in more wiring leaks.
> > * vm_map_delete() did not wait for all wirings (except one user wiring) to
> >   drain, so vslock() guaranteed nothing.
> 
> It is not vm_map_delete()'s responsibility to wait for all wirings to be
> drained.  It is only responsible for a single wired mapping. 

How can it be done, then?  The vslock() function is called with the 
expectation that the memory it references is both "wired" and "held".  I 
can't see any other to do it -- there is no refcount on entries.  Since
vslock() is the only interface for system-wiring arbitrary memory, there
will always be a matching call to vsunlock() which causes the system unwire 
to happen and the waiting vm_map_delete() to wake up.

> > * The condition in vm_fault() where all pages have been exhausted is easy to
> >   deadlock, but was impossible to recover from.  The OOM killer works only
> >   when all memory has been used, not all wired memory.  However, now it is
> >   possible to kill offending processes with SIGKILL instead of the
> >   vm_fault() in trap_pfault() looping forever.

Would you rather see this or an "OOM-wired-killer"?

> > * The init(8) program should really be using mlockall(2) so that it can kill
> >   off processes hogging all the wired memory.  However, I have not fixed this
> >   because in such case, e.g. while I would like for Ctrl+Alt+Del to work,
> >   init(8) may actually need to allocate and wire new pages itself to keep
> >   running.  I think a way to fix this is to conditionalize the
> >   vm_page_count_severe() condition on p->p_pid != 1 so that just like the
> >   REAL system processes, it can bring the page count lower than "severe".
> > 
> > I haven't tested it out on SMP yet, but on UP the latest changes don't seem 
> > to have any negative effects.  All wired memory leaks appear to be gone and
> > although init(8) probably can't do it, I can enter DDB and "kill 9 <wirehog>"
> > to take the machine out of an all-wired deadlock.  See patch at URL:
> > <http://green.homeunix.org/~green/vm-wiring.patch>;
> > 
> 
> In summary, consider this a positive review for the MAP_FUTUREWIRE and
> the vm_fault_unwire() fixes.

Thanks!

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200404290305.i3T35CW6001661>