Date: Thu, 7 Jan 1999 19:20:54 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Alfred Perlstein <bright@hotjobs.com>
Cc: Terry Lambert <tlambert@primenet.com>, dyson@iquest.net, pfgiffun@bachue.usc.unal.edu.co, freebsd-hackers@FreeBSD.ORG
Subject: Re: questions/problems with vm_fault() in Stable
Message-ID: <199901080320.TAA36935@apollo.backplane.com>
:um, this doesn't give us a growing MFS and I may be naive about this but
:consider:
:
:FFS requests a block passing in a buffer, the buffer is switched out from
:under the attached vnode and stored in the free list (or possibly the
:'locked' list), a buffer from the Memory device is put in its place and
:the vnode is marked as such.
:
:FFS has the block for a while.
:
:Eventually the block is taken off the LRU list because of the nature of
:the LRU queue, anything removing a block must check the mark to see if
:it's MFS backed.
Which buffer? The one MFS passed back or the original one that was
replaced? I assume you mean that the original buffer is freed and
we are now talking about the one MFS passed back, which is currently
under control of FFS, is no longer being used, and eventually is ready
to be freed again.
:In fact this could be a callback function, called in general when ANY
:buffers are reused allowing for other flexibilities, however, this
:callback may already be in place as the "flush to backing store"
:call that's done for traditional devices under FFS.
In order for the callback to work, especially if you intend this
mechanism to work across VFS layers, the original 'source' of the
buffer must be recorded in the vm_page_t. Otherwise the callback
doesn't know who to call.
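To make the point above concrete, here is a minimal user-space sketch of a page that records its source so a reuse callback knows whom to notify. Everything in it is hypothetical: struct page_sketch, struct page_source, and page_reuse() are illustration names, not fields or functions of the real vm_page_t.

```c
#include <assert.h>
#include <stddef.h>

struct page_sketch;

/* Hypothetical descriptor for the layer that originally supplied a page. */
struct page_source {
    const char *name;
    void (*page_reused)(struct page_source *src, struct page_sketch *pg);
    int reclaimed;                  /* how many pages were handed back */
};

/* Hypothetical stand-in for vm_page_t: it remembers its source. */
struct page_sketch {
    struct page_source *source;
};

/* Example callback: the source just counts the pages it gets back. */
static void
count_reuse(struct page_source *src, struct page_sketch *pg)
{
    (void)pg;
    src->reclaimed++;
}

/*
 * Called when a page is about to be reused.  Without the recorded
 * source there would be nobody to call here.
 */
static void
page_reuse(struct page_sketch *pg)
{
    if (pg->source != NULL && pg->source->page_reused != NULL)
        pg->source->page_reused(pg->source, pg);
    pg->source = NULL;              /* the page no longer belongs to anyone */
}
```

A page with no recorded source is simply reused without any notification, which is the failure mode described above.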
:At this point a buffer must be reattached to this vnode, it can be brought
:over from the free list, or perhaps the original buffer could have been
:placed on the 'locked' list (is this still around?)
Anything put on a 'free' list is gone. Or you are mis-defining the
function of the 'free' list... it isn't really a free list.
The problem with renaming a page isn't with the page being ripped out
from the upper VFS layer, but the fact that the lower VFS layer is
removing the page from its own map and thus 'loses' track of it -
something a vm_alias would solve neatly.
-Matt
:Maybe this is impossible with what we have, or doesn't make sense sorry.
:I _really_ need to UTSL more. :)
:
:-Alfred
:
:btw, what you and John Dyson are working on sounds truly awesome. I
:really hope you guys consider publishing a paper on it sometime because at
:that point FreeBSD will have moved so far from 4.4BSD, that the reference
:books become very far removed from the actual underlying system.
That is an excellent idea. The VFS stuff we are flame festing over now
is not something under immediate development... I'm spending the next
3 months cleaning up the existing VM system first, but something is
going to happen at some point down the line. The current situation
cannot be scaled or extended easily and it is also way too easy for
programmers to make mistakes -- VOP_GETPAGES and VOP_PUTPAGES alone have
so many assumed side effects (as to how objects and pages are locked
and what state they should be in on return) that it's a wonder there
aren't more bugs.
I could hack in vm_alias's in about two days, but that doesn't mean I
should. I figure by the time I'm done fixing the VM system, the proper
course of action to take in regards to VFS/BIO will be more apparent.
( By 'fixing' I mean mainly removing low memory deadlocks, low memory
special cases, and removing cross dependency 'bypasses' and special
cases from the vm_pager and vm_object APIs ).
I'll include the README that lists what has been fixed so far
(and will be committed on the 15th or 16th after the tree is split):
-Matt
Matthew Dillon Engineering, HiWay Technologies, Inc. & BEST Internet
Communications & God knows what else.
<dillon@backplane.com> (Please include original email in any response)
* Complete replacement of swap pager (vm/swap_pager.c)
The swap pager has been completely replaced. The new pager uses the
new blist bitmap allocator and is able to allocate and deallocate swap
from its bitmap without blocking anywhere. Additionally, the new pager
is able to avoid memory deadlock situations and as a consequence we
have simplified a number of other areas of the VM system.
Also vm/vm_swap.c was changed... the swap device block size is now
PAGE_SIZE'd. This simplifies code throughout both modules.
* Addition of bitmap management module, kern/subr_blist.c
Used by the swap system. (the old rlist module has been deprecated).
Could be used for other things.
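For readers unfamiliar with bitmap allocation, the idea can be sketched with a single flat 64-bit bitmap. This is only an illustration under assumed names (bm_t, bm_alloc(), bm_free()); it is not the interface or the data structure of kern/subr_blist.c, which is more elaborate than one flat word.

```c
#include <assert.h>
#include <stdint.h>

#define BM_BLOCKS 64            /* hypothetical: one 64-bit word of blocks */
#define BM_NONE   (-1)          /* returned when no run of blocks fits */

/* One bit per block: 1 = free, 0 = allocated. */
typedef uint64_t bm_t;

/* Mask covering `count` blocks starting at block `start`. */
static uint64_t
bm_mask(int start, int count)
{
    uint64_t m = (count == 64) ? ~(uint64_t)0
                               : (((uint64_t)1 << count) - 1);
    return m << start;
}

/* First-fit allocation of `count` contiguous blocks; never sleeps. */
static int
bm_alloc(bm_t *bm, int count)
{
    if (count <= 0 || count > BM_BLOCKS)
        return BM_NONE;
    for (int start = 0; start + count <= BM_BLOCKS; start++) {
        uint64_t m = bm_mask(start, count);
        if ((*bm & m) == m) {   /* every block in the run is free */
            *bm &= ~m;
            return start;
        }
    }
    return BM_NONE;
}

/* Return `count` blocks starting at `start` to the free pool. */
static void
bm_free(bm_t *bm, int start, int count)
{
    *bm |= bm_mask(start, count);
}
```

Both operations touch only the bitmap itself, so neither needs to allocate memory, which is the property that lets an allocator of this kind run without blocking.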
* ripped out vm_object_t->paging_offset
This field was hacked in all over the source to optimize out a
single swap_pager_copy() command in vm/vm_map.c. I've ripped out
the optimization because it really doesn't improve performance
with the new swapper.
* added vm_page_t->swapblk
The swapblk for resident pages is stored in the vm_page_t rather than
in swap metadata. This field can also be used by other pagers, not
just the swap pager.
* removed low-memory checks in a couple of places
There are a few places, such as in vm/vm_fault.c, where the system
will stall a process if memory is low. The problem is that if you
have a memory-hogging process this tends to lock up all other
processes, making it impossible to log in to the machine or fork/exec
new programs. The result is an effective lockup.
* getpbuf()/relpbuf() - added subsystem limits
A new argument has been added, a pointer to an integer counter which
is decremented on getpbuf() and incremented on relpbuf(). getpbuf()
will block if the counter is 0. This is on top of blocking when the
global buffer pool is exhausted.
This feature is required to prevent any one subsystem from hogging
pbufs, which can lock up the machine in a low-memory situation
(or lock up the machine, period).
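The counter scheme just described can be sketched in user space as follows. The names sketch_getpbuf(), sketch_relpbuf(), and struct pbuf_sketch are hypothetical, the pool size is made up, and where the kernel routine would block this sketch returns NULL instead; it is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 8             /* hypothetical size of the global pool */

struct pbuf_sketch {
    int in_use;
};

static struct pbuf_sketch pool[POOL_SIZE];

/*
 * Take a buffer from the global pool, charging it to the caller's
 * per-subsystem counter.  Returns NULL where the kernel would sleep:
 * either the subsystem has used up its allowance or the pool is empty.
 */
static struct pbuf_sketch *
sketch_getpbuf(int *pfreecnt)
{
    if (pfreecnt != NULL && *pfreecnt <= 0)
        return NULL;            /* subsystem limit reached */
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            if (pfreecnt != NULL)
                --*pfreecnt;
            return &pool[i];
        }
    }
    return NULL;                /* global pool exhausted */
}

/* Give the buffer back and credit the subsystem's counter. */
static void
sketch_relpbuf(struct pbuf_sketch *bp, int *pfreecnt)
{
    bp->in_use = 0;
    if (pfreecnt != NULL)
        ++*pfreecnt;
}
```

A subsystem that starts its counter at 2 can hold at most two buffers at once, no matter how many remain in the global pool, so it cannot starve the others.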
* Fixed madvise().
madvise() was badly broken, but people didn't notice
it because it wasn't actually trying to free pages immediately so
processes had a chance to recover from its mistakes.
At the moment madvise() really tries to free the page, but we will
probably back off and just clean the page and move it into the cache
after testing is complete.
* Major revamping of vm/vm_pageout.c
Fixed a number of blocking and deadlock situations in pageout.c,
mainly related to the swapper and to the vnode pager.
* Major revamping of vm/vm_page.c
vm_page_free() has been revamped along with a bunch of other routines.
Also, added pager callbacks vm_pager_page_inserted() and
vm_pager_page_removed() and shoehorned them into vm_page_insert()
and vm_page_free() and such. vm_page_remove()'s functionality has
changed and it is now a static function.
vm_page_alloc() has been revamped. Removed unnecessary inlining of
code. We now formally free cache pages before reusing them (also
necessary since the mechanism of freeing a page has changed).
Added vm_await() and vm_page_asleep() functions - will be used later.
* Major revamping of MFS filesystem code.
Now supports VOP_FREEBLKS and handles low-memory conditions better
as a side effect of changes made elsewhere. Also added protection
of MFS queue at splbio().
* Added device-block-to-page-block and page-block-to-device-block
conversions to sys/param.h
* Added u_daddr_t to sys/types.h - unsigned version of daddr_t (used by
new swap code)
* Greatly simplified vm_object_t's swap-related fields, making the
structure a little smaller.
* Simplified vm_page_t->hashq. Changed the doubly-linked list to a singly
linked list, doubled the size of the hash table ( without doubling
the storage), and this change also simplifies a bunch of critical path
code.
* Removed vm_object_t->page_hint. It was slowing things down instead of
speeding things up.
* Inlined a number of critical vm_pager routines.
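As an illustration of the device-block/page-block conversions listed above, here is a small sketch. The function names and the constants are assumptions chosen for the example (512-byte device blocks, 4096-byte pages); they are not copied from sys/param.h.

```c
#include <assert.h>

#define SK_DEV_BSHIFT 9         /* assumed: 512-byte device blocks */
#define SK_PAGE_SHIFT 12        /* assumed: 4096-byte pages */
#define SK_SHIFT      (SK_PAGE_SHIFT - SK_DEV_BSHIFT)

/* Number of device blocks covered by `pages` page-sized blocks. */
static unsigned long
sk_pages_to_devblks(unsigned long pages)
{
    return pages << SK_SHIFT;
}

/* Number of pages needed to hold `db` device blocks, rounding up. */
static unsigned long
sk_devblks_to_pages(unsigned long db)
{
    return (db + (1UL << SK_SHIFT) - 1) >> SK_SHIFT;
}
```

With these assumed sizes one page spans eight device blocks, so a PAGE_SIZE'd swap block size lets the swap code count in pages and convert only at the device boundary.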
----- OTHER CHANGES -----
* Added M_ASLEEP functionality to malloc (this is for later)
* Changed malloc flag M_KERNEL to M_USE_RESERVE
* Fixed uipc_usrreq.c sorflush() call to make sure it's a socket - it
might not be.
* vm_meter does not count device objects (such as /dev/mem), because
these really skew the results and make vmstat less useful.
