Date:      Wed, 6 Jan 1999 15:16:26 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Don Lewis <Don.Lewis@tsc.tdk.com>
Cc:        dyson@iquest.net, tlambert@primenet.com (Terry Lambert), pfgiffun@bachue.usc.unal.edu.co, freebsd-hackers@FreeBSD.ORG
Subject:   Re: questions/problems with vm_fault() in Stable
Message-ID:  <199901062316.PAA26021@apollo.backplane.com>

:} Any of the problems with the existing VFS/VM scheme have been with the
:} intricacies of dealing with VFS special cases, and dealing with the
:} I/O abstraction of buffers as a cache.  Forget "files" and think
:} "blobs of memory."  Once the notion of file is forgotten, then shadowing,
:} invalidation and aliasing of memory become very obvious...
:
:One complication that comes to mind is /dev/vn*.  There's a blob of
:memory associated with the file that's attached to this device.  If
:you create a filesystem on this device and mount it, then each of
:the files in that filesystem will also have an associated blob of
:memory and these memory blobs are subsets of the big blob.  Of course
:you could do something really crazy and use something like ccd to
:stripe a couple of /dev/vn* devices together and use the result as
:a filesystem ...
:
:Maybe the thing to do is to turn all the filesystems into stacking
:layers, including ffs.

    This isn't a complication at all.  It is, in fact, exactly what extending
    the vm_object model and implementing a cache coherency model would fix.
    It's trivial to create associational relationships between vm_page's and
    vm_object's.

    Let me put forth an example of one way to do this.  It is by no means the
    *only* way, but it's the one I've been thinking about the most.

    Currently, vm_object's are multi-layered - it's the multi-layering that
    allows you to fork(), to map things MAP_PRIVATE, and so forth.  This
    model allows swap backing store to be shared across a fork() until one or
    the other process tries to modify a page.  Without it, the per-process
    memory utilization would be horrendous, especially considering the number
    of modifications made to fix up shared libraries.
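
    As a rough illustration of that layering, here is a minimal userland
    sketch of a shadow chain with copy-on-write.  It is not the kernel's
    vm_object code: 'struct layer', the tiny page size, and all of the
    function names are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPAGES     4   /* pages per object (illustrative) */
#define PAGE_BYTES 8   /* tiny "page" so the copy is easy to see */

/* Hypothetical stand-in for one vm_object layer; not the kernel structure. */
struct layer {
    struct layer *backing;                 /* object this one shadows, or NULL */
    int           present[NPAGES];         /* does this layer hold its own copy? */
    unsigned char data[NPAGES][PAGE_BYTES];
};

/* Find the topmost layer that holds page idx, or NULL if none does. */
static struct layer *layer_find(struct layer *l, size_t idx)
{
    assert(idx < NPAGES);
    for (; l != NULL; l = l->backing)
        if (l->present[idx])
            return l;
    return NULL;
}

/* Read one byte; a page that exists nowhere in the chain reads as zero. */
static unsigned char layer_read(struct layer *top, size_t idx, size_t off)
{
    struct layer *l;

    assert(off < PAGE_BYTES);
    l = layer_find(top, idx);
    return l ? l->data[idx][off] : 0;
}

/* Write one byte, copying the page into the top layer on first modification. */
static void layer_write(struct layer *top, size_t idx, size_t off,
                        unsigned char v)
{
    assert(idx < NPAGES && off < PAGE_BYTES);
    if (!top->present[idx]) {
        struct layer *src = layer_find(top->backing, idx);

        if (src)
            memcpy(top->data[idx], src->data[idx], PAGE_BYTES);
        else
            memset(top->data[idx], 0, PAGE_BYTES);
        top->present[idx] = 1;
    }
    top->data[idx][off] = v;
}
```

    Two layers that share one backing layer see the same bytes until one of
    them writes; only then does the writer get a private copy of that page,
    and the other layer and the backing layer are left untouched.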

    We could, in fact, just use the vm_object model to implement stackings
    such as filesystems mapped on top of VN devices mapped on top of CCD
    partitions, and so forth.  It would be extremely inefficient, but it
    *could* be used that way.

    But we want to be efficient.  The vm_object model layering is not 
    designed to operate on a page-by-page basis.  Instead it is designed to 
    operate on larger page ranges within objects.

    So, what to do:  Well, rather than link vm_object's together directly,
    we could implement vm_page aliases to allow a physical page to be mapped
    in more than one object at a time.  This would operate strictly as a cache
    in order to avoid the equivalent of VOP_GETPAGES.  The aliases associated
    with a page are chained together.  This would allow a VFS layer to layer
    two whole vm_object's on top of each other (the caller's object and the
    VFS layer's object) without piecemealing the subobject.  The VFS layer(s)
    then build up chains of VM aliases.
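
    To make the alias idea a bit more concrete, here is one possible shape
    for it as a small userland sketch.  Nothing like this exists in the tree;
    'struct alias', 'struct object', the slot table, and the function names
    are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4   /* page indices per object (illustrative) */

struct object;

/* Hypothetical physical page with a chain of aliases hanging off it. */
struct page {
    struct alias *chain;   /* head of the alias chain */
    int           data;    /* stand-in for the page contents */
};

/* Hypothetical alias: one (object, index) view of a physical page. */
struct alias {
    struct page   *page;
    struct object *obj;
    size_t         idx;
    struct alias  *next;   /* next alias of the same page */
};

/* Hypothetical object: a per-index cache of aliases. */
struct object {
    struct alias *slot[NSLOTS];
};

/* Map page p into object o at idx and add the alias to p's chain. */
static void alias_attach(struct alias *a, struct page *p,
                         struct object *o, size_t idx)
{
    assert(idx < NSLOTS && o->slot[idx] == NULL);
    a->page = p;
    a->obj = o;
    a->idx = idx;
    a->next = p->chain;
    p->chain = a;
    o->slot[idx] = a;
}

/* Cache lookup: NULL means a miss, where a real system would fall back
 * to the slower path through the lower layer. */
static struct page *object_lookup(const struct object *o, size_t idx)
{
    assert(idx < NSLOTS);
    return o->slot[idx] ? o->slot[idx]->page : NULL;
}

/* Count how many objects currently alias this page. */
static size_t page_alias_count(const struct page *p)
{
    size_t n = 0;
    const struct alias *a;

    for (a = p->chain; a != NULL; a = a->next)
        n++;
    return n;
}
```

    With one alias in the caller's object and one in the VFS layer's object,
    both lookups resolve to the same physical page without either object
    having to be carved up to match the other.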

    This would work well because the VFS layer would have complete control
    over its own chain link and thus be able to break it (invalidate), 
    reattach it (reallocate underlying blocks), or pass it along in information
    sent via the vm_object (use as an argument to a cache coherency protocol).
    These operations would run entirely through the VM and cache coherency
    protocols/call-APIs and, in my view, would be extremely optimizable.
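
    The "break it" and "reattach it" operations could look roughly like the
    following sketch, assuming a singly linked chain per page.  The
    structures and the names alias_link, alias_break and alias_reattach are
    invented for illustration and say nothing about how a real
    implementation would do its locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical physical page: just the head of its alias chain. */
struct page {
    struct alias *chain;
};

/* Hypothetical alias owned by one VFS layer. */
struct alias {
    struct page  *page;   /* NULL while the link is broken */
    struct alias *next;   /* next alias on the same page */
};

/* Link a currently unattached alias onto page p. */
static void alias_link(struct alias *a, struct page *p)
{
    assert(a->page == NULL);
    a->page = p;
    a->next = p->chain;
    p->chain = a;
}

/* Invalidate: remove only this layer's link; other links stay intact. */
static void alias_break(struct alias *a)
{
    struct alias **pp;

    if (a->page == NULL)
        return;                     /* already broken */
    pp = &a->page->chain;
    while (*pp != a)                /* a is on the chain by construction */
        pp = &(*pp)->next;
    *pp = a->next;
    a->page = NULL;
    a->next = NULL;
}

/* Reattach: point this layer's link at a different underlying page. */
static void alias_reattach(struct alias *a, struct page *newp)
{
    alias_break(a);
    alias_link(a, newp);
}
```

    The point of the sketch is that each operation touches only the link
    belonging to the layer that calls it, so one layer can invalidate or
    move its view of a page without disturbing the other layers' links.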

    Async I/O also becomes both possible and trivial, because a VFS layer can
    use a vm_alias to placemark a critical-path read or write I/O without 
    necessarily having a real physical page to work with, and use the
    structure in its pass-through to a lower layer.  Rather than depend on
    the lower layer to instantiate the controlling structures for the I/O
    (which is one of the areas which programmers screw up the most in the
    current VFS system), the controlling structure is instantiated in the
    VM/VFS layer making the request.
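
    A sketch of the placemarker idea, under the assumption that completion
    is reported through a callback: the requesting layer sets up the
    controlling structure with no page attached, hands it down, and the
    lower layer fills the page in later.  'struct io_alias', the state names,
    and the callback are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum io_state { IO_IDLE, IO_PENDING, IO_DONE };

/* Stand-in for a physical page. */
struct page {
    int data;
};

/* Hypothetical alias used as an I/O placemarker. */
struct io_alias {
    enum io_state state;
    struct page  *page;                      /* NULL while I/O is in flight */
    void        (*done)(struct io_alias *);  /* completion hook, may be NULL */
    int           completions;               /* bookkeeping for the example */
};

/* Requesting layer: instantiate the placemarker before passing it down. */
static void io_begin(struct io_alias *a, void (*done)(struct io_alias *))
{
    a->state = IO_PENDING;
    a->page = NULL;
    a->done = done;
}

/* Lower layer: supply the page once the I/O finishes and notify. */
static void io_complete(struct io_alias *a, struct page *p)
{
    assert(a->state == IO_PENDING);
    a->page = p;
    a->state = IO_DONE;
    if (a->done != NULL)
        a->done(a);
}

/* Example completion hook: just counts how often it was called. */
static void count_completion(struct io_alias *a)
{
    a->completions++;
}
```

    The requester owns the structure for the whole life of the request, so
    the lower layer never has to allocate or initialize it; it only fills in
    the page and reports completion.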

    In the absolute worst case, a vm_alias mechanism would *only* be used
    to placemark I/O requests.  It would not be much more efficient than
    the existing VOP_GETPAGES model.  In the typical case, there would be
    sufficient vm_alias's in the pool to keep hold of chains of
    VM dependencies and allow the VM system to completely bypass (or at
    least greatly reduce the cost of) the VOP model for the cache case.

						    -Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)    

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
