From owner-freebsd-hackers  Thu Jan  7 18:20:15 1999
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id SAA14365
          for freebsd-hackers-outgoing; Thu, 7 Jan 1999 18:20:15 -0800 (PST)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id SAA14354
          for <freebsd-hackers@FreeBSD.ORG>; Thu, 7 Jan 1999 18:20:12 -0800 (PST)
          (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.1/8.9.1) id SAA36593;
	Thu, 7 Jan 1999 18:19:34 -0800 (PST)
	(envelope-from dillon)
Date: Thu, 7 Jan 1999 18:19:34 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199901080219.SAA36593@apollo.backplane.com>
To: Alfred Perlstein <bright@hotjobs.com>
Cc: Terry Lambert <tlambert@primenet.com>, dyson@iquest.net,
        pfgiffun@bachue.usc.unal.edu.co, freebsd-hackers@FreeBSD.ORG
Subject: Re: questions/problems with vm_fault() in Stable
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:...
:..
:[snip]
:
:MFS is just laziness, if you want it to grow/work right, you rewrite it
:instead of hacking FFS on top of it.
:
:or you design it in such a way that it's a device that FFS is somewhat
:aware of.  this way when a block is asked to be filled what really happens
:is that the block passed in is put on the free block list and FFS is given
:a page of the MFS to use, when FFS pushes the block back to MFS the
:replaced page is put back under the vnode.
:
:MFS then becomes a device instead of a filesystem.
:
:although i think it violates some abstraction, does this make sense?
:
:-Alfred

    This actually does make some sense.  What you are basically saying
    is that it should be possible for the MFS device to rename a 
    VM page cached at a lower layer (mfsobj,page#) to a higher layer
    (ffs_sub_object,page#).

    This isn't possible with the current VFS layering.  That is, the 
    current VFS layering will pass a KVA mapped buffer down, but it
    does not expect the lower layer to physically replace the pages
    associated with the buffer with its own pages.  Also, while the
    clean/dirty state of the page could be retained, the relationship
    with the lower layer's page's backing store would be lost when it
    renames the page (backing store works differently depending on the
    type of object and cannot be transported across VFS layers).  The
    page's clean, dirty, or TBD (to be destroyed) state would have to
    eventually be passed back down to the lower layer when the upper layer
    is done with it... an extremely dangerous proposition.

    Implementing a vm_alias would solve half the problem - the lower layer
    would no longer have to 'lose' its reference to the page, and the
    upper layer can manipulate the pages in its own object space without
    having to worry about odd interactions with other layers.  If the 
    VFS/BIO system were then changed to *NOT* pass KVA buffers down but
    instead work solely with bp->b_pages[] arrays, then the upper layer 
    could theoretically instantiate vm_aliases in the array that are
    initially not associated with any real VM page and pass that down to
    the lower layer.  The lower layer could then simply [re]link the
    vm_aliases into the proper VM page chains, allocating new physical
    pages as necessary.
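
    To make the handoff concrete, here is a rough userland sketch of it.
    Everything in it is hypothetical: page_model, alias_model, lower_store,
    alias_link() and lower_fill() are made-up names for illustration, not
    FreeBSD kernel structures or APIs, and a fixed-size table stands in for
    the lower layer's VM object.  The upper layer hands down an array of
    unbound aliases (standing in for bp->b_pages[]) and the lower layer
    links each one to a page, allocating the page if it does not exist yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userland model; none of these names are FreeBSD kernel APIs. */
struct page_model {
    int alias_count;                /* aliases currently linked to this page */
    struct alias_model *aliases;    /* head of the page's alias chain */
};

struct alias_model {
    struct page_model *page;        /* NULL until a lower layer links it */
    struct alias_model *next;       /* next alias on the same page */
};

#define STORE_PAGES 8

/* Stand-in for the lower layer's object: page index -> lazily allocated page. */
struct lower_store {
    struct page_model *pages[STORE_PAGES];
};

/* Link an unbound alias into a page's alias chain. */
static void alias_link(struct alias_model *a, struct page_model *p)
{
    a->page = p;
    a->next = p->aliases;
    p->aliases = a;
    p->alias_count++;
}

/*
 * Lower layer fills a request: for each alias in the array, find or allocate
 * the backing page and link the alias to it.  Aliases that are already bound
 * are left alone.  Returns 0 on success, -1 on a bad index or allocation
 * failure (aliases linked before the failure stay linked in this sketch).
 */
static int lower_fill(struct lower_store *s, struct alias_model *req[],
                      size_t n, size_t first_index)
{
    for (size_t i = 0; i < n; i++) {
        size_t idx = first_index + i;

        if (idx >= STORE_PAGES)
            return -1;
        if (s->pages[idx] == NULL) {
            s->pages[idx] = calloc(1, sizeof(struct page_model));
            if (s->pages[idx] == NULL)
                return -1;
        }
        if (req[i]->page == NULL)
            alias_link(req[i], s->pages[idx]);
    }
    return 0;
}
```

    Two requests for the same page index end up as two aliases on one
    page, so neither layer has to give up its reference.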

    If we were to do that, then we would have about 70% of the cache 
    coherency problem solved too - 90% if we discount crossing a network.
    If vm_alias teardown always occurs from the top-down (either by
    the devices or by the vm_pageout process), pages passed back and
    forth in this manner would be cache coherent within any given machine.
    The exceptions would be, mainly, file fragments less than a page in
    size.  Each alias would be able to maintain its own clean/dirty state
    to optimize teardown operations (there would also be a general dirty
    state in the root vm_page).
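
    A minimal sketch of the top-down teardown rule with per-alias dirty
    state, again as a hypothetical userland model (root_page, page_alias,
    alias_attach() and alias_teardown() are invented for illustration, and
    the integer 'layer' field, where a larger value means a higher layer,
    is my assumption about how ordering might be tracked).  An alias may
    only be torn down once no alias from a higher layer remains on the
    page, and its dirty state is folded into the root page on the way out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland model; not FreeBSD kernel structures. */
struct root_page {
    int dirty;                  /* general dirty state in the root page */
    struct page_alias *aliases; /* aliases currently linked to the page */
};

struct page_alias {
    struct root_page *page;
    struct page_alias *next;
    int layer;                  /* assumed: larger value = higher layer */
    int dirty;                  /* this alias's own clean/dirty state */
};

/* Link an alias to a page on behalf of the given layer; it starts clean. */
static void alias_attach(struct page_alias *a, struct root_page *p, int layer)
{
    a->page = p;
    a->layer = layer;
    a->dirty = 0;
    a->next = p->aliases;
    p->aliases = a;
}

/*
 * Tear down one alias.  Refuses (returns -1) if the alias is unbound or if
 * an alias from a higher layer is still linked, which enforces top-down
 * order.  On success the alias's dirty state is merged into the root page.
 */
static int alias_teardown(struct page_alias *a)
{
    struct root_page *p = a->page;
    struct page_alias **link;

    if (p == NULL)
        return -1;
    for (struct page_alias *x = p->aliases; x != NULL; x = x->next)
        if (x != a && x->layer > a->layer)
            return -1;

    /* Unlink the alias from the page's chain. */
    for (link = &p->aliases; *link != a; link = &(*link)->next)
        ;
    *link = a->next;

    if (a->dirty)
        p->dirty = 1;
    a->page = NULL;
    a->next = NULL;
    a->dirty = 0;
    return 0;
}
```

    With an MFS alias at layer 0 and an FFS alias at layer 1, tearing down
    the MFS alias first is refused until the FFS alias is gone.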

    So to round out the solution a two-way cache coherency protocol is 
    required on top of the vm_aliasing.  The protocol is necessary to handle
    both special cases like file fragments, and changes in coherency that
    propagate from the bottom-up (for example, if some other host modifies
    the same file on the NFS server that you are messing with).  If we make
    this protocol slightly more complex, it could be made to work over a
    network hop as well as internally. 
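
    Purely as an illustration of the two directions, and not the protocol
    itself (which has not been specified here), the sketch below models a
    stack of layers that each cache a copy of some data.  The names
    coh_state, coh_layer, coh_writeback_down() and coh_invalidate_up() are
    hypothetical.  Dirty data moves top-down one layer at a time, while a
    change underneath propagates bottom-up by invalidating every copy
    above it.  It deliberately ignores the hard part: what to do when a
    copy being invalidated is itself dirty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-layer cache state; not a FreeBSD interface. */
enum coh_state { COH_INVALID, COH_CLEAN, COH_DIRTY };

struct coh_layer {
    struct coh_layer *upper;    /* layer stacked above, or NULL */
    struct coh_layer *lower;    /* layer stacked below, or NULL */
    enum coh_state state;       /* state of this layer's cached copy */
};

/*
 * Top-down: push a dirty copy one layer down.  The upper copy becomes
 * clean and the lower copy becomes dirty.  Returns -1 if there is nothing
 * to push or no layer to push to.
 */
static int coh_writeback_down(struct coh_layer *from)
{
    if (from->state != COH_DIRTY || from->lower == NULL)
        return -1;
    from->lower->state = COH_DIRTY;
    from->state = COH_CLEAN;
    return 0;
}

/*
 * Bottom-up: the data beneath 'from' changed (for example, another host
 * wrote the file on the server), so every copy cached above it is stale.
 * Returns the number of copies invalidated.  A real protocol would need a
 * policy for dirty copies; this sketch simply discards them.
 */
static int coh_invalidate_up(struct coh_layer *from)
{
    int n = 0;

    for (struct coh_layer *l = from->upper; l != NULL; l = l->upper) {
        if (l->state != COH_INVALID) {
            l->state = COH_INVALID;
            n++;
        }
    }
    return n;
}
```

    Making the same two operations work across a network hop would mean
    turning these direct calls into messages, which is where the extra
    complexity comes in.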

    This is effectively what John and I are putting forth as a solution 
    (though we are debating other ways of doing the equivalent function 
    of 'vm_alias'). 


						-Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)    
