From owner-freebsd-hackers  Thu Jan  7 23:57:16 1999
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id XAA18647
          for freebsd-hackers-outgoing; Thu, 7 Jan 1999 23:57:16 -0800 (PST)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id XAA18639
          for <freebsd-hackers@FreeBSD.ORG>; Thu, 7 Jan 1999 23:57:14 -0800 (PST)
          (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.1/8.9.1) id XAA38476;
	Thu, 7 Jan 1999 23:56:44 -0800 (PST)
	(envelope-from dillon)
Date: Thu, 7 Jan 1999 23:56:44 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199901080756.XAA38476@apollo.backplane.com>
To: Alfred Perlstein <bright@hotjobs.com>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: (mfs idea) Re: questions/problems with vm_fault() in Stable
References:  <Pine.BSF.4.05.9901080123210.37756-100000@bright.fx.genx.net>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:When a block device fills in a buffer and it is then written to causing it
:to become dirty, doesn't it have to be flushed to the same device it was
:fetched from?
:
:Since MFS is a pseudo device _and_ a filesystem, won't all dirty pages
:actually be handed back eventually to the device portion of the code?
:Doing a lookup can determine if the buffer is actually part of the newfs
:process, or just a formulated write... (this would be a memory to memory
:copy, or MFS could steal this buffer and traverse the LRU to force out a
:buffer to return.)
:
:You are saying that the dirty buffer can be lost to the MFS, if this is
:possible, then how do the other filesystems maintain stability if they
:can possibly never be flushed out?
    
    No, it doesn't work that way.  Filling the buffer does not necessarily
    make the pages associated with the buffer dirty.  They are... but only
    temporarily.  The caller marks them clean after the I/O is complete
    because, really, that is what they are.

    Ultimately, 'dirty' is an indication that a writeback is required.  When
    a filesystem issues a read from a lower level device, those blocks are
    NOT dirty because the filesystem can throw them away at any time and
    re-read them again.  So when the filesystem issues the read, it clears the
    dirty bit on the page on completion of the read.  Thus if the dirty bit
    is set later it means that the filesystem *modified* something and
    that a writeback is required.  Without that convention it could not
    tell a modified page from one it had merely read.
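
    To make that concrete, here is a toy user-space sketch, not kernel
    code.  Everything named demo_* is made up for illustration and is
    not the kernel's vm_page or buffer cache API.  It only shows the
    convention: a read completes clean, a modification sets dirty, and
    reclaim writes back only when dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/* Hypothetical toy page; NOT the kernel's struct vm_page. */
struct demo_page {
    unsigned char data[DEMO_PAGE_SIZE];
    bool dirty;              /* true means "a writeback is required" */
};

/* Read from the lower-level device into the page. */
static void demo_read_complete(struct demo_page *pg,
                               const unsigned char *dev_block)
{
    pg->dirty = true;        /* filling the page touches it... */
    memcpy(pg->data, dev_block, DEMO_PAGE_SIZE);
    pg->dirty = false;       /* ...but the caller marks it clean on completion */
}

/* The filesystem modifies one byte of the page. */
static void demo_modify(struct demo_page *pg, size_t off, unsigned char byte)
{
    assert(off < DEMO_PAGE_SIZE);
    pg->data[off] = byte;
    pg->dirty = true;        /* now a writeback really is required */
}

/* Reclaim the page.  Returns true if a writeback was performed. */
static bool demo_reclaim(struct demo_page *pg, unsigned char *dev_block)
{
    if (!pg->dirty)
        return false;        /* clean: throw it away, re-read later if needed */
    memcpy(dev_block, pg->data, DEMO_PAGE_SIZE);
    pg->dirty = false;
    return true;
}
```

    Reclaiming right after demo_read_complete() does no I/O; reclaiming
    after demo_modify() copies the page back to the device block.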

:since MFS is a bottom layer, the pages can be moved around as much as you
:want, with the stipulation that since it's dirty it always must be sent

    MFS is not the bottom layer.  Swap is the bottom layer.  Or another file
    (and then even it may not be the bottom layer).  You can use a file
    as backing store for an MFS filesystem.  It doesn't have to be swap.
    Swap itself might be mounted on a VN device and also not be the bottom
    layer.

    The problem with all these layers is that you wind up doing actual data
    copying to get pages between them with VOP calls unless you can forward
    the VOP_STRATEGY call (which is what the vn device does)... but MFS can't
    do this because there is nowhere to forward the call to.

    If the point of the exercise is to avoid that and to pass the page directly
    up from a lower layer to a higher layer, you can't just play tricks with
    the dirty bits - you will lose state.  You have to tag the page with the
    original information before you 'rename' it up to a higher level, so at
    some point in the future someone trying to free the page will notice the
    tag and know to call back down.
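
    A rough sketch of what such a tag could look like, as a user-space
    toy.  This is an assumption about one possible shape, not anything
    that exists in the tree: tag_page, tag_lower, tag_rename_up() and
    tag_free() are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lower layer that lent a page out (think: the MFS disk image). */
struct tag_lower {
    int pages_returned;       /* pages handed back via the tag */
    int writebacks;           /* of those, how many really needed writing */
};

/* Hypothetical page; 'origin' and 'origin_dirty' together form the tag. */
struct tag_page {
    bool dirty;               /* state as seen by the current owner */
    struct tag_lower *origin; /* NULL if the page was never renamed */
    bool origin_dirty;        /* lower layer's state saved at rename time */
};

/* Rename the page up, saving the lower layer's state in the tag first. */
static void tag_rename_up(struct tag_page *pg, struct tag_lower *lower)
{
    assert(pg->origin == NULL);   /* only one level of renaming in this sketch */
    pg->origin = lower;
    pg->origin_dirty = pg->dirty;
    pg->dirty = false;            /* the upper layer marks it clean after its read */
}

/* Free path: whoever frees the page notices the tag and calls back down. */
static void tag_free(struct tag_page *pg)
{
    if (pg->origin != NULL) {
        struct tag_lower *lower = pg->origin;

        lower->pages_returned++;
        /* A writeback is needed if either owner dirtied the page. */
        if (pg->origin_dirty || pg->dirty)
            lower->writebacks++;
        pg->origin = NULL;
    }
    pg->dirty = false;
}
```

    The point is only that the saved state survives the upper layer
    clearing the dirty bit, so the lower layer still learns whether a
    writeback is owed when the page finally comes back.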

:i think it's more of a hack in the MFS than the general filesystem code,
:all that has to be done is that the buffers (which are really just pages
:of the mfs allocation) are always marked dirty so they make their way back
:to the storage (the address space of the mfs entity)

    No, it isn't so simple.  The only thing MFS owns is its address space.
    The buffers passed to it in an I/O request are *NOT* owned by MFS and
    are *NOT* part of its 'disk' address space.

    For any given I/O, there are *TWO* pages involved... the page passed to
    MFS in the I/O request, and the page representing the MFS disk image that
    MFS owns.

    At the moment, MFS physically copies data from one page to the other.  This
    maintains the clean/dirty status of the MFS disk page and allows the caller
    to maintain the clean/dirty status of the page it passes to MFS 
    independently (and the caller always marks its own pages clean after
    issuing the read).
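
    As a toy model of the copy path (user-space C, names invented for
    illustration, not the code in /usr/src/sys/ufs/mfs): two distinct
    pages, a memcpy between them, and each side keeping its own
    clean/dirty state.  Marking the MFS page dirty on a write is my
    shorthand for "the disk image now differs from swap".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CP_PAGE_SIZE 4096

/* Hypothetical toy page; NOT the kernel's struct vm_page. */
struct cp_page {
    unsigned char data[CP_PAGE_SIZE];
    bool dirty;
};

enum cp_op { CP_READ, CP_WRITE };

/* MFS side: copy between the caller's page and the MFS-owned image page. */
static void cp_mfs_io(enum cp_op op, struct cp_page *caller_pg,
                      struct cp_page *mfs_pg)
{
    assert(caller_pg != mfs_pg);    /* always two distinct pages */
    if (op == CP_READ) {
        /* The MFS page's clean/dirty state is left untouched by a read. */
        memcpy(caller_pg->data, mfs_pg->data, CP_PAGE_SIZE);
    } else {
        memcpy(mfs_pg->data, caller_pg->data, CP_PAGE_SIZE);
        mfs_pg->dirty = true;       /* assumption: image now differs from swap */
    }
}

/* Caller side: issue the read, then mark our own page clean. */
static void cp_caller_read(struct cp_page *caller_pg, struct cp_page *mfs_pg)
{
    cp_mfs_io(CP_READ, caller_pg, mfs_pg);
    caller_pg->dirty = false;
}
```

    Because the data is copied, nothing the caller does to its page's
    dirty bit can disturb the state of the page MFS owns.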

    If the purpose of the exercise is to avoid the data copy, MFS would have
    to replace the caller's page with its own.  The only way to do that is
    for MFS to *rename* the page -- remove it from MFS's VM object and add it
    to the caller's while at the same time throwing away the page supplied
    by the caller.  At that point, the page is no longer owned by MFS and
    the caller can do anything it likes with it, including destroying the
    previous clean/dirty state of the page.

    If we force the caller to mark the page dirty, we can succeed in getting 
    it to commit the page back to MFS some time later but at a huge cost -
    because MFS would not be able to tell whether the page actually needs to
    be written back to swap or not.  Reading a file would result in dirtying
    pages and re-writing swap, which is not desirable.
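
    Back-of-the-envelope version of that tradeoff, as a toy counter in
    user-space C.  rn_stats and rn_read_file() are hypothetical names,
    and the model assumes every page read is clean in MFS (it already
    matches swap) before the rename.

```c
#include <assert.h>
#include <stdbool.h>

struct rn_stats {
    int pages_lost;     /* renamed pages the caller never has to give back */
    int swap_writes;    /* swap writes MFS is forced to issue */
};

/*
 * Simulate reading 'npages' clean pages through a hypothetical zero-copy
 * rename.  After the rename the caller owns each page, and the only state
 * MFS ever sees again is whatever the caller left in the dirty bit.
 */
static struct rn_stats rn_read_file(int npages, bool force_dirty)
{
    struct rn_stats st = {0, 0};

    assert(npages >= 0);
    for (int i = 0; i < npages; i++) {
        bool dirty = force_dirty;   /* caller's view after the read */

        if (!dirty)
            st.pages_lost++;        /* clean: caller may free it at any time */
        else
            st.swap_writes++;       /* comes back, but MFS cannot tell that
                                       it was never modified */
    }
    return st;
}
```

    With force_dirty false, reading 100 pages leaves MFS with 100 pages
    it may never see again; with force_dirty true, the same read costs
    100 swap writes for data that never changed.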

:A _real_ MFS is always good, one that has a maximum size, and doesn't need
:to satisfy the FFS's optimizations for mass storage and does zero copy.
:
:I was just wondering if it's possible to do this as a temporary
:improvement?

    No.  Sorry.

:I really need to understand and read the code more, consider me off your
:back for the time being. :)
:
:thanks,
:-Alfred

    Suggested reading:

	/usr/src/sys/ufs/mfs/*.c
	/usr/src/sys/vm/vnode_pager.c	( read vnode_pager_generic_getpages() )

						-Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)    
