Date: Thu, 7 Jan 1999 23:56:44 -0800 (PST)
From: Matthew Dillon
Message-Id: <199901080756.XAA38476@apollo.backplane.com>
To: Alfred Perlstein
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: (mfs idea) Re: questions/problems with vm_fault() in Stable

:When a block device fills in a buffer and it is then written to causing it
:to become dirty, doesn't it have to be flushed to the same device it was
:fetched from?
:
:Since MFS is a pseudo device _and_ a filesystem, won't all dirty pages
:actually be handed back eventually to the device portion of the code?
:Doing a lookup can determine if the buffer is actually part of the newfs
:process, or just a formulated write... (this would be a memory to memory
:copy, or MFS could steal this buffer and traverse the LRU to force out a
:buffer to return.)
:
:You are saying that the dirty buffer can be lost to the MFS, if this is
:possible, then how do the other filesystems maintain stability if they
:can possibly never be flushed out?

    No, it doesn't work that way.  Filling the buffer does not necessarily
    make the pages associated with the buffer dirty.  They are... but only
    temporarily.  The caller marks them clean after the I/O is complete
    because, really, that is what they are.
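As a rough illustration of the point above, here is a minimal user-space sketch of a page whose dirty flag means only "a writeback is required". It is not kernel code: struct toy_page, PAGE_BYTES, and the toy_* functions are hypothetical names invented for this example, and the real VM system tracks this state quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 16            /* tiny page size, for illustration only */

/* Hypothetical stand-in for a VM page; NOT the kernel's vm_page. */
struct toy_page {
    char data[PAGE_BYTES];
    bool dirty;                  /* true means "a writeback is required" */
};

/* Filling the page with device data leaves it looking dirty for a moment. */
static void toy_fill(struct toy_page *pg, const char *backing)
{
    memcpy(pg->data, backing, PAGE_BYTES);
    pg->dirty = true;
}

/* A read: fill the page, then mark it clean once the I/O is complete,
 * because the page now matches the backing store and can be re-read. */
static void toy_read(struct toy_page *pg, const char *backing)
{
    toy_fill(pg, backing);
    pg->dirty = false;
}

/* A later modification is the only thing that makes the page dirty again. */
static void toy_modify(struct toy_page *pg, size_t off, char c)
{
    assert(off < PAGE_BYTES);
    pg->data[off] = c;
    pg->dirty = true;
}

/* Write back only when required; returns 1 if a write was issued, else 0. */
static int toy_flush(struct toy_page *pg, char *backing)
{
    if (!pg->dirty)
        return 0;
    memcpy(backing, pg->data, PAGE_BYTES);
    pg->dirty = false;
    return 1;
}
```

In this model, calling toy_read() and then toy_flush() issues no write, while toy_read(), toy_modify(), toy_flush() issues exactly one.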

    Ultimately, 'dirty' is an indication that a writeback is required.  When
    a filesystem issues a read from a lower level device, those blocks are
    NOT dirty because the filesystem can throw them away at any time and
    re-read them.  So when the filesystem issues the read, it clears the
    dirty bit on the page on completion of the read.  Thus if the dirty bit
    is set later it means that the filesystem *modified* something and that
    a writeback is required.  Otherwise it wouldn't know.

:since MFS is a bottom layer, the pages can be moved around as much as you
:want, with the stipulation that since it's dirty it always must be sent

    MFS is not the bottom layer.  Swap is the bottom layer.  Or another
    file (and then even it may not be the bottom layer).  You can use a file
    as backing store for an MFS filesystem.  It doesn't have to be swap.
    Swap itself might be mounted on a VN device and also not be the bottom
    layer.

    The problem with all these layers is that you wind up doing actual data
    copying to get pages between them with VOP calls unless you can forward
    the VOP_STRATEGY call (which is what the vn device does)... but MFS
    can't do this, there is nowhere to forward the call to.

    If the point of the exercise is to avoid that and to pass the page
    directly up from a lower layer to a higher layer, you can't just play
    tricks with the dirty bits - you will lose state.  You have to tag the
    page with the original information before you 'rename' it up to a
    higher level, so at some point in the future someone trying to free the
    page will notice the tag and know to call back down.

:i think it's more of a hack in the MFS than the general filesystem code,
:all that has to be done is that the buffers (which are really just pages
:of the mfs allocation) are always marked dirty so they make their way back
:to the storage (the address space of the mfs entity)

    No, it isn't so simple.  The only thing MFS owns is its address space.
    The buffers passed to it in an I/O request are *NOT* owned by MFS and
    are *NOT* part of its 'disk' address space.  For any given I/O, there
    are *TWO* pages involved... the page passed to MFS in the I/O request,
    and the page representing the MFS disk image that MFS owns.  At the
    moment, MFS physically copies data from one page to the other.  This
    maintains the clean/dirty status of the MFS disk page and allows the
    caller to maintain the clean/dirty status of the page it passes to MFS
    independently (and the caller always marks its own pages clean after
    issuing the read).

    If the purpose of the exercise is to avoid the data copy, MFS would have
    to replace the caller's page with its own.  The only way to do that is
    for MFS to *rename* the page -- remove it from MFS's VM object and add
    it to the caller's while at the same time throwing away the page
    supplied by the caller.  At that point, the page is no longer owned by
    MFS and the caller can do anything it likes with it, including
    destroying the previous clean/dirty state of the page.  If we force the
    caller to mark the page dirty, we can succeed in getting it to commit
    the page back to MFS some time later but at a huge cost - because MFS
    would not be able to tell whether the page actually needs to be written
    back to swap or not.  Reading a file would result in dirtying pages and
    re-writing swap, which is not desirable.

:A _real_ MFS is always good, one that has a maximum size, and doesn't need
:to satisfy the FFS's optimizations for mass storage and does zero copy.
:
:I was just wondering if it's possible to do this as a temporary
:improvement?

    No.  Sorry.

:I really need to understand and read the code more, consider me off your
:back for the time being. :)
:
:thanks,
:-Alfred

    Suggested reading:

        /usr/src/sys/ufs/mfs/*.c
        /usr/src/sys/vm/vnode_pager.c   ( read vnode_pager_generic_getpages() )

                                        -Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet
                    Communications & God knows what else.
    (Please include original email in any response)
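The copy-versus-rename argument in the message above can be modeled with a small user-space sketch. Everything in it is an assumption made for illustration: struct toy_page, toy_mfs_read_copy(), toy_mfs_read_rename(), and toy_needs_swap_write() are hypothetical names, not functions from the MFS or VM sources, and the "rename" here is only a pointer handoff standing in for moving a page between VM objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 16            /* tiny page size, for illustration only */

/* Hypothetical stand-in for a VM page; NOT the kernel's vm_page. */
struct toy_page {
    char data[PAGE_BYTES];
    bool dirty;                  /* true means "a writeback is required" */
};

/* Copy path: two pages, two independent dirty bits.  The MFS disk page
 * keeps its own state; the caller marks its own page clean after the read. */
static void toy_mfs_read_copy(const struct toy_page *mfs_pg,
                              struct toy_page *caller_pg)
{
    memcpy(caller_pg->data, mfs_pg->data, PAGE_BYTES);
    caller_pg->dirty = false;
}

/* Rename path: the MFS page itself is handed to the caller, so only one
 * dirty bit is left and the caller's policy overwrites MFS's old state.
 * force_dirty models "make the caller always mark the page dirty". */
static struct toy_page *toy_mfs_read_rename(struct toy_page **mfs_slot,
                                            bool force_dirty)
{
    struct toy_page *pg = *mfs_slot;

    assert(pg != NULL);
    *mfs_slot = NULL;            /* page no longer belongs to MFS */
    pg->dirty = force_dirty;     /* previous clean/dirty state is gone */
    return pg;
}

/* MFS's only evidence that a page must go to swap is the dirty bit. */
static bool toy_needs_swap_write(const struct toy_page *pg)
{
    return pg->dirty;
}
```

In this model a modified MFS page that is renamed and then marked clean by the caller loses its pending writeback, while a clean MFS page renamed with force_dirty set reports a swap write that a plain read should never have caused. The copy path avoids both outcomes at the price of the memcpy().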