From owner-freebsd-hackers Wed Jan 6 19:16:07 1999
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8)
	id TAA21549 for freebsd-hackers-outgoing; Wed, 6 Jan 1999 19:16:07 -0800 (PST)
	(envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id TAA21523 for ;
	Wed, 6 Jan 1999 19:16:03 -0800 (PST)
	(envelope-from tlambert@usr09.primenet.com)
Received: (from daemon@localhost) by smtp02.primenet.com (8.8.8/8.8.8)
	id UAA05128; Wed, 6 Jan 1999 20:15:34 -0700 (MST)
Received: from usr09.primenet.com(206.165.6.209) via SMTP by smtp02.primenet.com,
	id smtpd005075; Wed Jan 6 20:15:24 1999
Received: (from tlambert@localhost) by usr09.primenet.com (8.8.5/8.8.5)
	id UAA10543; Wed, 6 Jan 1999 20:15:22 -0700 (MST)
From: Terry Lambert
Message-Id: <199901070315.UAA10543@usr09.primenet.com>
Subject: Re: questions/problems with vm_fault() in Stable
To: dillon@apollo.backplane.com (Matthew Dillon)
Date: Thu, 7 Jan 1999 03:15:21 +0000 (GMT)
Cc: tlambert@primenet.com, dyson@iquest.net, pfgiffun@bachue.usc.unal.edu.co,
	freebsd-hackers@FreeBSD.ORG
In-Reply-To: <199901062259.OAA25909@apollo.backplane.com> from "Matthew Dillon"
	at Jan 6, 99 02:59:19 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> I think you have to get away from thinking about 'consumers' and
> 'providers'.  It is precisely this sort of thinking that screwed
> up the existing VFS design.
>
> The best way to abstract a VFS layer is consider that each VFS layer
> has a 'frontside' and 'backside'.
I think you are confusing the definitional aspects of "frontside" and
"backside"; the point of specifying a "consumer" at all is to define
the interface on the top of the module at the top of the stack, or the
interface on the bottom of the module at the bottom of the stack.

These particular modules are singularly uninteresting, as far as their
ability to act as anything other than pigs, where "some pigs are more
equal than others".  They contribute relatively little to the game,
other than acting as "stream head" or "stream tail" for the interesting
parts of the stack.  And, of course, they act as a living history of
how the architecture was wedged in wrong in the first place, as you
look through the various usages of "struct fileops" in the kernel: the
pipe code, the socket code, and the vnops code.  Why aren't pipes and
sockets vnodes, so that the file access interface can be normalized?
Why can't I call fcntl() on a FIFO to use the F_GETOWN/F_SETOWN
interfaces?  Brain damage.

> The VFS layer should make no
> assumptions whatsoever as to who attaches to it on the frontside,
> and who it is attached to on the backside.

Fine and dandy, if you can tell me the answers to the following
questions:

1)	The system call layer makes VFS calls.  How can I stack a VFS
	*on top of* the system call layer?

2)	The NFS server VFS makes RPC calls.  How can I stack a VFS
	*under* the NFS server VFS?

The problem exists in streams as well.  Somewhere, there has to be a
stream head.  And on the other end, somewhere there has to be a driver.

> If you really want, you could consider a 'consumer' to be the VFS
> layer's backside and a 'provider' to be the SAME VFS layer's frontside.
> So a VFS layer's backside 'consumer' is linked to another VFS layer's
> frontside 'provider'.  And so forth.  But don't try to 'type' a VFS
> layer -- it doesn't work.  It was precisely that sort of thinking
> that required something like the MFS filesystem, which blurs
> distinctions, to be a major hack in existing kernels.
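To make the "struct fileops" point above concrete, here is a userland
sketch, not the kernel's actual definitions: the type and field names
are hypothetical, modeled loosely on the kernel's fileops table.  It
shows what a normalized file access interface looks like, with the
dispatch layer never knowing whether it has a pipe, a socket, or a
vnode underneath:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, normalized per-type operations table; every file
 * type (pipe, socket, vnode) would supply one of these.
 */
struct xfile;

struct xfileops {
	int (*fo_read)(struct xfile *fp, char *buf, size_t len);
	int (*fo_close)(struct xfile *fp);
};

struct xfile {
	const struct xfileops *f_ops;	/* same table shape for every type */
	const char *f_type;		/* "pipe", "socket", "vnode", ... */
};

/* Stub "pipe" implementation, standing in for real pipe I/O. */
int
pipe_read(struct xfile *fp, char *buf, size_t len)
{
	(void)fp;
	strncpy(buf, "pipe", len);
	return (0);
}

int
pipe_close(struct xfile *fp)
{
	(void)fp;
	return (0);
}

const struct xfileops pipeops = { pipe_read, pipe_close };

/*
 * The system call layer dispatches blindly through the table; it
 * never needs to special-case the object type.
 */
int
generic_read(struct xfile *fp, char *buf, size_t len)
{
	return (fp->f_ops->fo_read(fp, buf, len));
}
```

If F_GETOWN/F_SETOWN were routed through such a table, a FIFO would
get them for free instead of being special-cased.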
I'm not trying to 'type' a VFS layer.  The problem is that some idiot
(who was right) thought it'd be faster to implement block access in
the FS's that need block access, instead of creating a generic "stream
tail" that implemented the buffer cache interface.

If they had done that, then VOP_GETPAGES/VOP_PUTPAGES would directly
access the VOP_GETBLOCKRANGE/VOP_PUTBLOCKRANGE of the underlying tail,
and FFS could stack on top of it, and "stack" on top of other FS's
(although it would only use a subset of the operations, which would
pretty much result in it doing the same thing as if it weren't
stacked, unless the lower FS also implemented
VOP_GETBLOCKRANGE/VOP_PUTBLOCKRANGE -- for example, to implement
"vinum" as a stacking layer).

> The only way to do cache coherency through a multi-layered VFS design
> is to extend the vm_object model.  You *cannot* require that a VM
> system use VOP_GETPAGES or VOP_PUTPAGES whenever it wants to verify
> the validity of a page it already has in the cache.  If a page is sitting
> in the cache accessible to someone, that someone should be able to use
> the page immediately.  This is why a two-way cache coherency protocol
> is so necessary, so things that effect coherency can be propogated
> back up through the layers rather then through hacks.  Requiring the
> GET/PUTPAGES interface to be used in a cache case destroys the efficiency
> of the cache and, also, makes it virtually impossible to implement async
> I/O.  The VFS layer, as it stands, cannot do async I/O - the struct buf
> mechanisms 'sorta' does it, but it isn't really async due to the huge
> number of places where the system can block even before it returns a bp.

OK.  You are considering the case where I have two vnodes pointing to
the same page, and I invalidate the page in the underlying vnode, and
asking "how do I make the reference in the upper vnode go away?",
right?

The way you "make the reference in the upper vnode go away" is by not
putting a blessed reference there in the first place.
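A minimal sketch of the generic "stream tail" described above: the
VOP_GETBLOCKRANGE/VOP_PUTBLOCKRANGE names come from this message, but
everything else (the structures, the one-block-per-page equivalence)
is invented here for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLKSIZE	512
#define NBLKS	8

/* The "stream tail": a block store standing in for the buffer cache. */
struct blocktail {
	char store[NBLKS * BLKSIZE];
};

int
vop_getblockrange(struct blocktail *bt, int blkno, int n, char *buf)
{
	if (blkno < 0 || n < 0 || blkno + n > NBLKS)
		return (-1);
	memcpy(buf, bt->store + blkno * BLKSIZE, (size_t)n * BLKSIZE);
	return (0);
}

int
vop_putblockrange(struct blocktail *bt, int blkno, int n, const char *buf)
{
	if (blkno < 0 || n < 0 || blkno + n > NBLKS)
		return (-1);
	memcpy(bt->store + blkno * BLKSIZE, buf, (size_t)n * BLKSIZE);
	return (0);
}

/*
 * An FFS-like layer stacked on the tail: its page operations are pure
 * translation into block-range calls, so block layout policy stays in
 * the FS while storage access stays in the tail.  Pages and blocks
 * are the same size here, purely for brevity.
 */
struct stackedfs {
	struct blocktail *fs_tail;
};

int
fs_getpages(struct stackedfs *fs, int pgno, char *page)
{
	return (vop_getblockrange(fs->fs_tail, pgno, 1, page));
}

int
fs_putpages(struct stackedfs *fs, int pgno, const char *page)
{
	return (vop_putblockrange(fs->fs_tail, pgno, 1, page));
}
```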
Problem solved.  There is no coherency problem, because the problem
page is not cached in two places; the page's validity is known by
whether or not its valid bit is set.

What you *do* have to do is go through VOP_GETPAGES/VOP_PUTPAGES if
you want to change the status of a page that you are addressing via a
vnode reference, through one or more stacking layers which may choose
to translate that reference.  More formally, you can't make a page
accessed this way appear without doing a VOP_GETPAGES, or disappear
without a VOP_PUTPAGES.  And that's the purpose in life of the vnode
pager.

> An extended vm_object and cache coherency model would, for example,
> allow something like MFS, VN, or VINUM to be implemented almost trivially
> and definitely more efficiently, even unto having filesystems relocate
> underlying storage on the fly.

You could implement these things rather trivially as it is, if the
bottom end VFS were a variable granularity block store instead of a
"file system" that managed its blocks directly, with the caveat that
stacking something that manages block layout on anything other than a
variable granularity block store layer would be pretty darn useless,
since it would never invoke an inferior VOP that implemented policy.

Of course, you're aiming at your foot if you do this.  Consider an FS
that implements ACLs via a VOP_ACL and manages its own block layout,
and then stack something like FFS (which *doesn't* implement a
VOP_ACL) on top of that.  Now call a system call that calls VOP_ACL,
and watch it spam your FFS contents out from under you as it acts
unexpectedly.

If you insist on separating the block management into a stacking
layer, then you will *at least* have to 'type' the stacking layers, to
avoid stacking a block-management-only consumer on top of another
similar consumer, and thereby prevent direct block manipulation by an
otherwise unprotected call.
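The typing rule just described can be sketched as follows; the layer
type names and the vfs_stack() helper are hypothetical, invented here
for illustration, not existing kernel interfaces:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical layer types: if block management is split out into a
 * stacking layer, the stack-assembly code must know which layers own
 * block layout so it can reject unsafe combinations.
 */
enum vfs_layer_type {
	VFS_BLOCK_STORE,	/* variable granularity block store tail */
	VFS_BLOCK_MANAGER,	/* manages block layout itself (FFS-like) */
	VFS_TRANSLATOR		/* pure translation layer (nullfs-like) */
};

struct vfs_layer {
	enum vfs_layer_type l_type;
	struct vfs_layer *l_below;
};

/* Returns 0 on success, -1 if the combination is unsafe. */
int
vfs_stack(struct vfs_layer *upper, struct vfs_layer *lower)
{
	/*
	 * A block manager must never sit on another block manager:
	 * the upper layer's direct block manipulation would spam the
	 * lower FS's contents out from under it, exactly the VOP_ACL
	 * failure mode described above.
	 */
	if (upper->l_type == VFS_BLOCK_MANAGER &&
	    lower->l_type == VFS_BLOCK_MANAGER)
		return (-1);
	upper->l_below = lower;
	return (0);
}
```

The check buys safety at the cost of exactly the layer 'typing' that
Dillon objects to; that trade-off is the point of the paragraph above.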
I think this would be a bad precedent, though I do like the idea of
the buffer cache interface being represented as a variable granularity
block store.  But then, that's what devices are for.

Say you don't buy this argument.  OK, then what VFS does the NFS
client VFS stacking layer stack on top of?  It doesn't stack on top of
the buffer cache.  You're stuck implementing all of the service
interfaces in the entire system as VOP's.  Not a nice thing.

Now the "head" is another interesting issue.  In streams, the head is
exported as a device.  But in VFS stacking, the "head" is implicitly
abstracted via system calls.  This isn't really a bad thing, but it
allows kernel engineers to do stupid things, like treating system
calls that consume the VFS interface as if they were somehow special,
compared to an NFS server, a SAMBA server, an AppleTalk server, or
some VFS stacking layer that consumes a VFS interface.

In general, I have to say that I think you are setting yourself up for
some hairy problems; at some point, you will have to make a design
compromise, and if you don't go into it with this idea in your head in
the first place, it's going to be a surprise instead of something you
planned.  Probably a nasty surprise.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message