Date: Tue, 10 Apr 2001 14:35:50 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: nik@freebsd.org (Nik Clayton)
Cc: bakul@bitblocks.com (Bakul Shah), tlambert@primenet.com (Terry Lambert), freebsd-chat@freebsd.org
Subject: Re: Clash of Titans - Tale of two Morons
Message-ID: <200104101435.HAA05799@usr05.primenet.com>
In-Reply-To: <20010410085110.A7075@canyon.nothing-going-on.org> from "Nik Clayton" at Apr 10, 2001 08:51:12 AM

> > $ cd /my-sys/i386
> > Makefile conf ibcs2 isa pci
> > apm i386 include linux svr4
>
> In theory, yes.  In practice, no, because no one's written the
> filesystem layer for it.
>
> DES recently committed pseudofs, which is intended to abstract out a lot
> of the code that would be common between filesystems like this (and the
> likes of kernfs, sysctlfs, procfs, et al) so writing this sort of layer
> should now be considerably easier.  Should I ever get the requisite
> amount of free time I might tackle this example and write it up as a
> "Writing a simple filesystem for FreeBSD" article.

I wrote something similar for ZIP archives back in the FreeBSD 3.1
days; realize that my FreeBSD 3.1 was substantially different from the
one everyone else was running, since I had dealt with the
VOP_{GE|PU}TPAGES() and vm_object_t aliasing issue (via
VOP_GET_FINALVP()) a long time ago.  Some of what made it into 5.0
recently was substantially similar to what I've been running since
about 1.5 (June 1994).

I still maintain that, no matter how you look at it, much of the
defaultfs code was a mistake.  The problem with that approach is that
it results in default valid, rather than default invalid, ops, and
that it's not extensible to anonymous VOP backing objects.

What that effectively means is that there are now versioning problems
when proxying argument descriptors between machines.  Before
defaultfs, it was possible to proxy a VOP from one machine to another
through a third intermediate machine, without the intermediate machine
getting in your way.  Similarly, it was also possible to proxy into
user space and back into kernel space (if you were to proxy the cache
coherency through VOP_{GE|PU}TPAGES() so that coherency was explicit).
Now an intermediate machine can get in your way.
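
To make the "default invalid" and "intermediate does not get in your
way" points concrete, here is a minimal user-space sketch.  It is not
FreeBSD kernel code: the op identifiers, struct layer, and
layer_call() are all hypothetical names invented for illustration.
The idea is that a layer which does not implement an op hands it down
untouched, and an op that nobody implements fails rather than quietly
succeeding.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical op identifiers; NOT the real FreeBSD vnode op descriptors. */
enum { OP_LOOKUP, OP_READ, OP_NEWOP, OP_MAX };

struct op_args {
    int op;     /* which operation is being requested */
    int value;  /* stand-in for the marshalled arguments */
};

typedef int (*op_fn)(struct op_args *);

struct layer {
    op_fn ops[OP_MAX];          /* NULL: this layer does not implement the op */
    const struct layer *lower;  /* next layer down the stack, or NULL */
};

/* The bottom filesystem implements only OP_READ. */
static int fs_read(struct op_args *a) { a->value += 1; return 0; }

static const struct layer fs_layer = {
    .ops = { [OP_READ] = fs_read },
    .lower = NULL,
};

/* An intermediate proxy that implements nothing itself. */
static const struct layer proxy_layer = {
    .ops = { NULL },
    .lower = &fs_layer,
};

/*
 * Walk down the stack.  A layer that does not know the op passes it
 * along without inspecting the arguments; if no layer claims the op,
 * the call fails (default invalid) instead of falling into a generic
 * default implementation.
 */
int layer_call(const struct layer *l, struct op_args *a)
{
    assert(l != NULL && a != NULL);
    if (a->op < 0 || a->op >= OP_MAX)
        return EINVAL;
    for (; l != NULL; l = l->lower) {
        if (l->ops[a->op] != NULL)
            return l->ops[a->op](a);
    }
    return EOPNOTSUPP;
}
```

Calling layer_call(&proxy_layer, ...) with OP_READ reaches fs_read()
through the proxy, while OP_NEWOP comes back as EOPNOTSUPP with the
arguments unmodified.  A default valid scheme would instead install a
generic handler at the point where this sketch returns an error.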

The original design was supposed to permit you to do something like
mount a DVD on one machine, pass its data through a CODEC on another,
and read a raw video stream with all frames interpolated on yet
another.

The original FICUS work had a network and a user space proxy; it was a
SunOS project, so there was no unified VM and buffer cache, so all of
the coherency points were explicit, rather than implicit.
Unfortunately, FreeBSD would have to backtrack quite a way to get the
coherency picture right these days (using OpenBSD as a living
non-unified reference would probably be the best approach).

Such an approach is *vastly* superior to NFS (though v4 does finally
deal with the locking issue and distributed cache coherency issue), as
it would allow arbitrary VOP extension on either side of a proxy
layer.

I think providing a "pseudofs" layer which could be explicitly
referenced is a good step in the right direction _away_ from the
"defaultfs" approach; the difference being that the reference _must_
be explicit.

There are still VOPs that really need to be "veto" based, rather than
call-through.  VOP_ADVLOCK() and VOP_ABORTOP() are extreme examples
(try porting GFS or XFS or a writeable NTFS without a working
VOP_ABORTOP() that aborts operations, instead of freeing namei buffers
not allocated by your VFS layer, and it will become abundantly clear
very quickly).

FreeBSD could also get rid of the VOP reverse lookup issue; to do
that, it would need to refactor the VOP lists each time, and have
default failure cases for new VOPs unknown to the underlying FS.
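
As a rough illustration of refactoring a VOP list with default
failure cases, the sketch below rebuilds a filesystem's op list
against a global descriptor list at "mount" time.  Everything in it
is an assumption made for the example: the descriptor names,
vop_refactor(), vop_call(), and the myfs_* functions are
hypothetical, and it runs in user space rather than in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef int (*vop_fn)(void *);

/* Hypothetical global descriptor list; the index is the entry point order. */
static const char *const global_descs[] = { "lookup", "read", "write", "abortop" };
enum { DESC_LOOKUP, DESC_READ, DESC_WRITE, DESC_ABORTOP };
#define NDESCS (sizeof(global_descs) / sizeof(global_descs[0]))

/* What a filesystem hands over: an unordered list of (name, function). */
struct vop_entry {
    const char *name;
    vop_fn fn;
};

/* Default failure case for any descriptor the filesystem did not supply. */
static int vop_unsupported(void *args) { (void)args; return EOPNOTSUPP; }

/*
 * Rebuild a dense vector in global descriptor order.  Slots the
 * filesystem does not know about fail by default.  Returns how many
 * of the filesystem's entries matched a known descriptor.
 */
size_t vop_refactor(const struct vop_entry *list, size_t n, vop_fn out[NDESCS])
{
    size_t matched = 0;

    for (size_t i = 0; i < NDESCS; i++)
        out[i] = vop_unsupported;
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < NDESCS; i++) {
            if (strcmp(list[j].name, global_descs[i]) == 0) {
                out[i] = list[j].fn;
                matched++;
                break;
            }
        }
    }
    return matched;
}

/* After refactoring, a call is a single index into the vector. */
int vop_call(vop_fn vec[NDESCS], size_t idx, void *args)
{
    assert(idx < NDESCS);
    return vec[idx](args);
}

/* A toy filesystem that only knows two of the four descriptors. */
static int myfs_read(void *args) { *(int *)args = 7; return 0; }
static int myfs_lookup(void *args) { (void)args; return 0; }

static const struct vop_entry myfs_ops[] = {
    { "read", myfs_read },
    { "lookup", myfs_lookup },
};
```

The name matching happens once, when the vector is built; a
descriptor added to the global list later simply shows up as a
failing slot for filesystems that predate it, until the list is
refactored again.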
If, at the same time it refactored (or created, in the case of new
mounts) the VOP lists, it also sorted them into their entry point
ordering, according to the global descriptor list, you could achieve a
significant speedup in descriptor dereference, as well as eliminate
one level of indirection in the post-factored VOP list (that is, you
could make the glue code effectively "go away" for instances of FSs
resulting from mounts, even though it would have to stick around for
the mount/unmount/refactor operations themselves).

Similarly, the NFS cookie thing that finally won out (arguments in one
order for FreeBSD, arguments in another for NetBSD and OpenBSD, and
probably Darwin) is really ill-considered.  A more correct approach
would be to break the VOP in two, where there would be a separate
"snapshot current directory block smallest atomic unit" and
"externalize directory entries in machine independent format" VOPs,
which would let the VOP_READDIR() be restarted at an arbitrary offset,
without needing to appeal to cookies.

As it is, I think you could barely do the job in 5.0, but it would
have taken substantial work prior to 5.0.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message