Date:      Tue, 10 Apr 2001 14:35:50 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        nik@freebsd.org (Nik Clayton)
Cc:        bakul@bitblocks.com (Bakul Shah), tlambert@primenet.com (Terry Lambert), freebsd-chat@freebsd.org
Subject:   Re: Clash of Titans - Tale of two Morons
Message-ID:  <200104101435.HAA05799@usr05.primenet.com>
In-Reply-To: <20010410085110.A7075@canyon.nothing-going-on.org> from "Nik Clayton" at Apr 10, 2001 08:51:12 AM

> > $ cd /my-sys/i386
> > Makefile        conf            ibcs2           isa             pci
> > apm             i386            include         linux           svr4
> 
> In theory, yes.  In practice, no, because no one's written the
> filesystem layer for it.
> 
> DES recently committed pseudofs, which is intended to abstract out a lot
> of the code that would be common between filesystems like this (and the
> likes of kernfs, sysctlfs, procfs, et al) so writing this sort of layer
> should now be considerably easier.  Should I ever get the requisite
> amount of free time I might tackle this example and write it up as a
> "Writing a simple filesystem for FreeBSD" article.

I wrote something similar for ZIP archives back in the FreeBSD
3.1 days; realize that my FreeBSD 3.1 was substantially different
from the one everyone else was running, since I had dealt with
the VOP_{GE|PU}TPAGES() and vm_object_t aliasing issue (via
VOP_GET_FINALVP()) a long time ago.  Some of what made it into
5.0 recently was substantially similar to what I've been running
since about 1.5 (June 1994).

I still maintain that, no matter how you look at it, much of the
defaultfs code was a mistake.  The problem with that approach is
that it results in default valid, rather than default invalid,
ops, and that it's not extensible to anonymous VOP backing objects.
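
A minimal user-space sketch of the "default valid" versus "default
invalid" distinction follows.  It is not the FreeBSD vnode code; the
operation indices, struct op_entry, build_vector() and the two
fallback handlers are all hypothetical names chosen for illustration.
The only point is what an undeclared operation does when called.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation indices, not the real FreeBSD descriptor list. */
enum { OP_LOOKUP, OP_READ, OP_ADVLOCK, OP_COUNT };

typedef int (*op_fn)(void *args);

/* One (operation, handler) pair as a filesystem would declare it. */
struct op_entry {
    int   op;
    op_fn fn;
};

/* Default-invalid fallback: an op the filesystem never declared fails. */
static int op_unsupported(void *args) { (void)args; return EOPNOTSUPP; }

/* Default-valid fallback: an undeclared op silently "succeeds". */
static int op_generic_ok(void *args) { (void)args; return 0; }

/* Fill every slot with the fallback, then overlay the declared ops. */
static void build_vector(op_fn vec[OP_COUNT], op_fn fallback,
                         const struct op_entry *decl, size_t n)
{
    assert(vec != NULL && fallback != NULL);
    for (int i = 0; i < OP_COUNT; i++)
        vec[i] = fallback;
    for (size_t i = 0; i < n; i++) {
        assert(decl[i].op >= 0 && decl[i].op < OP_COUNT);
        vec[decl[i].op] = decl[i].fn;
    }
}

/* Example handler for the one op this toy filesystem implements. */
static int demo_read(void *args) { (void)args; return 0; }
```

With op_unsupported as the fallback, a caller asking for OP_ADVLOCK
on this toy filesystem gets EOPNOTSUPP and knows to stop; with
op_generic_ok it gets a success it cannot distinguish from a real
implementation.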

What that effectively means is that there are now versioning
problems when proxying argument descriptors between machines.
Before defaultfs, it was possible to proxy a VOP from one machine
to another through a third intermediate machine, without the
intermediate machine getting in your way.  Similarly, it was also
possible to proxy into user space and back into kernel space (if
you were to proxy the cache coherency through VOP_{GE|PU}TPAGES()
so that coherency was explicit).

Now an intermediate machine can get in your way.  The original
design was supposed to permit you to do something like mount a
DVD on one machine, pass its data through a CODEC on another,
and read a raw video stream with all frames interpolated on yet
another.

The original FICUS work had a network and a user space proxy; it
was a SunOS project, so there was no unified VM and buffer cache,
so all of the coherency points were explicit, rather than implicit.
Unfortunately, FreeBSD would have to backtrack quite a way to get
the coherency picture right these days (using OpenBSD as a living
non-unified reference would probably be the best approach).  Such
an approach is *vastly* superior to NFS (though v4 does finally
deal with the locking issue and distributed cache coherency
issue), as it would allow arbitrary VOP extension on either side
of a proxy layer.

I think providing a "pseudofs" layer which could be explicitly
referenced is a good step in the right direction _away_ from the
"defaultfs" approach; the difference being that the reference
_must_ be explicit.

There are still VOPs that really need to be "veto" based, rather
than call-through.  VOP_ADVLOCK() and VOP_ABORTOP() are extreme
examples (try porting GFS or XFS or a writeable NTFS without a
working VOP_ABORTOP() that aborts operations, instead of freeing
namei buffers not allocated by your VFS layer, and it will become
abundantly clear very quickly).

FreeBSD could also get rid of the VOP reverse lookup issue; to
do that, it would need to refactor the VOP lists each time, and
have default failure cases for new VOPs unknown to the underlying
FS.  If, at the same time it refactored (or created, in the case
of new mounts) the VOP lists, it also sorted them into their
entry point ordering, according to the global descriptor list,
you could achieve a significant speedup in descriptor dereference,
as well as eliminate one level of indirection in the post-factored
VOP list (that is, you could make the glue code effectively "go
away" for instances of FSs resulting from mounts, even though it
would have to stick around for the mount/unmount/refactor
operations themselves).
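
The refactoring idea can be sketched in user space as below.  This
is an illustration under assumptions, not kernel code: global_desc,
register_desc(), refactor(), dispatch() and the string-keyed
declarations are hypothetical stand-ins for the global descriptor
list and the per-mount VOP vector.  The filesystem declares its ops
in any order; refactor() resolves them once into a flat array in
global entry-point order, with unknown ops defaulting to failure, so
that dispatch is a single indexed call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPS 16

typedef int (*op_fn)(void *args);

/* Hypothetical global descriptor list: position == entry-point index. */
static const char *global_desc[MAX_OPS];
static size_t global_count;

/* An op as the filesystem declares it, in no particular order. */
struct fs_decl {
    const char *name;
    op_fn       fn;
};

/* Per-mount vector, sorted into global entry-point order. */
struct mount_ops {
    op_fn  vec[MAX_OPS];
    size_t count;
};

/* Default failure for ops unknown to the underlying filesystem. */
static int op_fail(void *args) { (void)args; return EOPNOTSUPP; }

/* Add a new op to the global list; returns its index, or -1 if full. */
static int register_desc(const char *name)
{
    if (global_count == MAX_OPS)
        return -1;
    global_desc[global_count] = name;
    return (int)global_count++;
}

/* Rebuild the per-mount vector against the current global list. */
static void refactor(struct mount_ops *m, const struct fs_decl *decl, size_t n)
{
    assert(m != NULL);
    m->count = global_count;
    for (size_t i = 0; i < global_count; i++) {
        m->vec[i] = op_fail;
        for (size_t j = 0; j < n; j++)
            if (strcmp(decl[j].name, global_desc[i]) == 0)
                m->vec[i] = decl[j].fn;
    }
}

/* Dispatch is one bounds check and one indexed call; no lookup glue. */
static int dispatch(const struct mount_ops *m, int idx, void *args)
{
    if (idx < 0 || (size_t)idx >= m->count)
        return EOPNOTSUPP;
    return m->vec[idx](args);
}

/* Toy handlers returning distinct tags so callers can tell them apart. */
static int fs_read(void *args)   { (void)args; return 1; }
static int fs_lookup(void *args) { (void)args; return 2; }
```

The name matching cost is paid only in refactor(), which in this
sketch would run at mount time and again whenever register_desc()
grows the global list.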

Similarly, the NFS cookie thing that finally won out (arguments
in one order for FreeBSD, arguments in another for NetBSD and
OpenBSD, and probably Darwin) is really ill-considered.  A more
correct approach would be to break the VOP in two, where there
would be a separate "snapshot current directory block smallest
atomic unit" and "externalize directory entries in machine
independent format" VOPs, which would let the VOP_READDIR() be
restarted at an arbitrary offset, without needing to appeal to
cookies.
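
A rough user-space sketch of that two-step split is below.  It is
an assumption-laden illustration, not a proposed kernel interface:
the toy on-disk record format (one byte of record length, one byte
of name length, then the name), DIRBLK, struct ext_dirent,
dir_snapshot() and dir_externalize() are all invented here.  The
first step copies the one block containing the requested offset; the
second walks that private copy and emits machine independent entries
that start at or after the offset, so a restart needs only an offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRBLK 64   /* hypothetical "smallest atomic unit" of a directory */

/* Hypothetical machine-independent entry handed back to the caller. */
struct ext_dirent {
    uint32_t next_off;   /* directory offset at which to resume */
    uint8_t  namelen;
    char     name[32];
};

/* Step 1: copy the block containing 'off'.  A real implementation
 * would hold whatever lock makes this copy atomic. */
static size_t dir_snapshot(const uint8_t *dir, size_t dirsize, size_t off,
                           uint8_t blk[DIRBLK], size_t *blkstart)
{
    assert(dir != NULL && blk != NULL && blkstart != NULL);
    if (off >= dirsize)
        return 0;
    *blkstart = off - off % DIRBLK;
    size_t n = dirsize - *blkstart;
    if (n > DIRBLK)
        n = DIRBLK;
    memcpy(blk, dir + *blkstart, n);
    return n;
}

/* Step 2: externalize entries from the snapshot, skipping any that
 * begin before 'off'.  Stops at a zero or malformed record. */
static size_t dir_externalize(const uint8_t *blk, size_t blklen,
                              size_t blkstart, size_t off,
                              struct ext_dirent *out, size_t max)
{
    size_t pos = 0, count = 0;

    while (pos + 2 <= blklen && count < max) {
        uint8_t reclen = blk[pos], namelen = blk[pos + 1];

        if (reclen < 2 || pos + reclen > blklen ||
            namelen > reclen - 2 || namelen >= sizeof out->name)
            break;
        if (blkstart + pos >= off) {
            out[count].next_off = (uint32_t)(blkstart + pos + reclen);
            out[count].namelen = namelen;
            memcpy(out[count].name, blk + pos + 2, namelen);
            out[count].name[namelen] = '\0';
            count++;
        }
        pos += reclen;
    }
    return count;
}
```

Because the walk always begins at the block boundary, any offset
inside the block resolves to the next entry boundary at or after it,
which is the property cookies are otherwise used to provide.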

As it is, I think you could barely do the job in 5.0, but that it
would have taken substantial work prior to 5.0.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message



