Date:      Sun, 26 Jan 1997 18:57:05 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        dg@root.com
Cc:        terry@lambert.org, michaelh@cet.co.jp, bde@freefall.freebsd.org, Hackers@freebsd.org
Subject:   Re: cvs commit: src/sys/kern kern_lockf.c
Message-ID:  <199701270157.SAA02602@phaeton.artisoft.com>
In-Reply-To: <199701262156.NAA08258@root.com> from "David Greenman" at Jan 26, 97 01:56:14 pm

> >This is one of the things I am always on about.
> >
> >
> >The call trace should be:
> >
> >	fcntl(lock)				<- check call syntax here
> >		lf_advlock(lock)		<- check arg values here
> >		if( !VOP_ADVLOCK(lock))
> >			lf_advlock(unlock)
> >
> >And the FS specific VOP_ADVLOCK should simply return 0 in all cases for
> >most FS's.
> 
>    I disagree. The call trace should be:
> 
> 	fcntl(lock)
> 		VOP_ADVLOCK(lock)
> 			lf_advlock(lock)
> 
>    This works properly with unusual filesystem stacking and is more flexible.

???

How so?  Can you give me a stacking situation example where this would
be true?  The Heidemann paper specifically references null function bodies
in a layering design, and the Rosenthal paper specifically talks about
"collapsing" call graphs for null function elements in layers.  I
can think of several layers where you would want to affect the data
operands, but not the hierarchy operands: an encryption layer, etc..
I can also think of several situations where you would want to affect the
hierarchy operands instead of the data operands: a quota layer, an ACL
layer, a UMSDOS attribution layer, etc..

I can think of no situation in which I would want to hit the wire for
a network FS call, when the given operation will fail remotely and
will also fail locally (ie: VOP_ADVLOCK).  All you succeed in doing
is increasing latency, for no good reason.

We can discuss the race conditions using this same call-down type of
implementation in VOP_LOCK for directory traversal in an MSDOSFS,
and the possibility for error when each FS implementor is required to
reimplement upcalls (violating the abstract interface definition)
in an identical way.  These are obvious, and the related PR's are
long-standing.

Further, we can identify issues of NFS export resulting from the same
style of coding in the mount calls being expected to process the
exposed mount points, and of root vs. non-root FS mounting being
valid in some FS's and not in others because of the absurd per-FS
reimplementation requirements.


Even if we accepted your position as correct, we could implement the
topologically equivalent arrangement in a veto architecture by mounting
an "advisory locking implementation layer" immediately above the
terminal layer, and have it make the lf_advlock calls instead.


Finally, this neglects fan-out architectures in which there may be
a pseudo-vnode as a container object for more than one underlying
vnode: it is necessary to lock the container object as well, since
a user can legally have a "view" onto any FS "root" in any stack
of FS layers (indeed, this *must* happen for most of the existing FS's
NFS export implementations).


>    Only leaf filesystems should call lf_advlock(), so upper layers don't
> matter. union_advlock should just be a pass-through.

This assumes that a leaf element is rigidly defined (ie: it assumes that
the bottom end of the stack will directly access media via a system
specific mechanism for doing raw I/O).

In the Rosenthal paper, a design is discussed where the bottom end
is the same interface as the top end, even for nodes that structure
storage.  You can think of the bottom end's layer as a layer with
a flat name space (not imposing a directory hierarchy), with the
namespace being numeric (this is one of the problems with the current
"FS responsible for namei buffer deallocation" scheme -- it implies a
buffer as opposed to some otherwise opaque, layer-specific state).


In reality, the bottom end system interface wants to be system specific,
but the design of a VFS layer itself (even one like UFS) doesn't want
to be system specific.  Ie: the FFS module should operate the same way
without regard to whether it's running in a Linux environment, or a BSD
environment, or a Windows environment: it shouldn't matter.

We can see the beginnings of this by looking at the existing FFS/UFS
layering split, where the imposition of a directory hierarchy is done
in a separate stacking layer (UFS) from the imposition of a flat numeric
name space layer (FFS inodes).  It is eminently logical to extend
this to the idea that a flat numeric namespace of groups of blocks
(the inode layer) be implemented on a flat numeric namespace of
blocks (that is, device access via VFS interface).

We can either decide that the current UFS/FFS interface is wrong, or
we can decide that the current FFS/VM interface is wrong.  Since the
former is more flexible, it's an easy choice to make.

This is also why it's a mistake to make FFS/UFS specific code for the
"soft updates" implementation: it separates the UFS-with-soft-updates
used by FFS from the UFS-without-soft-updates used by LFS, and is a
move *away* from code reuse.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


