Date: Sun, 26 Jan 1997 18:57:05 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: dg@root.com
Cc: terry@lambert.org, michaelh@cet.co.jp, bde@freefall.freebsd.org, Hackers@freebsd.org
Subject: Re: cvs commit: src/sys/kern kern_lockf.c
Message-ID: <199701270157.SAA02602@phaeton.artisoft.com>
In-Reply-To: <199701262156.NAA08258@root.com> from "David Greenman" at Jan 26, 97 01:56:14 pm
> > This is one of the things I am always on about.
> >
> > The call trace should be:
> >
> > 	fcntl(lock)		<- check call syntax here
> > 	lf_advlock(lock)	<- check arg values here
> > 	if( !VOP_ADVLOCK(lock))
> > 		lf_advlock(unlock)
> >
> > And the FS specific VOP_ADVLOCK should simply return 0 in all cases for
> > most FS's.
>
>    I disagree. The call trace should be:
>
> 	fcntl(lock)
> 	VOP_ADVLOCK(lock)
> 	lf_advlock(lock)
>
>    This works properly with unusual filesystem stacking and is more flexible.

??? How so? Can you give me an example of a stacking situation where this would be true?

The Heidemann paper specifically references null function bodies in a layering design, and the Rosenthal paper specifically talks about "collapsing" call graphs for null function elements in layers.

I can think of several layers in which you would want to affect the data operands, but not the hierarchy operands: an encryption layer, etc. I can also think of several situations in which you would want to affect the hierarchy operands instead of the data operands: a quota layer, an ACL layer, a UMSDOS attribution layer, etc.

I can think of no situation in which I would want to hit the wire for a network FS call when the given operation will fail remotely and will also fail locally (i.e., VOP_ADVLOCK). All you succeed in doing is increasing latency, for no good reason.

We can discuss the race conditions arising from this same call-down style of implementation in VOP_LOCK for directory traversal in an MSDOSFS, and the possibility for error when each FS implementor is required to reimplement upcalls (violating the abstract interface definition) in an identical way. These are obvious, and the related PR's are long-standing.

Further, we can identify issues with NFS export resulting from the same style of coding in the mount calls being expected to process the exposed mount points, and of root vs.
non-root FS mounting being valid in some FS's and not in others, because of the absurd per-FS reimplementation requirements.

Even if we accepted that you are correct, we could implement the topologically equivalent arrangement in a veto architecture by mounting an "advisory locking implementation layer" immediately above the terminal layer, and having it make the lf_advlock() calls instead.

Finally, this neglects fan-out architectures, in which there may be a pseudo-vnode acting as a container object for more than one underlying vnode: it is necessary to lock the container object as well, since a user can legally have a "view" onto any FS "root" in any stack of FS layers (indeed, this *must* happen for most of the existing FS's NFS export implementations).

> Only leaf filesystems should call lf_advlock(), so upper layers don't
> matter.  union_advlock should just be a pass-through.

This assumes that a leaf element is rigidly defined (i.e., it assumes that the bottom end of the stack will directly access media via a system specific mechanism for doing raw I/O).

In the Rosenthal paper, a design is discussed in which the bottom-end interface is the same as the top-end interface, even for nodes that structure storage. You can think of the bottom-end layer as a layer with a flat namespace (one not imposing a directory hierarchy), with the namespace being numeric (this is one of the problems with the current "FS responsible for namei buffer deallocation" scheme -- it implies a buffer, as opposed to some otherwise opaque layer-specific state).

In reality, the bottom-end system interface wants to be system specific, but the design of a VFS layer itself (even one like UFS) doesn't want to be system specific. I.e., the FFS module should operate the same way without regard to whether it's running in a Linux environment, a BSD environment, or a Windows environment: it shouldn't matter.
We can see the beginnings of this by looking at the existing FFS/UFS layering split, where the imposition of a directory hierarchy is done in a separate stacking layer (UFS) from the imposition of a flat numeric namespace layer (FFS inodes). It is eminently logical to extend this to the idea that a flat numeric namespace of groups of blocks (the inode layer) be implemented on top of a flat numeric namespace of blocks (that is, device access via the VFS interface).

We can either decide that the current UFS/FFS interface is wrong, or we can decide that the current FFS/VM interface is wrong. Since the former is more flexible, it's an easy choice to make.

This is also why it's a mistake to make the "soft updates" implementation FFS/UFS specific: it separates the UFS-with-soft-updates used by FFS from the UFS-without-soft-updates used by LFS, and is a move *away* from code reuse.

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.