From: Terry Lambert <tlambert@primenet.com>
Subject: Re: -current NFS problem
To: dg@root.com
Date: Sun, 18 Oct 1998 22:17:23 +0000 (GMT)
Cc: tlambert@primenet.com, green@zone.syracuse.NET, grog@lemis.com,
    julian@whistle.com, mike@smith.net.au, bag@sinbin.demos.su,
    rock@cs.uni-sb.de, current@FreeBSD.ORG
In-Reply-To: <199810182112.OAA06041@implode.root.com> from "David Greenman" at Oct 18, 98 02:12:30 pm

> > >> Actually that's not true. I can't speak for all of the NFS
> > >> implementations above, but at least in BSD/OS, it works only
> > >> because they have warts all over the place to sidestep the
> > >> problems with not having FS node locking.
> >
> > OK, after having spent several hours trying to see what you mean
> > here, I'm having a hard time understanding why locking the vnode
> > that holds the pointer to the nfsnode data is not an implicit lock
> > on the underlying nfsnode, just as it is for all other data that
> > the vnode references.
>
>    Because there is no such thing as a "vnode lock", despite the
> terminology that we use in describing it. The lock state is stored
> in the attached fsnode, not in the vnode. For FFS, it is stored in a
> lock struct that is the first item in the in-core inode. In the
> special case of NFS, it's not stored at all. This means that
> VOP_LOCK()/VOP_UNLOCK() do nothing useful for NFS, and this exposes
> all sorts of bugs in code that assumes that they do.

Well, this is utterly bogus. What happened to v_interlock?

The locking of vnodes should be done against the vnode, just as the
advisory locks should be hung off the vnode instead of off of
ip->i_lockf.

I have to say that the implementation of vop_nolock(), and the mere
existence of vfs_default.c, leave me less than enthused. As things now
stand, I can't collapse NULL layers into a single indirection during
vfs_init(), nor can I sort the descriptor entries in order to avoid
descriptor overhead for cases where the descriptor isn't being
externalized (i.e., all VFS stacking layers except Heidemann's VFS
network proxy layer). 8-(.

I believe that this statement in the comments:

 * This code cannot be used until all the non-locking filesystems
 * (notably NFS) are converted to properly lock and release nodes.

is correct at this point, if we presume a default implementation
instead of a veto-based implementation at the vn_ call layer.
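For anyone following along, the status quo David describes above looks
roughly like this (a simplified sketch in the style of the kernel code
of the time, not the literal source; the function bodies here are
illustrative only):

    /*
     * FFS: the "vnode lock" state actually lives in the fsnode (the
     * in-core inode), as the first member.
     */
    struct inode {
            struct lock i_lock;     /* lock state: first member */
            /* ... the rest of the in-core inode ... */
    };

    int
    ufs_lock(ap)
            struct vop_lock_args *ap;
    {
            struct vnode *vp = ap->a_vp;

            /*
             * The lock operation acts on the inode's lock struct;
             * v_interlock is only held across the state change.
             */
            return (lockmgr(&VTOI(vp)->i_lock, ap->a_flags,
                &vp->v_interlock, ap->a_p));
    }

    /*
     * NFS, by contrast, points its lock entries at vop_nolock(),
     * which records no state at all, so VOP_LOCK() on an NFS vnode
     * is effectively a no-op.
     */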
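And to be concrete about the collapse mentioned above: what I have in
mind is something along these lines at vfs_init() time. This is purely
hypothetical code; no such pass exists in the tree, and the name
vfs_collapse_null_layer() and the direct table rewrite are mine:

    /*
     * Hypothetical sketch: where a NULL layer's entry for an
     * operation is just the generic bypass routine, rewrite that
     * entry to point directly at the underlying FS's implementation,
     * so each VOP costs one indirection instead of the bypass plus
     * descriptor overhead.
     */
    void
    vfs_collapse_null_layer(upper, lower)
            struct vnodeopv_entry_desc *upper, *lower;
    {
            struct vnodeopv_entry_desc *u, *l;

            for (u = upper; u->opve_op != NULL; u++) {
                    if (u->opve_impl != null_bypass)
                            continue;       /* layer adds real code */
                    for (l = lower; l->opve_op != NULL; l++)
                            if (l->opve_op == u->opve_op) {
                                    u->opve_impl = l->opve_impl;
                                    break;
                            }
            }
    }

Obviously this is only safe for operations the layer doesn't actually
interpose on, which is exactly why the lock handling above the layer
boundary has to be uniform.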
 * Also, certain vnode operations change the locking state within
 * the operation (create, mknod, remove, link, rename, mkdir, rmdir,
 * and symlink).

I think this is unavoidable, so long as the operations change the
lock state.

 * Ideally these operations should not change the
 * lock state, but should be changed to let the caller of the
 * function unlock them.

And that this is the correct solution.

 * Otherwise all intermediate vnode layers
 * (such as union, umapfs, etc) must catch these functions to do
 * the necessary locking at their layer.

I think this is incorrect. I think that because we might have an FS
that externalizes two of the VFS layers in a given stack, the VFS
layer will have to handle this regardless.

The NULL layers, wherein there is no VFS MUX (e.g., nullfs, umapfs),
can ignore the proxy locking (presuming the operations for the NULL
layers have been collapsed out, and that the exposed layer "does the
right thing" in terms of requiring the upper layer to proxy the lock
down), while the layers with inherent MUX properties (e.g., unionfs,
translucentfs) must catch the down-path in all cases in order to
proxy the lock to the N (N > 1) underlying vnodes. If this doesn't
happen, then these things will never do cache coherency correctly
without requiring access via bmap and synchronization events (which
would be a loss).

 * Note that the inactive
 * and lookup operations also change their lock state, but this
 * cannot be avoided, so these two operations will always need
 * to be handled in intermediate layers.

Actually, I think this can be avoided by moving some more of the
common lookup and inactive code up the function call graph.

It's my opinion that the bottom level FS's only interest in locking
should be in local resource pooling, as necessary (ihash is
unnecessary, for example), and in changes, as necessary, to increase
SMP granularity (and I don't think a lot is necessary in that regard,
so long as the domain locking that has to occur for local resource
pooling, like directory entry management, is sufficiently granular).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.