From owner-freebsd-fs  Wed Nov 24 11: 4:49 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7AF86153E8; Wed, 24 Nov 1999 11:04:09 -0800 (PST)
	(envelope-from tlambert@usr08.primenet.com)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.8.8/8.8.8) id MAA03761;
	Wed, 24 Nov 1999 12:03:15 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp02.primenet.com, id smtpd003665; Wed Nov 24 12:03:07 1999
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id LAA21738;
	Wed, 24 Nov 1999 11:55:04 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199911241855.LAA21738@usr08.primenet.com>
Subject: Re: namei() and freeing componentnames
To: eivind@FreeBSD.ORG (Eivind Eklund)
Date: Wed, 24 Nov 1999 18:55:04 +0000 (GMT)
Cc: ezk@cs.columbia.edu, fs@FreeBSD.ORG
In-Reply-To: <19991118153220.E45524@bitbox.follo.net> from "Eivind Eklund" at Nov 18, 99 03:32:20 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> Yes, this is the intent.
> 
> The problem I'm finding with VOP_RELEASEND() is that namei() can
> return two different vps - the dvp (directory vp) and the actual vp
> (inside the directory dvp points at), and that neither of these are
> always available.

What gets returned is based on the flags passed down.  I think
that trying to encapsulate this transparently, so that any
namei() operation that succeeds or fails can be freed in its
entirety without resorting to flag-specific code in the caller,
is a mistake.  I don't think you can reasonably do this.

One issue that occurs to me is that namei() itself, and not the
underlying VOP_LOOKUP code, should be the one to reference the
path component name cache.  If the underlying VFS doesn't want
the cache hit to occur without notifying it of the event, then
it needs to not enter the data in the cache.  This would simplify
a large amount of code.
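
As a rough illustration of the cache idea (a userland model only;
fs_model, namei_model() and the nc[] table are hypothetical names,
not kernel interfaces), the upper layer consults the cache itself
and only calls down on a miss.  A filesystem that wants to see
every lookup simply never enters anything:

```c
#include <stddef.h>
#include <string.h>

#define NC_SLOTS 8

/* Hypothetical name cache entry: component name -> inode number. */
struct nc_entry {
	const char *name;
	int ino;
};

/* Hypothetical per-filesystem state for this model. */
struct fs_model {
	int wants_every_lookup;	/* if set, never enter names in the cache */
	int lookups_seen;	/* how often the FS lookup was really called */
};

static struct nc_entry nc[NC_SLOTS];

/* Stand-in for VOP_LOOKUP: the FS decides whether to cache the result. */
static int
fs_lookup(struct fs_model *fs, const char *name)
{
	int ino = (int)strlen(name);	/* fake result, no real directory scan */
	size_t i;

	fs->lookups_seen++;
	if (!fs->wants_every_lookup) {
		for (i = 0; i < NC_SLOTS; i++) {
			if (nc[i].name == NULL) {
				nc[i].name = name;
				nc[i].ino = ino;
				break;
			}
		}
	}
	return (ino);
}

/* Stand-in for namei(): the cache is referenced here, not in the FS. */
static int
namei_model(struct fs_model *fs, const char *name)
{
	size_t i;

	for (i = 0; i < NC_SLOTS; i++)
		if (nc[i].name != NULL && strcmp(nc[i].name, name) == 0)
			return (nc[i].ino);	/* hit: FS is not notified */
	return (fs_lookup(fs, name));		/* miss: ask the FS */
}
```

The model keys the cache on the name alone and never evicts, which
a real cache obviously could not do; it is only meant to show where
the cache reference sits.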

The other simplification, which is organizational, and could,
using inline functions, add effectively zero code overhead,
is to separate the lookup operations by request
type.  Whether or not something wants the parent directory
back has much to do with whether it is a create or rename
operation, and little to do with anything else.  Operations
which intend to modify the returned directory entry are very
distinct from those merely doing a lookup.
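
A minimal sketch of the split, as a userland model (lk_common(),
lk_lookup(), lk_create() and the structures are hypothetical, not
existing kernel functions).  The request type, not a flag word
interpreted by the caller, decides whether the parent comes back:

```c
#include <stddef.h>
#include <string.h>

#define DIR_MAX 4

/* Hypothetical directory: a name plus a NULL-terminated entry list. */
struct dir {
	const char *name;
	const char *entries[DIR_MAX];
};

struct lk_result {
	const struct dir *dvp;	/* parent; NULL unless the op modifies it */
	const char *vp;		/* entry; NULL if it does not exist */
};

/* Shared worker; returns 1 if the entry exists, 0 otherwise. */
static int
lk_common(const struct dir *d, const char *name, int want_parent,
    struct lk_result *r)
{
	size_t i;

	r->dvp = want_parent ? d : NULL;
	r->vp = NULL;
	for (i = 0; i < DIR_MAX && d->entries[i] != NULL; i++)
		if (strcmp(d->entries[i], name) == 0)
			r->vp = d->entries[i];
	return (r->vp != NULL);
}

/* Plain lookup: never hands back the parent directory. */
static inline int
lk_lookup(const struct dir *d, const char *name, struct lk_result *r)
{
	return (lk_common(d, name, 0, r));
}

/* Create (rename would look the same): always hands back the parent. */
static inline int
lk_create(const struct dir *d, const char *name, struct lk_result *r)
{
	return (lk_common(d, name, 1, r));
}
```

A caller of lk_lookup() then knows statically that there is no
parent to release, which is the point of separating the cases.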

I have often felt that much of the mess create/rename/delete/open
variant behaviour causes should be addressed by moving the
complexity to upper level code.


> Progress report: Based on current rate of progress, it looks like I'll
> be able to have patches ready for (my personal) testing sunday (or
> *possibly* saturday, but most likely not).  Depending on how
> testing/debugging works out, the patches will most likely be ready for
> public testing sometime next week.  I'll need help with NFS testing.

Heh.  This is the same stumbling block I hit, needing help with
NFS testing.  I created, and I believe it was Peter who updated
it, a testing framework that can detect kernel memory leaks from
user space, and which exercised the entire branch path for the
namei()/nameifree() cases.  This would probably be a good thing
for someone to use, since it will identify the branch path in
which any memory leaks are occurring.

> Forward view: I'm undecided on the next step.  Possibilities:
>
> (1) Change the way locking is specified to make it feasible to test
>     locking patches properly, and change the assertion generation to
>     generate better assertions.  This will probably require changing
>     VOP_ISLOCKED() to be able to take a process parameter, and return
>     different values based on whether an exclusive lock is held by that
>     process or by another process.  The present behaviour will be
>     available by passing NULL for this parameter.
> 
>     Presently, running multiple processes does not work properly, as
>     the assertions do not really assert the right things.
> 
>     These changes are necessary to properly debug the use of locks,
>     which I again believe is necessary for stacking layers (which I
>     would like to work in 4.0, but I don't know if I will be able to
>     have ready).

This would be nice; I still believe most of the vnode and the
advisory locking code can move to upper layers.  I think it is
the responsibility of the stacking layers to propagate locks,
and the only place that this is really an issue is on fan-in
or fan-out.

Please keep an eye towards not precluding Jeremy Allison's work
on a kernel opportunistic locking interface, since it's really
needed to do hosted OS/host OS coherency properly (e.g. Samba
clients must obey UNIX locks, and UNIX applications must obey
those of Samba).  This is similar to what NFS clients and local
applications must do to interoperate, and is the primary purpose
of the LEASE interface.


> (2) Change the behaviour of VOP_LOOKUP() to "eat as much as you can,
>     and return how much that was" rather than "Eat a single path
>     component; we have already decided what this is."
>     This allows different types of namespaces, and it allows
>     optimizations in VOP_LOOKUP() when several steps in the traversal
>     are inside a single filesystem (and hey - who mounts a
>     new filesystem on every directory they see, anyway?)

The path component buffer mechanism already specifies this behaviour
as one of its initial design requirements, so I think this is already
taken care of.

What does not happen is that lookups that will take place in a
single VFS are held down in that VFS for the entire traversal;
instead, they pop up to namei() after each component.

I don't think you can get rid of this, without destroying the
"union" option (not the same as the "unionfs"), and without
damaging the ability to cover mount points and to chroot or
do symlink expansion, or deal with POSIX namespace escape.

The original reason for allowing this behaviour at all, according
to Heidemann's thesis, is to permit an underlying FS to "eat as
much as you want", as opposed to "eat as much as you can".  This
was used in proxy VFS stacking layers, since a proxy layer knows
that it owns the entire tree inferior to the current component.
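
To make the "eat as much as you can, and return how much that was"
shape concrete, here is a small userland sketch.  lookup_eat() and
the boundary callback are hypothetical; the callback stands in for
whatever test decides that the upper layer has to see a component
itself (a covered mount point, a symlink, "..", and so on):

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical predicate: nonzero if the caller (the namei() level)
 * must handle this component itself.
 */
typedef int (*boundary_fn)(const char *comp, size_t len);

/*
 * Consume leading path components until a boundary component or the
 * end of the path; return the number of bytes of 'path' consumed.
 */
static size_t
lookup_eat(const char *path, boundary_fn is_boundary)
{
	size_t eaten = 0;

	while (path[eaten] != '\0') {
		size_t start = eaten;
		size_t len;

		while (path[start] == '/')	/* skip separators */
			start++;
		len = strcspn(path + start, "/");
		if (len == 0)
			break;			/* only slashes were left */
		if (is_boundary(path + start, len))
			break;			/* hand this one back */
		eaten = start + len;
	}
	return (eaten);
}

/* Example predicate: ".." and a directory named "mnt" are boundaries. */
static int
dotdot_or_mnt(const char *comp, size_t len)
{
	return ((len == 2 && memcmp(comp, "..", 2) == 0) ||
	    (len == 3 && memcmp(comp, "mnt", 3) == 0));
}
```

With this predicate, "usr/src/mnt/sys" is consumed up to and
including "usr/src", and the caller resumes at "/mnt/sys".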


One "low hanging fruit" optimization that can be made is to
_always_ set the fdp->fd_rdir to the process's current
root directory; this avoids the NULL/non-NULL test, so long
as it is inherited correctly on fork, and set for init.
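
Roughly, as a userland model (the reduced struct filedesc, the
rootvnode_model variable and the helper functions are stand-ins I
made up for illustration, not the kernel's definitions):

```c
#include <stddef.h>

struct vnode {
	int v_usecount;
};

/* Reduced, hypothetical model of the per-process descriptor table. */
struct filedesc {
	struct vnode *fd_cdir;	/* current directory */
	struct vnode *fd_rdir;	/* root directory; never NULL in this model */
};

static struct vnode rootvnode_model = { 1 };

/* The pattern being avoided: test for NULL on every lookup. */
static struct vnode *
lookup_root_checked(const struct filedesc *fdp)
{
	return (fdp->fd_rdir != NULL ? fdp->fd_rdir : &rootvnode_model);
}

/* With the invariant in place, the test disappears. */
static struct vnode *
lookup_root_always(const struct filedesc *fdp)
{
	return (fdp->fd_rdir);
}

/* Set both directories for the first process. */
static void
fd_init_for_init(struct filedesc *fdp)
{
	fdp->fd_cdir = &rootvnode_model;
	fdp->fd_rdir = &rootvnode_model;
	rootvnode_model.v_usecount += 2;
}

/* Inherit on fork, taking a reference on each directory. */
static void
fd_copy_on_fork(const struct filedesc *parent, struct filedesc *child)
{
	*child = *parent;
	child->fd_cdir->v_usecount++;
	child->fd_rdir->v_usecount++;
}
```

The cost is one extra reference on the root vnode per process; the
gain is that every consumer can use fd_rdir unconditionally.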

This would be very nice for many other reasons... 8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

