From owner-freebsd-fs  Thu Nov 18  6:32:30 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from ns1.yes.no (ns1.yes.no [195.204.136.10])
	by hub.freebsd.org (Postfix) with ESMTP id 84E6E1513B
	for <fs@FreeBSD.ORG>; Thu, 18 Nov 1999 06:32:22 -0800 (PST)
	(envelope-from eivind@bitbox.follo.net)
Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218])
	by ns1.yes.no (8.9.3/8.9.3) with ESMTP id PAA05340;
	Thu, 18 Nov 1999 15:32:21 +0100 (CET)
Received: (from eivind@localhost)
	by bitbox.follo.net (8.8.8/8.8.6) id PAA62682;
	Thu, 18 Nov 1999 15:32:20 +0100 (MET)
Date: Thu, 18 Nov 1999 15:32:20 +0100
From: Eivind Eklund <eivind@FreeBSD.ORG>
To: Erez Zadok <ezk@cs.columbia.edu>
Cc: fs@FreeBSD.ORG
Subject: Re: namei() and freeing componentnames
Message-ID: <19991118153220.E45524@bitbox.follo.net>
References: <19991112000359.A256@bitbox.follo.net> <199911152312.SAA21891@shekel.mcl.cs.columbia.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <199911152312.SAA21891@shekel.mcl.cs.columbia.edu>; from ezk@cs.columbia.edu on Mon, Nov 15, 1999 at 06:12:09PM -0500
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

[Note to impatient readers - forward view is included at the bottom of
this mail]

On Mon, Nov 15, 1999 at 06:12:09PM -0500, Erez Zadok wrote:
> In message <19991112000359.A256@bitbox.follo.net>, Eivind Eklund writes:
> [...]
> > I suspect that for some filesystems (though none of the present ones),
> > it might be necessary to do more than a
> > zfree(namei_zone,cnp->cn_pnbuf) in order to free up all the relevant
> > data.  In order to support this, we'd have to introduce a new VOP -
> > tentatively called VOP_RELEASEND().  Unfortunately, this comes with a
> > performance penalty.
> 
> Will VOP_RELEASEND be able to call a filesystem-specific routine?  I think
> it should be flexible enough.

All VOPs are filesystem specific (or can be, at least).
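
To make the dispatch concrete, here is a minimal user-space sketch of what a VOP_RELEASEND() slot might look like. It assumes a heavily simplified operations vector. The real kernel generates its VOPs from vnode_if.src with vop_*_args structures, and this sketch does not attempt to reproduce that. free() stands in for zfree(namei_zone, cnp->cn_pnbuf). The names vop_stdreleasend(), examplefs_releasend() and the v_private_frees counter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct componentname {
    char *cn_pnbuf;             /* pathname buffer allocated by namei() */
};

struct vnode;

/* Per-filesystem operations vector with a single releasend slot. */
struct vnodeops {
    void (*vop_releasend)(struct vnode *vp, struct componentname *cnp);
};

struct vnode {
    const struct vnodeops *v_op;
    int v_private_frees;        /* demo only: counts fs-specific cleanups */
};

/* Default: what every present filesystem would do. free() stands in
 * for zfree(namei_zone, cnp->cn_pnbuf) in the kernel. */
void vop_stdreleasend(struct vnode *vp, struct componentname *cnp)
{
    (void)vp;
    free(cnp->cn_pnbuf);
    cnp->cn_pnbuf = NULL;
}

/* A hypothetical filesystem that keeps extra per-lookup state. */
void examplefs_releasend(struct vnode *vp, struct componentname *cnp)
{
    vp->v_private_frees++;      /* release filesystem-private data here */
    vop_stdreleasend(vp, cnp);  /* then do the common part */
}

/* The VFS calls through the vnode's operations vector. */
void VOP_RELEASEND(struct vnode *vp, struct componentname *cnp)
{
    assert(vp != NULL && vp->v_op != NULL);
    vp->v_op->vop_releasend(vp, cnp);
}

const struct vnodeops std_vnodeops = { vop_stdreleasend };
const struct vnodeops examplefs_vnodeops = { examplefs_releasend };
```

The indirect call through v_op is the performance penalty mentioned above. A direct zfree() would avoid it.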

>  I can imagine that the VFS will call a (stackable) filesystem's
> vop_releasend(), and that stackable f/s can call a number of those
> on the lower level filesystem(s) it stacked on (there could be more
> than one, namely fan-out f/s).

Yes, this is the intent.
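
The fan-out case can be sketched as a layer whose releasend handler releases its own state and then forwards the call to every vnode it is stacked on. This is a user-space model under one stated assumption: each layer frees only its own private per-lookup data, and the shared pathname buffer is freed exactly once by its owner (not shown). That avoids double frees, but the thread does not settle how it should be done. MAXLOWER, the counters and the handler names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAXLOWER 4                  /* hypothetical fan-out limit */

struct componentname {
    int cn_layers_released;         /* demo only: layers that ran cleanup */
};

struct vnode {
    void (*v_releasend)(struct vnode *vp, struct componentname *cnp);
    int v_nlower;                   /* number of lower vnodes (0 for a leaf) */
    struct vnode *v_lower[MAXLOWER];/* vnodes this layer is stacked on */
    int v_released;                 /* times this layer's cleanup ran */
};

void VOP_RELEASEND(struct vnode *vp, struct componentname *cnp)
{
    vp->v_releasend(vp, cnp);
}

/* Leaf filesystem: release only its own per-lookup state. */
void leaf_releasend(struct vnode *vp, struct componentname *cnp)
{
    vp->v_released++;
    cnp->cn_layers_released++;
}

/* Fan-out stacking layer: release own state, then forward to every
 * lower vnode (which may itself be another stacking layer). */
void fanout_releasend(struct vnode *vp, struct componentname *cnp)
{
    assert(vp->v_nlower <= MAXLOWER);
    vp->v_released++;
    cnp->cn_layers_released++;
    for (int i = 0; i < vp->v_nlower; i++)
        VOP_RELEASEND(vp->v_lower[i], cnp);
}
```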

The problem I'm finding with VOP_RELEASEND() is that namei() can
return two different vps - the dvp (directory vp) and the actual vp
(for the entry inside the directory dvp points at) - and that neither
of these is always available.

In the code as I am writing it right now, I use whichever of these is
available, with a preference for the dvp.  I am considering splitting
VOP_RELEASEND() into VOP_RELEASEND() and VOP_DRELEASEND(), which take
the different vps as parameters - this will at least give something
that is easy to search for if we need to change the behaviour somehow.
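
A rough user-space sketch of that split, with the "prefer the dvp" rule put in one place. VOP_RELEASEND() and VOP_DRELEASEND() are the names proposed above. The helper namei_release(), the stand-in fs_releasend(), and the cn_releasedby field are hypothetical. The stand-in only records which vnode's filesystem would have done the release.

```c
#include <assert.h>
#include <stddef.h>

struct vnode {
    const char *v_tag;          /* demo only: owning filesystem's name */
};

struct componentname {
    const struct vnode *cn_releasedby;  /* demo only: who did the release */
};

/* Stand-in for dispatching through vp's filesystem. */
static void fs_releasend(const struct vnode *vp, struct componentname *cnp)
{
    cnp->cn_releasedby = vp;
}

/* Release through the directory vnode. */
void VOP_DRELEASEND(const struct vnode *dvp, struct componentname *cnp)
{
    assert(dvp != NULL);
    fs_releasend(dvp, cnp);
}

/* Release through the vnode for the looked-up entry. */
void VOP_RELEASEND(const struct vnode *vp, struct componentname *cnp)
{
    assert(vp != NULL);
    fs_releasend(vp, cnp);
}

/* Hypothetical helper: use whichever vnode namei() returned,
 * preferring the dvp when both are available. */
void namei_release(const struct vnode *dvp, const struct vnode *vp,
                   struct componentname *cnp)
{
    if (dvp != NULL)
        VOP_DRELEASEND(dvp, cnp);
    else
        VOP_RELEASEND(vp, cnp);
}
```

Callers that know which vnode they hold can call the matching entry point directly. A search for VOP_DRELEASEND then finds every place that depends on the dvp.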


> [...]
> > This is somewhat vile, but has the advantage of keeping the code ready
> > for the real VOP_RELEASEND(), and not losing performance until we
> > actually get some benefit out of it.
> [...]
> > Eivind.
> 
> WRT performance, I suggest that if possible, we #ifdef all of the stacking
> code and fixes that have a non-insignificant performance impact.

Nothing that I am so far positive we will need has a significant
performance impact.  I'm not sure the performance impact of
VOP_RELEASEND() will be significant, either - it is just that I would
like to avoid a performance impact without a gain, and for this
particular case I'm not positive we will ever need it - but I'm not
positive we won't, either.  This is why I am trying to write the code in
a way that lets us move to having it quickly, but does not force us to
live with the penalties if it turns out we do not need it.
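
One way to keep call sites "ready for the real VOP_RELEASEND()" without paying for it yet is to use a single macro everywhere. Today the macro compiles to a direct free. Once an option is turned on, it compiles to an indirect VOP call. This is a sketch only. The option name VFS_RELEASEND_VOP, the macro RELEASEND() and the wrapper release_lookup() are hypothetical and do not come from this thread. free() again stands in for zfree(namei_zone, ...).

```c
#include <assert.h>
#include <stdlib.h>

struct componentname {
    char *cn_pnbuf;             /* pathname buffer from namei() */
};

struct vnode {
    void (*v_releasend)(struct vnode *vp, struct componentname *cnp);
};

#ifdef VFS_RELEASEND_VOP
/* Future: let the filesystem decide (costs an indirect call). */
#define RELEASEND(vp, cnp)  ((vp)->v_releasend((vp), (cnp)))
#else
/* Today: every filesystem would do the same thing, so free directly.
 * free() stands in for zfree(namei_zone, cnp->cn_pnbuf). */
#define RELEASEND(vp, cnp) do {         \
        (void)(vp);                     \
        free((cnp)->cn_pnbuf);          \
        (cnp)->cn_pnbuf = NULL;         \
} while (0)
#endif

/* A typical call site only ever mentions RELEASEND(). */
void release_lookup(struct vnode *vp, struct componentname *cnp)
{
    assert(cnp != NULL);
    RELEASEND(vp, cnp);
}
```

Because the call sites stay identical, switching to the dispatched version is a one-line change. Until then the default build has no indirect-call cost.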

> Sure, performance is important, but not at the cost of functionality
> (IMHO).  Not all users would need stacking, so they can choose not
> to turn on the relevant kernel #define and thus get maximum
> performance.  Those who do want any stacking will have to pay a
> certain performance overhead.

I hope to make stacking layers really lightweight ("featherweight
stacking"), and believe it will make sense to use them internally in the
kernel organization.  If this turns out to be right, everybody will
have to have them.

> Of course, there's also an argument against too much #ifdef'ed code,
> b/c it makes maintenance more difficult.

For some of the things I am doing now (e.g., the WILLRELE fixes),
ifdef'ing would be a royal pain, making the code extremely hard to
read.

> I think we should realize that there would be no way to fix the VFS w/o
> impacting performance.

Actually, I am reasonably confident that we can do the fixes without
impacting performance noticeably.


> Rather than implement temporary fixes that avoid "hurting"
> performance, we can (1) conditionalize that code, (2) get it working
> *correctly* first, then (3) optimize it as needed, and (4) finally,
> turn it on by default, possibly removing the non-stacking code.

What I am doing now follows these principles, more or less - though
instead of conditionalizing code that I do not know we will need, I make
it very easy to write if it turns out we do need it.


Progress report: Based on the current rate of progress, it looks like
I'll be able to have patches ready for (my personal) testing on Sunday
(or *possibly* Saturday, but most likely not).  Depending on how
testing/debugging works out, the patches will most likely be ready for
public testing sometime next week.  I'll need help with NFS testing.


Forward view: I'm undecided on the next step.  Possibilities:
(1) Change the way locking is specified to make it feasible to test
    locking patches properly, and change the assertion generation to
    generate better assertions.  This will probably require changing
    VOP_ISLOCKED() to be able to take a process parameter, and return
    different values based on whether an exclusive lock is held by that
    process or by another process.  The present behaviour will be
    available by passing NULL for this parameter.

    Presently, running multiple processes does not work properly, as
    the assertions do not really assert the right things.

    These changes are necessary to properly debug the use of locks,
    which I in turn believe is necessary for stacking layers (which I
    would like to have working in 4.0, though I don't know if I will be
    able to have them ready).

(2) Change the behaviour of VOP_LOOKUP() to "eat as much as you can,
    and return how much that was" rather than "Eat a single path
    component; we have already decided what this is."
    This allows different types of namespaces, and it allows
    optimizations in VOP_LOOKUP() when several steps in the traversal
    are inside a single filesystem (and hey - who mounts a
    new filesystem on every directory they see, anyway?)

    This change is rather small, and it would be nice to have in 4.0
    (I want the VFS differences from 4.0 to 5.0 to be as small as
    possible).
    It is pretty orthogonal to stacking layers; stacking layers gain
    the same capabilities as other file systems from it.
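
For item (1), here is a user-space sketch of a lock-status query that takes the asking process. It assumes a simplified lock that counts shared holders and records one exclusive owner. The constants VLK_NONE, VLK_SHARED, VLK_EXCLUSIVE and VLK_EXCLOTHER, and the functions vn_islocked() and vn_assert_elocked(), are hypothetical names. Passing NULL reproduces the present, owner-blind answer.

```c
#include <assert.h>
#include <stddef.h>

struct proc {
    int p_pid;
};

/* Hypothetical return values for the lock-status query. */
enum {
    VLK_NONE = 0,       /* not locked */
    VLK_SHARED,         /* held shared by one or more processes */
    VLK_EXCLUSIVE,      /* held exclusively (by p, if p was given) */
    VLK_EXCLOTHER       /* held exclusively by a process other than p */
};

/* Simplified vnode lock: shared holders are only counted. */
struct vnlock {
    int shared_count;
    const struct proc *excl_owner;  /* NULL unless held exclusively */
};

int vn_islocked(const struct vnlock *lk, const struct proc *p)
{
    if (lk->excl_owner != NULL) {
        /* p == NULL keeps the present behaviour: owner not checked. */
        if (p == NULL || lk->excl_owner == p)
            return VLK_EXCLUSIVE;
        return VLK_EXCLOTHER;
    }
    return lk->shared_count > 0 ? VLK_SHARED : VLK_NONE;
}

/* An assertion that can tell "locked by me" from "locked by someone". */
void vn_assert_elocked(const struct vnlock *lk, const struct proc *p)
{
    assert(vn_islocked(lk, p) == VLK_EXCLUSIVE);
}
```

Consider the owner-blind answer. An assertion made on behalf of one process would wrongly pass while a different process holds the exclusive lock. This is the multiple-process problem described under item (1).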
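
For item (2), here is a user-space sketch of the "eat as much as you can, and return how much that was" contract. It assumes a filesystem that knows which of its directories are covered by another mount. That knowledge is modelled here as a NULL-terminated list of paths relative to the starting directory. struct fs, is_mountpoint() and fs_lookup() are hypothetical. A real lookup would return vnodes and handle "..", symlinks and locking, none of which is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: the directories of this filesystem that another
 * filesystem is mounted on, as paths relative to the start point. */
struct fs {
    const char *const *mountpoints;     /* NULL-terminated list */
};

static int is_mountpoint(const struct fs *fs, const char *path, size_t len)
{
    for (const char *const *m = fs->mountpoints; *m != NULL; m++)
        if (strlen(*m) == len && strncmp(*m, path, len) == 0)
            return 1;
    return 0;
}

/*
 * Consume path components for as long as they stay inside this
 * filesystem, and stop right after a component that is a mount point
 * so the caller can continue in the covering filesystem.
 * Returns the number of bytes of path consumed.
 */
size_t fs_lookup(const struct fs *fs, const char *path)
{
    size_t pos = 0;

    while (path[pos] != '\0') {
        while (path[pos] == '/')        /* skip separators */
            pos++;
        if (path[pos] == '\0')
            break;
        while (path[pos] != '\0' && path[pos] != '/')
            pos++;                      /* consume one component */
        if (is_mountpoint(fs, path, pos))
            break;                      /* the rest belongs elsewhere */
    }
    return pos;
}
```

A caller such as namei() would then continue with path + consumed in whichever filesystem is mounted at that point. A path that never leaves this filesystem is resolved in a single call.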

Eivind.

