From owner-freebsd-fs  Wed Nov 24 10:21:19 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2E342152E8; Wed, 24 Nov 1999 10:21:07 -0800 (PST)
	(envelope-from tlambert@usr08.primenet.com)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id LAA21327;
	Wed, 24 Nov 1999 11:19:54 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp05.primenet.com, id smtpdAAASQaazP; Wed Nov 24 11:19:35 1999
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id LAA19803;
	Wed, 24 Nov 1999 11:19:52 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199911241819.LAA19803@usr08.primenet.com>
Subject: Re: namei() and freeing componentnames
To: eivind@FreeBSD.ORG (Eivind Eklund)
Date: Wed, 24 Nov 1999 18:19:52 +0000 (GMT)
Cc: fs@FreeBSD.ORG
In-Reply-To: <19991112000359.A256@bitbox.follo.net> from "Eivind Eklund" at Nov 12, 99 00:03:59 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> I would like to make this reflexive - "symmetrical" allocation and
> free, like it presently is supposed to be with SAVESTART (but isn't -
> there are approximately one billion bugs in the code).
> 
> I suspect that for some filesystems (though none of the present ones),
> it might be necessary to do more than a
> zfree(namei_zone,cnp->cn_pnbuf) in order to free up all the relevant
> data.  In order to support this, we'd have to introduce a new VOP -
> tentatively called VOP_RELEASEND().  Unfortunately, this comes with a
> performance penalty.


A VOP_RELEASEND() call is a bad idea.

The path name buffers should be considered an opaque resource by
the underlying filesystem.

One can think of the path name buffers as containing three parts:

1)	Allocated information which may be referenced by a VFS,
	but not deallocated or otherwise modified.

2)	Context-free state.  This is state information which
	is present in the structure, and can be modified by a VFS
	according to globally applicable rules.

3)	Contextual state.  This is state information which is
	present in the structure, and can be modified by a VFS
	according to contract with upper level code.

Currently, there are no VFSs which support, require, or use
contextual state.  Such things will probably be necessary
to support multiple simultaneous name spaces which are not
lazy-bound (e.g. supporting the 8.3 and long name name spaces
for newly created files in a VFAT32FS or NTFS), but this is a
special case for which other FreeBSD support is currently
missing anyway.

I would delay the introduction of a VOP dealing with path
name buffers until such time as contextual state that
requires VFS based allocation of arbitrary structure data
becomes necessary.  Even then, it may only be necessary to
realize two additional structure elements: one that has a
void pointer, and one that has the memory pool from which
the data referenced by a non-NULL void pointer was allocated
(one wonders why a pointer can not be asked to which pool
it belongs, so that pool identity is not required on free).
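
As a rough sketch of the two-element approach (every name below
is hypothetical, not an existing FreeBSD field or function, and
free(3) stands in for the kernel pool allocator), the consumer
can release VFS-allocated data without knowing what it is:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a kernel memory pool (zone). */
struct mem_pool {
	void	(*mp_free)(struct mem_pool *, void *);
	int	mp_nfreed;		/* frees seen, for illustration */
};

/* Hypothetical path name buffer carrying the two extra elements. */
struct pathbuf {
	char		*pb_name;	/* path storage, owned by the consumer */
	void		*pb_private;	/* VFS-allocated data, or NULL */
	struct mem_pool	*pb_pool;	/* pool pb_private was taken from */
};

/* Pool free routine, backed by free(3) for this sketch only. */
static void
pool_free(struct mem_pool *mp, void *p)
{
	mp->mp_nfreed++;
	free(p);
}

/*
 * Consumer-side release.  No VOP is involved: the buffer itself
 * records where any private data has to be returned.
 */
static void
pathbuf_release(struct pathbuf *pb)
{
	if (pb->pb_private != NULL) {
		pb->pb_pool->mp_free(pb->pb_pool, pb->pb_private);
		pb->pb_private = NULL;
		pb->pb_pool = NULL;
	}
	free(pb->pb_name);
	pb->pb_name = NULL;
}
```

In this sketch the VFS would fill in pb_private and pb_pool
during lookup; the consumer only ever calls pathbuf_release().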

A common technique used in such cases is to allocate the
data pointed to by an allocated structure contiguous to the
structure (e.g. in the same allocation), and have the internal
structure pointer elements point into memory following the
structure.  This allows the pointer to be freed opaquely,
with all concomitant allocations, e.g.:

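	struct foo {
		char	*string;	/* points just past this structure */
		...
	};

	struct foo *p;

	/* one allocation holds both the structure and the string */
	p = malloc(sizeof(struct foo) + strlen(str) + 1);
	if (p == NULL)
		return (ENOMEM);
	p->string = ((char *)p) + sizeof(struct foo);
	strcpy(p->string, str);

	...

	free(p);	/* releases the string along with the structure */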


You say that you want it to be reflexive and symmetrical; path
name buffers are allocated by the VFS consumer.  To achieve
this goal, they must also be deallocated by the VFS consumer.

One of the largest barriers to transactions using VFSs in BSD
at this point is that VOP_ABORTOP() frees the path name
buffer, and it should not.


> It also allows an evil hack:
> The NFS code is rather incestuous with the VFS system, in order to
> minimize the amount of cached data during NFS requests.

It is, like the system call layer, a consumer of the VFS.  It is
not NFS' fault that the system call layer has historically been
treated as a "more equal pig" when it comes to consuming the VFS.

I am well aware of the path name buffer switch that occurs in the
NFS server.  The simple answer is "caller frees".  Once the path
name buffer allocation and deallocation has been rationalized,
the NFS code becomes much simpler: as a consumer of the VFS
interface, it allocates and deallocates the path name buffers
that it uses, just like any other VFS consumer.

The main grossness comes from the use of "goto" statements
and targets in the macro definitions.  This can be alleviated
by incorporating the path name free into the "bail out" case,
and preinitializing the path name buffer pointer to NULL so
that it can be tested for validity on a premature exit.
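
A minimal sketch of that shape, assuming a user-space consumer
(consumer_op() and do_lookup() are hypothetical names, and
malloc(3) stands in for the path name zone):

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define PATHBUF_MAX	1024	/* path length limit assumed here */

/* Hypothetical stand-in for the lookup the consumer performs. */
static int
do_lookup(const char *pnbuf)
{
	return (pnbuf[0] == '\0' ? ENOENT : 0);
}

/*
 * A consumer that allocates its own path name buffer and frees it
 * on every exit path, premature or not.
 */
static int
consumer_op(const char *path)
{
	char *pnbuf = NULL;	/* preinitialized so "out" can test it */
	int error;

	if (strlen(path) >= PATHBUF_MAX) {
		error = ENAMETOOLONG;
		goto out;	/* premature exit, nothing allocated */
	}
	pnbuf = malloc(PATHBUF_MAX);
	if (pnbuf == NULL) {
		error = ENOMEM;
		goto out;
	}
	strcpy(pnbuf, path);
	error = do_lookup(pnbuf);
out:
	if (pnbuf != NULL)
		free(pnbuf);	/* caller frees, in exactly one place */
	return (error);
}
```

The lookup itself never frees anything, so the allocation and
the free stay in the same function.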


> One side of
> this is that it seems to throw away the vnode we'd like to use for
> VOP_RELEASEND() - before it wants to throw away the componentname.

Yes.

If you examine the vfs_lookup.c code, you will see that it
avoids this by hiding the act in a mutual function recursion;
this is the same one that it uses to do symlink expansion in
place in the path name buffer to avoid having to allocate more
buffer space, and to avoid exceeding the 1024 byte path length
limit on the allocated path name buffer.
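
To illustrate the in-place expansion idea only (this is not the
kernel's code; expand_in_place() is a hypothetical helper, and
it handles relative link targets only, where an absolute target
would restart the lookup at the root):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define PATHBUF_MAX	1024	/* size of the path name buffer */

/*
 * Replace the component at buf[off .. off+len) with the link
 * target, keeping the unresolved remainder of the path after it.
 * Fails without touching buf if the result would not fit.
 */
static int
expand_in_place(char *buf, size_t off, size_t len, const char *target)
{
	size_t pathlen = strlen(buf);
	size_t tlen = strlen(target);
	size_t restlen;

	assert(off + len <= pathlen);
	restlen = pathlen - (off + len);
	if (off + tlen + restlen + 1 > PATHBUF_MAX)
		return (ENAMETOOLONG);
	/* Slide the remainder (and its NUL) to its new position first. */
	memmove(buf + off + tlen, buf + off + len, restlen + 1);
	memcpy(buf + off, target, tlen);
	return (0);
}
```

With buf holding "/usr/tmp/file" and "tmp" being a link to
"../var/tmp", a call with off 5 and len 3 leaves
"/usr/../var/tmp/file" in the same buffer.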


> Is it too evil?  I'm of two minds - I don't like messing more than
> necessary with the NFS code (and isn't sure I could do the messing
> without performance impact), but I'm not exactly ecstatic about the
> hack, either.

It's too evil, from a lot of perspectives.  I think that the
per-VFS lookup private resource release is premature feature
creep, and it's probably not justified, when a relatively opaque
(or opaque, if the memory pool identity didn't need to be cached)
pointer could take its place.


I believe the NFS code could be handled without a performance
impact; there are already path component name buffers being
allocated and deallocated in the cases you are worried about,
they're just not being allocated and deallocated symmetrically.


I also think that the primary evil of the additional VOP is that
it takes the code further from where it needs to be.  The abomination
that is NFS cookies is a result of overloading the VOP_LOOKUP code
in order to obtain directory restart, when the underlying FS's
directory entry block entry (struct dirent) is larger than the
one that you proxy over the wire.

I think that the correct way to deal with this is to define an
externalization VOP separate from the VOP_LOOKUP, which will
do the data externalization for you.

This would have the side effect of NFS-izing all future FSs,
since the same code could be used both by NFS and the system
call layer.  Currently, the system call layer does not do
the "cookie dance", and so that code is relatively unmaintained.
If all VFS consumers consumed the same code path, the code in
the path would be maintained.
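
One possible reading of such an externalization operation, as
a sketch only (the structures and fs_externalize() are
hypothetical, not a proposed interface): the filesystem walks
its native entries and emits a common external form, tracking
the restart point in units of entries so that the size of the
native entry never leaks to the consumer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical common external form handed to every consumer. */
struct ext_dirent {
	unsigned long	ed_fileno;
	char		ed_name[32];
};

/* Hypothetical native directory entry of some filesystem. */
struct native_dirent {
	unsigned long	nd_ino;
	const char	*nd_name;
};

/*
 * Externalize entries starting at *restart into out[], stopping
 * when out[] is full.  *restart is advanced so that the next call
 * resumes exactly where this one stopped.
 */
static size_t
fs_externalize(const struct native_dirent *dir, size_t ndir,
    size_t *restart, struct ext_dirent *out, size_t nout)
{
	size_t n = 0;

	while (*restart < ndir && n < nout) {
		out[n].ed_fileno = dir[*restart].nd_ino;
		strncpy(out[n].ed_name, dir[*restart].nd_name,
		    sizeof(out[n].ed_name) - 1);
		out[n].ed_name[sizeof(out[n].ed_name) - 1] = '\0';
		(*restart)++;
		n++;
	}
	return (n);
}
```

Both an NFS server and a getdirentries-style system call could
drive the same routine, differing only in how large out[] is.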

Anyway, that's my two cents...


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

