From: Terry Lambert
Subject: Re: namei() and freeing componentnames
To: eivind@FreeBSD.ORG (Eivind Eklund)
Date: Wed, 24 Nov 1999 18:19:52 +0000 (GMT)
Cc: fs@FreeBSD.ORG
In-Reply-To: <19991112000359.A256@bitbox.follo.net> from "Eivind Eklund" at Nov 12, 99 00:03:59 am

> I would like to make this reflexive - "symmetrical" allocation and
> free, like it presently is supposed to be with SAVESTART (but isn't -
> there are approximately one billion bugs in the code).
>
> I suspect that for some filesystems (though none of the present ones),
> it might be necessary to do more than a
> zfree(namei_zone,cnp->cn_pnbuf) in order to free up all the relevant
> data. In order to support this, we'd have to introduce a new VOP -
> tentatively called VOP_RELEASEND(). Unfortunately, this comes with a
> performance penalty.

A VOP_RELEASEND() call is a bad idea. The path name buffers should be
considered an opaque resource by the underlying filesystem.

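The "opaque resource" view can be illustrated with a minimal user-space
sketch. This is not kernel code and not the namei() interface: the names
fs_lookup(), consumer_lookup() and PN_BUFSIZE are hypothetical, chosen
only for illustration. The consumer allocates the path name buffer, lends
it to the filesystem as a read-only pointer, and frees it itself on a
single exit path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PN_BUFSIZE 1024		/* assumed buffer size for this sketch */

/*
 * Hypothetical stand-in for a filesystem lookup.  It may read the
 * buffer, but the const pointer gives it no business freeing or
 * replacing it.  Returns 0 for an absolute path, -1 otherwise.
 */
int
fs_lookup(const char *pnbuf)
{
	return (pnbuf[0] == '/') ? 0 : -1;
}

/*
 * Hypothetical VFS consumer.  The buffer pointer starts out NULL so
 * that the common exit path can release it unconditionally, whether
 * or not the allocation was ever made.
 */
int
consumer_lookup(const char *path)
{
	char *pnbuf = NULL;
	int error = -1;

	assert(path != NULL);
	if (strlen(path) >= PN_BUFSIZE)
		goto out;		/* too long: nothing allocated yet */
	pnbuf = malloc(PN_BUFSIZE);
	if (pnbuf == NULL)
		goto out;
	strcpy(pnbuf, path);
	error = fs_lookup(pnbuf);	/* filesystem only borrows the buffer */
out:
	free(pnbuf);			/* free(NULL) is a no-op in standard C */
	return error;
}
```

With these definitions, consumer_lookup("/etc/passwd") returns 0 and
consumer_lookup("relative") returns -1; in both cases the buffer is
released by the same code that allocated it.
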
One can think of the path name buffers as containing three parts:

1)	Allocated information which may be referenced by a VFS, but
	not deallocated or otherwise modified.

2)	Context-free state. This is state information which is
	present in the structure, and can be modified by a VFS
	according to globally applicable rules.

3)	Contextual state. This is state information which is
	present in the structure, and can be modified by a VFS
	according to contract with upper level code.

Currently, there are no VFSs which support, require, or use contextual
state. Such things will probably be necessary to support multiple
simultaneous name spaces which are not lazy-bound (e.g. supporting the
8.3 and long-name name spaces for newly created files in a VFAT32FS or
NTFS), but this is a special case for which other FreeBSD support is
currently missing anyway.

I would delay the introduction of a VOP dealing with path name buffers
until such time as contextual state that requires VFS based allocation
of arbitrary structure data becomes necessary.

Even then, it may only be necessary to add two structure elements: one
that is a void pointer, and one that identifies the memory pool from
which the data referenced by a non-NULL void pointer was allocated (one
wonders why a pointer cannot be asked to which pool it belongs, so that
pool identity is not required on free).

A common technique used in such cases is to allocate the data pointed
to by an allocated structure contiguous to the structure (i.e. in the
same allocation), and have the internal structure pointer elements
point into memory following the structure. This allows the pointer to
be freed opaquely, with all concomitant allocations, e.g.:

	struct foo {
		char	*string;
		...
	};
	struct foo	*p;

	/* one allocation holds both the structure and the string */
	p = malloc( sizeof(struct foo) + strlen(str) + 1);
	/* point into the memory immediately following the structure */
	p->string = ((char *)p) + sizeof(struct foo);
	strcpy( p->string, str);
	...
	/* a single free releases the structure and the string */
	free( p);

You say that you want it to be reflexive and symmetrical; path name
buffers are allocated by the VFS consumer. To achieve this goal, they
must also be deallocated by the VFS consumer.

One of the largest barriers to transactions using VFSs in BSD at this
point is that VOP_ABORTOP() frees the path name buffer, and it should
not.

> It also allows an evil hack:
> The NFS code is rather incestuous with the VFS system, in order to
> minimize the amount of cached data during NFS requests.

The NFS code is, like the system call layer, a consumer of the VFS. It
is not NFS' fault that the system call layer has historically been
treated as a "more equal pig" when it comes to consuming the VFS.

I am well aware of the path name buffer switch that occurs in the NFS
server.

The simple answer is "caller frees".

Once the path name buffer allocation and deallocation has been
rationalized, the NFS code becomes much simpler: as a consumer of the
VFS interface, it allocates and deallocates the path name buffers that
it utilizes, just like any other VFS consumer.

The main grossness comes from the use of "goto" statements and targets
in the macro definitions. This can be alleviated by incorporating the
path name free into the "bail out" case, and preinitializing the path
name buffer pointer to NULL so that it can be tested for validity on a
premature exit.

> One side of
> this is that it seems to throw away the vnode we'd like to use for
> VOP_RELEASEND() - before it wants to throw away the componentname.

Yes. If you examine the vfs_lookup.c code, you will see that it avoids
this by hiding the act in a mutual function recursion; this is the same
one that it uses to do symlink expansion in place in the path name
buffer, to avoid having to allocate more buffer space, and to avoid
exceeding the 1024 byte path length limit on the allocated path name
buffer.

> Is it too evil?
> I'm of two minds - I don't like messing more than
> necessary with the NFS code (and isn't sure I could do the messing
> without performance impact), but I'm not exactly ecstatic about the
> hack, either.

It's too evil, from a lot of perspectives.

I think that the per-VFS lookup private resource release is premature
feature creep, and it's probably not justified, when a relatively
opaque (or opaque, if the memory pool identity didn't need to be
cached) pointer could take its place.

I believe the NFS code could be handled without a performance impact;
there are already path component name buffers being allocated and
deallocated in the cases you are worried about, they're just not being
allocated and deallocated symmetrically.

I also think that the primary evil of the additional VOP is that it
takes the code further from where it needs to be.

The abomination that is NFS cookies is a result of overloading the
VOP_LOOKUP code in order to obtain directory restart, when the
underlying FS's directory entry block entry (struct dirent) is larger
than the one that you proxy over the wire.

I think that the correct way to deal with this is to define an
externalization VOP separate from VOP_LOOKUP, which will do the data
externalization for you. This would have the side effect of NFS-izing
all future FSs, since the same code could be used both by NFS and the
system call layer.

Currently, the system call layer does not do the "cookie dance", and so
that code is relatively unmaintained. If all VFS consumers consumed the
same code path, the code in the path would be maintained.

Anyway, that's my two cents...

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.