From owner-freebsd-fs Thu Nov 18 6:32:30 1999
Date: Thu, 18 Nov 1999 15:32:20 +0100
From: Eivind Eklund
To: Erez Zadok
Cc: fs@FreeBSD.ORG
Subject: Re: namei() and freeing componentnames
Message-ID: <19991118153220.E45524@bitbox.follo.net>
References: <19991112000359.A256@bitbox.follo.net> <199911152312.SAA21891@shekel.mcl.cs.columbia.edu>
In-Reply-To: <199911152312.SAA21891@shekel.mcl.cs.columbia.edu>; from ezk@cs.columbia.edu on Mon, Nov 15, 1999 at 06:12:09PM -0500

[Note to impatient readers - the forward view is included at the bottom of
this mail]

On Mon, Nov 15, 1999 at 06:12:09PM -0500, Erez Zadok wrote:
> In message <19991112000359.A256@bitbox.follo.net>, Eivind Eklund writes:
> [...]
> > I suspect that for some filesystems (though none of the present ones),
> > it might be necessary to do more than a
> > zfree(namei_zone,cnp->cn_pnbuf) in order to free up all the relevant
> > data. In order to support this, we'd have to introduce a new VOP -
> > tentatively called VOP_RELEASEND(). Unfortunately, this comes with a
> > performance penalty.
>
> Will VOP_RELEASEND be able to call a filesystem-specific routine? I think
> it should be flexible enough.

All VOPs are filesystem specific (or can be, at least).
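To make the "filesystem specific" point concrete, here is a rough
user-space sketch of how a VOP_RELEASEND() could dispatch through a
per-filesystem operations table. It is only a model under stated
assumptions: the structures are stripped-down stand-ins for the kernel
ones, free() stands in for zfree(namei_zone, ...), and cn_private,
fancyfs and make_cn() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct componentname. */
struct componentname {
	char *cn_pnbuf;   /* pathname buffer; the kernel zfree()s this */
	void *cn_private; /* hypothetical extra per-filesystem lookup state */
};

struct vnode;

/* Per-filesystem operations table; only the release hook is modelled. */
struct vnodeops {
	void (*vop_releasend)(struct vnode *vp, struct componentname *cnp);
};

struct vnode {
	const struct vnodeops *v_op;
};

/* Dispatch through the vnode's own table, so each filesystem decides. */
#define VOP_RELEASEND(vp, cnp) ((vp)->v_op->vop_releasend((vp), (cnp)))

/* Default behaviour: free only the pathname buffer. */
static void
default_releasend(struct vnode *vp, struct componentname *cnp)
{
	(void)vp;
	free(cnp->cn_pnbuf);
	cnp->cn_pnbuf = NULL;
}

/* A filesystem that attached extra state during lookup frees that too. */
static void
fancyfs_releasend(struct vnode *vp, struct componentname *cnp)
{
	free(cnp->cn_private);
	cnp->cn_private = NULL;
	default_releasend(vp, cnp);
}

const struct vnodeops default_ops = { default_releasend };
const struct vnodeops fancyfs_ops = { fancyfs_releasend };

/* Hypothetical helper: build a componentname as a lookup might. */
struct componentname
make_cn(const char *path, int with_private)
{
	struct componentname cn = { NULL, NULL };

	cn.cn_pnbuf = malloc(strlen(path) + 1);
	assert(cn.cn_pnbuf != NULL);
	strcpy(cn.cn_pnbuf, path);
	if (with_private) {
		cn.cn_private = malloc(16);
		assert(cn.cn_private != NULL);
	}
	return cn;
}
```

A caller would write VOP_RELEASEND(vp, &cn) where it previously freed
cn_pnbuf directly. The indirect call is where the performance penalty
mentioned above would come from.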
> I can imagine that the VFS will call a (stackable) filesystem's
> vop_releasend(), and that stackable f/s can call a number of those
> on the lower level filesystem(s) it stacked on (there could be more
> than one, namely fan-out f/s).

Yes, this is the intent.

The problem I'm finding with VOP_RELEASEND() is that namei() can
return two different vps - the dvp (directory vp) and the actual vp
(inside the directory the dvp points at) - and that neither of these is
always available. As I am writing the code right now, I am using
whichever of these is available, with a preference for the dvp. I am
considering splitting VOP_RELEASEND() into VOP_RELEASEND() and
VOP_DRELEASEND(), which take the different vps as parameters - this
will at least give something that is easy to search for if we need to
change the behaviour somehow.

> [...]
> > This is somewhat vile, but has the advantage of keeping the code ready
> > for the real VOP_RELEASEND(), and not loosing performance until we
> > actually get some benefit out of it.
> [...]
> > Eivind.
>
> WRT performance, I suggest that if possible, we #ifdef all of the stacking
> code and fixes that have a non-insignificant performance impact.

Nothing that I am so far positive we will need has a significant
performance impact. I'm not sure the performance impact of
VOP_RELEASEND() will be significant, either - it is just that I would
like to avoid having a performance impact without gain, and for this
particular case I'm not positive we will ever need it - but I'm not
positive we won't, either. This is why I am trying to write the code in
a way that lets us move to having it quickly, but does not force us to
live with the penalties if it turns out we do not need it.

> Sure, performance is important, but not at the cost of functionality
> (IMHO). Not all users would need stacking, so they can choose not
> to turn on the relevant kernel #define and thus get maximum
> performance. Those who do want any stacking will have to pay a
> certain performance overhead.

I hope to make stacking layers really lightweight ("featherweight
stacking"), and believe it will make sense to use them internally in the
kernel organization. If this turns out to be right, everybody will
have to have them.

> Of course, there's also an argument against too much #ifdef'ed code,
> b/c it makes maintenance more difficult.

For some of the things I am doing now (e.g., the WILLRELE fixes),
ifdef'ing would be a royal pain, making the code extremely hard to read.

> I think we should realize that there would be no way to fix the VFS w/o
> impacting performance.

Actually, I am reasonably confident that we can do the fixes without
impacting performance noticeably.

> Rather than implement temporary fixes that avoid "hurting"
> performance, we can (1) conditionalize that code, (2) get it working
> *correctly* first, then (3) optimize it as needed, and (4) finally,
> turn it on by default, possibly removing the non-stacking code.

What I am doing now more or less follows these principles - though
instead of conditionalizing code I do not know if we will need, I make
it very easy to write it if it turns out we will need it.

Progress report: Based on the current rate of progress, it looks like
I'll be able to have patches ready for (my personal) testing on Sunday
(or *possibly* Saturday, but most likely not). Depending on how
testing/debugging works out, the patches will most likely be ready for
public testing sometime next week. I'll need help with NFS testing.

Forward view: I'm undecided on the next step. Possibilities:

(1) Change the way locking is specified to make it feasible to test
locking patches properly, and change the assertion generation to
generate better assertions. This will probably require changing
VOP_ISLOCKED() to be able to take a process parameter, and return
different values based on whether an exclusive lock is held by that
process or by another process. The present behaviour will be
available by passing NULL for this parameter.
Presently, running multiple processes does not work properly, as
the assertions do not really assert the right things. These
changes are necessary to properly debug the use of locks, which I
in turn believe is necessary for stacking layers (which I would like
to work in 4.0, but which I don't know if I will be able to have ready).

(2) Change the behaviour of VOP_LOOKUP() to "eat as much as you can,
and return how much that was" rather than "Eat a single path
component; we have already decided what this is." This allows
different types of namespaces, and it allows optimizations in
VOP_LOOKUP() when several steps in the traversal are inside a
single filesystem (and hey - who mounts a new filesystem on every
directory they see, anyway?) This change is rather small, and it
would be nice to have in 4.0 (I want the VFS differences from 4.0
to 5.0 to be as small as possible). It is pretty orthogonal to
stacking layers; stacking layers gain the same capabilities as
other file systems from it.

Eivind.
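
The difference between the two VOP_LOOKUP() styles in possibility (2)
can be illustrated with a small user-space model. This is only a sketch
under assumptions, not the real VOP_LOOKUP() interface: struct toyfs,
lookup_one() and lookup_many() are hypothetical names, paths are taken
relative to the filesystem root, and a "mount point" is just a string
in a table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy filesystem: knows only where other filesystems are
 * mounted inside it (paths relative to its root, NULL-terminated list). */
struct toyfs {
	const char *const *mountpoints;
};

/* Is the first len bytes of path exactly one of the mount points? */
static int
is_mountpoint(const struct toyfs *fs, const char *path, size_t len)
{
	for (size_t i = 0; fs->mountpoints[i] != NULL; i++) {
		const char *mp = fs->mountpoints[i];

		if (strlen(mp) == len && strncmp(mp, path, len) == 0)
			return 1;
	}
	return 0;
}

/* Present style: eat exactly one path component. */
size_t
lookup_one(const char *path)
{
	return strcspn(path, "/");
}

/*
 * Proposed style: eat as many components as lie inside this filesystem
 * and return how many bytes were consumed. Stops once a mount point has
 * been consumed, since the rest belongs to another filesystem.
 */
size_t
lookup_many(const struct toyfs *fs, const char *path)
{
	size_t pos = 0, consumed = 0;

	while (path[pos] != '\0') {
		size_t len = strcspn(path + pos, "/");

		if (len == 0)
			break;		/* empty component: leave to caller */
		pos += len;
		consumed = pos;
		if (is_mountpoint(fs, path, consumed))
			break;		/* crossing into another filesystem */
		if (path[pos] == '/')
			pos++;		/* step over the separator */
	}
	return consumed;
}
```

With "usr/local" as the only mount point, lookup_many() on
"usr/local/bin/ls" consumes the 9 bytes of "usr/local" in one call,
where lookup_one() consumes only the 3 bytes of "usr". The caller
would then continue the traversal in the mounted filesystem.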