From owner-freebsd-hackers  Fri Nov  8 08:12:02 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id IAA18664 for hackers-outgoing; Fri, 8 Nov 1996 08:12:02 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id IAA18645; Fri, 8 Nov 1996 08:11:56 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id JAA12109; Fri, 8 Nov 1996 09:02:33 -0700
From: Terry Lambert
Message-Id: <199611081602.JAA12109@phaeton.artisoft.com>
Subject: Re: More info on the daily panics...
To: dg@root.com
Date: Fri, 8 Nov 1996 09:02:32 -0700 (MST)
Cc: terry@lambert.org, ponds!rivers@dg-rtp.dg.com, dyson@freefall.freebsd.org, freebsd-hackers@freefall.freebsd.org
In-Reply-To: <199611080249.SAA04382@root.com> from "David Greenman" at Nov 7, 96 06:49:56 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> >This patch *is* a kludge: it's a kludge against the fact that there is
> >a two stage decommit (which is itself a kludge, with no way to recover
> >perfectly valid data hung off an in core vnode buffer list pointer once
> >the FS data has been dissociated from that vnode).  The problem is the
> >race conditions in the two stage decommit during a period of increasing
> >demand and high vnode reuse.
>
>    As always, trying to parse what you're saying is difficult.  I
> interpret what you're saying above as there being a race condition
> when allocating inode+vnode pairs (since either one can block and
> thus someone else might get in there and try to do the same thing
> for the same inode).  I fixed that race condition over a year ago by
> adding a mutex lock and by a series of allocation order changes.
> I might have missed a rarely used filesystem (msdosfs?), but the
> race condition for FFS has been fixed since July '95.

I understand that this is the official "fix" for the problem.

This is the "doctor: Then don't do that" scenario.  If that reference
wasn't clear in my earlier post:

	Patient: It hurts when I do this
	Doctor:  Then don't do that

The "doctor: Then don't do that" scenario is not a good way to fix
things... it's the difference between:

	A calls B
	B has a bug that A tickles
	Let's change A to not tickle B

As opposed to the right thing to do, which is:

	Let's change B to not have the bug

You'll notice that one of the things I asked him was "What FS types do
you have mounted?".

Without getting into the whole "this code should be infrastructure
instead of being duplicated (potentially erroneously) in every file
system" argument, my suggested "fix" simply prevented one class of
problems from occurring.  I didn't claim it would fix everything (as a
matter of fact, I've only said that it was a slightly better kludge).

Now, getting into that argument:

My basic problem is that it's bad to build implied state into
interfaces, especially if you smear state maintenance across both
sides of the interface boundaries.

If you document exit and entry state for a function in an interface,
that's OK, as long as you *must* recross that same boundary to change
the state *each time* it is changed.

I *do* understand that you have to do that kind of thing, at a minimum
to provide locking interfaces.  However, even granting that, the way
the vnode management currently works *still* violates the architecture
model and leaves us with something-that-should-be-a-black-box-but-isn't.

The problem we have is that we have one or more state transitions for
the locking which occur on both sides of the interface.  That's
nothing more than bad abstraction in the design of the interface in
the first place.

My suggested assert "fixes" are also kludges.
These assert fixes invert the service order for the interface, to make
some function in the stack trace at the time of the panic be the
responsible party.  Nevertheless, they are a hell of a lot better than
a delayed (cascade) failure with no way to find the guilty party.

I see little conceptual difference between this and unmapping page
zero to trap NULL pointer dereferences when they happen, instead of
letting something (apparently unrelated) fail at some later time.  It
was The Right Thing To Do to localize NULL dereferences, and a similar
fix is The Right Thing To Do to localize errors in non-opaque
interfaces.  That all interfaces should be opaque in the first place
is kind of irrelevant to the applicability of the fix.

The service interface call inversion is unavoidable as long as the
association of an inode with a vnode is, in effect, registration of a
callback function into the FS for later use by vclean.

Having two or more inodes point at the same vnode makes problems like
those we are seeing *likely*, so long as the association interface is
split across the same boundary at which the lock interface is being
defined.  Someone is bound to make an error implementing the
undocumented implied state... I can point to one *right now* in the
MSDOSFS directory traversal parent/child race on a VOP_CREATE.

It's obvious that we can't consider every possible call graph and its
effect on implied state.  The only real solution is to *not allow*
state implication, so that it isn't ever an issue again.

For what it's worth, yes, I do claim that my veto-vs-calldown
interface reorganization, and the single-entry/single-exit changes for
simplifying the state transition map, would have been the correct
first steps on the path to a real fix.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.