From owner-freebsd-hackers  Fri Nov  8 08:12:02 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id IAA18664 for hackers-outgoing; Fri, 8 Nov 1996 08:12:02 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id IAA18645; Fri, 8 Nov 1996 08:11:56 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id JAA12109; Fri, 8 Nov 1996 09:02:33 -0700
From: Terry Lambert
Message-Id: <199611081602.JAA12109@phaeton.artisoft.com>
Subject: Re: More info on the daily panics...
To: dg@root.com
Date: Fri, 8 Nov 1996 09:02:32 -0700 (MST)
Cc: terry@lambert.org, ponds!rivers@dg-rtp.dg.com, dyson@freefall.freebsd.org, freebsd-hackers@freefall.freebsd.org
In-Reply-To: <199611080249.SAA04382@root.com> from "David Greenman" at Nov 7, 96 06:49:56 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> >This patch *is* a kludge: it's a kludge against the fact that there is
> >a two stage decommit (which is itself a kludge, with no way to recover
> >perfectly valid data hung off an in core vnode buffer list pointer once
> >the FS data has been dissociated from that vnode).  The problem is the
> >race conditions in the two stage decommit during a period of increasing
> >demand and high vnode reuse.
>
>    As always, trying to parse what you're saying is difficult.  I
> interpret what you're saying above as there being a race condition
> when allocating inode+vnode pairs (since either one can block and
> thus someone else might get in there and try to do the same thing
> for the same inode).  I fixed that race condition over a year ago by
> adding a mutex lock and by a series of allocation order changes.
> I might have missed a rarely used filesystem (msdosfs?), but the
> race condition for FFS has been fixed since July '95.

I understand that this is the official "fix" for the problem.

This is the "doctor: Then don't do that" scenario.  If that reference
wasn't clear in my earlier post:

	Patient: It hurts when I do this
	Doctor:  Then don't do that

The "doctor: Then don't do that" scenario is not a good way to fix
things... it's the difference between:

	A calls B
	B has a bug that A tickles
	Let's change A to not tickle B

As opposed to the right thing to do, which is:

	Let's change B to not have the bug

You'll notice that one of the things I asked him was "What FS types do
you have mounted?".

Without getting into the whole "this code should be infrastructure
instead of being duplicated (potentially erroneously) in every file
system" argument, my suggested "fix" simply prevented one class of
problems from occurring.  I didn't claim it would fix everything (as a
matter of fact, I've only said that it was a slightly better kludge).

Now, getting into that argument:

My basic problem is that it's bad to build implied state into
interfaces, especially if you smear state maintenance across both
sides of the interface boundaries.

If you document exit and entry state for a function in an interface,
that's OK, as long as you *must* recross that same boundary to change
the state *each time* it is changed.

I *do* understand that you have to do that kind of thing, at a minimum
to provide locking interfaces.  However, even granting that, the way
the vnode management currently works *still* violates the architecture
model and leaves us with something-that-should-be-a-black-box-but-isn't.

The problem we have is that we have one or more state transitions for
the locking which occur on both sides of the interface.  That's
nothing more than bad abstraction in the design of the interface in
the first place.

My suggested assert "fixes" are also kludges.
These assert fixes invert the service order for the interface, to make
some function in the stack trace at the time of the panic be the
responsible party.  Nevertheless, they are a hell of a lot better than
a delayed (cascade) failure with no way to find the guilty party.

I see little conceptual difference between this and unmapping page
zero to trap NULL pointer dereferences when they happen, instead of
letting something (apparently unrelated) fail at some later time.  It
was The Right Thing To Do to localize NULL dereferences, and a similar
fix is The Right Thing To Do to localize errors in non-opaque
interfaces.  That all interfaces should be opaque in the first place
is kind of irrelevant to the applicability of the fix.

The service interface call inversion is unavoidable as long as the
association of an inode with a vnode is, in effect, registration of a
callback function into the FS for later use by vclean.

Having two or more inodes point at the same vnode makes problems like
those we are seeing *likely*, so long as the association interface is
split across the same boundary at which the lock interface is being
defined.  Someone is bound to make an error implementing the
undocumented implied state... I can point to one *right now* in the
MSDOSFS directory traversal parent/child race on a VOP_CREATE.

It's obvious that we can't consider every possible call graph and its
effect on implied state.  The only real solution is to *not allow*
state implication, so that it isn't ever an issue again.

For what it's worth, yes, I do claim that my veto-vs-calldown
interface reorganization, and the single-entry/single-exit changes for
simplifying the state transition map, would have been the correct
first steps on the path to a real fix.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.