Date:      Thu, 7 Nov 1996 11:13:59 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        dg@root.com
Cc:        ponds!rivers@dg-rtp.dg.com, terry@lambert.org, dyson@freefall.freebsd.org, freebsd-hackers@freefall.freebsd.org
Subject:   Re: More info on the daily panics...
Message-ID:  <199611071813.LAA10315@phaeton.artisoft.com>
In-Reply-To: <199611070813.AAA02988@root.com> from "David Greenman" at Nov 7, 96 00:13:47 am

> > After Terry signs off on this, can someone get it committed to
> >2.1.6 (nee 2.1.5-STABLE)...
> 
>    I think there is a lot of misinformation being bandied about this problem.
> In the one case of this problem that I looked into, it was caused by the
> v_usecount being negative. There was clearly one too many vrele's occurring
> somewhere. It was not caused by any sort of "freelist wrap" condition. The
> patch you've provided will kludge around the problem, but it is not in any
> sense a "fix". Each time a vnode is deallocated too many times (and thus
> v_usecount goes negative), you'll end up losing all of the vnodes on the
> freelist that follow it (potentially several thousand) because the condition
> will never go away. This is bad.

The intent of the fix is to prevent improper vnode reuse around list
expansion time, not to kludge multiple frees.


The multiple free panic should occur in vrele() itself.  It should be an
assert or other compile time debug option, since vrele() is in the
critical path for what *should* be a fast operation, and a multiple free
*should* never happen.

The two problems are unrelated, but their effects are similar in appearance.


This patch *is* a kludge: it's a kludge against the fact that there is
a two stage decommit (which is itself a kludge, with no way to recover
perfectly valid data hung off an in core vnode buffer list pointer once
the FS data has been dissociated from that vnode).  The problem is the
race conditions in the two stage decommit during a period of increasing
demand and high vnode reuse.

Multiple vrele()'s are a separate problem with similar symptoms, but
the symptoms show up at a slightly different time.  A multiple vrele
will result in the panic on an increasing usecount.  A multiple
allocation of the same vnode at list grow time will result in a
delayed crash effect at a later time, not necessarily (but possibly)
on an increasing usecount.

If you are really concerned that this will mask a future multiple vrele()
problem, I suggest you put the assert in vrele() and prevent the queue
from ever getting corrupted that way in the first place.
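
As a rough sketch of what I mean by the assert in vrele(): the
structure and names below (vnode_sketch, vrele_is_legal, vrele_sketch,
VNODE_DEBUG) are made up for illustration and are not the real kernel
definitions; only v_usecount is taken from the discussion above.  The
check compiles away unless the debug option is defined, so it costs
nothing in the normal fast path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct vnode. */
struct vnode_sketch {
	int  v_usecount;	/* active references to this vnode */
	bool on_freelist;	/* set once the last reference is dropped */
};

/* A release is only legal while at least one reference is outstanding. */
static bool
vrele_is_legal(const struct vnode_sketch *vp)
{
	return vp != NULL && vp->v_usecount > 0;
}

/*
 * Sketch of a release routine.  With VNODE_DEBUG defined (an assumed
 * option name), an extra release trips the assert right here, before
 * the count can go negative and before the freelist is touched.
 */
static void
vrele_sketch(struct vnode_sketch *vp)
{
#ifdef VNODE_DEBUG
	assert(vrele_is_legal(vp));
#endif
	if (--vp->v_usecount == 0)
		vp->on_freelist = true;	/* last reference: hand to freelist */
}
```

The point of putting the check in the release path is that the bad
caller is still on the stack when it fires, instead of the damage
being found later by whoever next walks the freelist.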

This is a horse-out-of-the-barn issue, in any case, since it means a
significant coding error in one FS or another for it to happen.  The
wrap error, on the other hand, just needs the right timing window on
a WAIT memory allocation... generally on a busy machine, where the
boundary crossing of the high water mark will be relatively frequent.


If you are in the assert-generating mood, I'd also suggest a verification
macro to be called inside every VTOI()/ITOV() operation to verify that
the inode the vnode points to points back to the same vnode, and that
the vnode the inode points to points back to the same inode (depending
on the op).
You would probably have to make them into function calls, which would
mean moving references to them out of the data declarations for some
auto variables in a lot of places (the code generated should remain
the same -- data declaration constant relative references are shorthand
only, and have no real ability to affect code compactness).

Since it would happen all over the place, this would be too expensive
to leave on in a GENERIC kernel, but would catch both the problem you
are concerned about, the one I'm concerned about, and other error cases
which neither of us may have considered.
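
A minimal sketch of the verification idea, written as function calls
rather than macros.  The types and names here (lnk_vnode, lnk_inode,
vtoi_checked, itov_checked, VNODE_DEBUG) are hypothetical; I am only
assuming the usual arrangement where the vnode carries a pointer to
its inode (v_data) and the inode carries a back pointer to its vnode
(i_vnode).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced structures: just the cross links. */
struct lnk_inode;
struct lnk_vnode { struct lnk_inode *v_data;  };	/* vnode -> inode */
struct lnk_inode { struct lnk_vnode *i_vnode; };	/* inode -> vnode */

/* The inode this vnode points to must point back to this vnode. */
static bool
vtoi_consistent(const struct lnk_vnode *vp)
{
	return vp != NULL && vp->v_data != NULL && vp->v_data->i_vnode == vp;
}

/* The vnode this inode points to must point back to this inode. */
static bool
itov_consistent(const struct lnk_inode *ip)
{
	return ip != NULL && ip->i_vnode != NULL && ip->i_vnode->v_data == ip;
}

/* Checked conversions; the asserts vanish without VNODE_DEBUG. */
static struct lnk_inode *
vtoi_checked(struct lnk_vnode *vp)
{
#ifdef VNODE_DEBUG
	assert(vtoi_consistent(vp));
#endif
	return vp->v_data;
}

static struct lnk_vnode *
itov_checked(struct lnk_inode *ip)
{
#ifdef VNODE_DEBUG
	assert(itov_consistent(ip));
#endif
	return ip->i_vnode;
}
```

A vnode that has been handed out twice, or reused while an inode still
references it, fails the back pointer comparison on the first
conversion, which is much closer to the cause than the eventual crash.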


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


