Date:      Fri, 24 Apr 1998 06:02:35 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        eivind@yes.no (Eivind Eklund)
Cc:        fs@FreeBSD.ORG
Subject:   Re: Problem which FSCK doesn't fix
Message-ID:  <199804240602.XAA06993@usr09.primenet.com>
In-Reply-To: <19980423234857.51461@follo.net> from "Eivind Eklund" at Apr 23, 98 11:48:57 pm

> 
> A user's problem: A partition passes fsck (with -f) fine, but causes
> panic()s sometimes ("free vnode isn't"), and makes dump loop 'forever'
> (or at least well past 1700%, which seems 17x too much :-)

The "free vnode isn't" is an inherent race condition having to do with
vclean and how VOP_LOCK can't be trusted because of the way it is
implemented.  This is the basis of my "vclean must die" crusade and
my "VOP_LOCK must become veto based" crusade.

Basically, it's pretty glaringly obvious, if you can get the whole
vnode life cycle in your head at one time.  Going to a veto basis
is just to make sure that all FS's use the corrected code, more than
a real requirement (assuming this is FFS; if not, the FS probably has
bogus semantics, since most of them do).  The problem occurs when a
second allocation is made while an earlier allocation is asleep.
I'd explain it, but it's not simple to explain, and you'd be better off
loading all the code in your head and pondering it yourself, rather
than having me load an analogue of the code in your head and stepping
you through it.  When you look at it without taking the whole thing
into account at one time, it looks like it can't happen.  Yet users report
the error, don't they?
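
For the general shape of the thing, here is a rough sketch only.  It is
not the kernel code and not the exact sequence involved; every name in it
(toy_vnode, pick_free, claim, simulate_race) is made up for illustration,
and the "sleep" is modelled simply by letting a second allocator run
between the first allocator's pick and its claim.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy structure; NOT the kernel's struct vnode. */
struct toy_vnode {
    bool on_freelist;   /* vnode is believed to be free */
    int  usecount;      /* number of active users */
};

/* Step 1 of an allocation: pick a candidate that looks free. */
static struct toy_vnode *pick_free(struct toy_vnode *vp)
{
    return vp->on_freelist ? vp : NULL;
}

/*
 * Step 2 of an allocation: claim the candidate picked earlier.
 * Returns false if the candidate is no longer free; in this toy model
 * that is the point where a "free vnode isn't" check would trip.
 */
static bool claim(struct toy_vnode *vp)
{
    if (vp == NULL || !vp->on_freelist || vp->usecount != 0)
        return false;
    vp->on_freelist = false;
    vp->usecount = 1;
    return true;
}

/*
 * Interleave two allocations deterministically: A picks and then
 * "sleeps"; B picks and claims the same vnode; A wakes and tries to
 * claim.  Returns true when A finds its "free" vnode already in use.
 */
static bool simulate_race(void)
{
    struct toy_vnode v = { true, 0 };

    struct toy_vnode *a = pick_free(&v);  /* A picks, then sleeps */
    struct toy_vnode *b = pick_free(&v);  /* B runs while A sleeps */
    bool b_ok = claim(b);                 /* B wins the vnode */
    bool a_ok = claim(a);                 /* A wakes: no longer free */

    return b_ok && !a_ok;
}
```

Each step looks correct on its own, which is why it looks like it can't
happen; the failure only shows up when the pick and the claim are
separated by a sleep.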


> I have a binary copy of the partition, a copy of the disklabel output
> for the disk, and a copy of fdisk output for the disk.
> 
> Anybody that could clue me in to what's the best way to approach this
> (or feel a desire to take over ;-)?

The "free vnode isn't" error is totally unrelated to the dump problem.

The dump looping I don't understand; how large is the image of this
thing?  In general, dump is seriously neglected.  I would not expect
a dump problem, but neither would I be terribly surprised at one.
Code does not mutate, but there are occasions when it is allowed to be
selectively maintained.

You may want to look at reverting the reporting format changes and
some other minor changes that occurred about 8-10 months ago.  This
may or may not fix things, but the closer to the CSRG code base the
code is, the more likely that some academic type sat around for a
quarter or more thinking about the next-to-last change before charging
in and twiddling bits.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
