Date: Fri, 20 Aug 1999 11:13:13 -0700 (PDT)
From: Julian Elischer <julian@whistle.com>
To: fs@freebsd.org
Subject: Re: BUG in 3.2 fsck! (fwd)
Message-ID: <Pine.BSF.3.95.990820111248.1212A-100000@current1.whistle.com>
further discussion...
---------- Forwarded message ----------
Date: Fri, 20 Aug 1999 02:57:25 -0700 (PDT)
From: milt <milt@vicor-nb.com>
To: julian@whistle.com, milt@vicor-nb.com
Cc: cayford@vicor-nb.com, conor@vicor-nb.com, davep@vicor-nb.com,
daver@vicor-nb.com, jrh@vicor-nb.com
Subject: Re: BUG in 3.2 fsck!
HIYA
Well, soft updates sure sounds interesting. One of our current problems is
that our damn raids don't preserve the disk write order requested by unix.
I'm fighting that - maybe soft updates will give me enough ammunition to win
that fight. If not, soft updates won't help us!
I haven't read the soft updates paper yet - I will, but not tonight. One
question: I am not sure how soft updates are intended to interact with fsck.
Is it your intention that fsck behave differently only in preen mode? If so,
you have misspelled at least one if statement (the LINK COUNT INCREASING
test in dir.c::adjust).
I am writing this now because I want to let you see my current status for fsck
fixes before you install my previous patch. I will repeat all of this in
later mail (with test instructions yet) once I figure out what the heck I want
to do. What I have right now is:
BUG 1: When lost+found is allocated on a new highest block number for a
cylinder group it ends up without an inoinfo entry and will be flagged
available during pass 5.
This is the one in my previous mail.
BUG 2: When an orphan directory happens to start with a parent pointer to
an inode which will become a newly allocated lost+found, the loop in
pass 2 will skip the i_dotdot update because it points to a USTATE
inode, but pass 3 will unwind the update which wasn't done because it
unwinds i_dotdot for everything it connects! (The inode isn't USTATE
anymore because it's now lost+found's inode.)
BUG 3: It has become virtually impossible to learn things from redirected
output. Some lines go partially to stderr and partially to stdout with
disastrous results even when both stdout and stderr are redirected!
What I am running right now is an fsck that does not mention stderr.
(2.2.8's fsck mentions stderr only for fatal setup problems - that
works too, but it requires less thought to just eliminate stderr.)
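One way to keep a message from being split across two streams is to format
the whole line first and then write it with a single call to one stream. The
sketch below is only an illustration of that idea, not the code from the
fsck being tested; the names vformat_line, format_line and report, and the
256-byte line limit, are assumptions made up for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one complete message into buf and make sure it ends in a newline.
 * Returns the length written, or -1 if the message does not fit.
 */
static int
vformat_line(char *buf, size_t len, const char *fmt, va_list ap)
{
    int n;

    assert(buf != NULL && fmt != NULL);
    n = vsnprintf(buf, len, fmt, ap);
    if (n < 0 || (size_t)n >= len)
        return -1;                      /* error or truncated */
    if (n == 0 || buf[n - 1] != '\n') {
        if ((size_t)n + 1 >= len)
            return -1;                  /* no room for the newline */
        memcpy(buf + n, "\n", 2);       /* append newline and NUL */
        n++;
    }
    return n;
}

/* Variadic wrapper, convenient for callers that want the text itself. */
static int
format_line(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vformat_line(buf, len, fmt, ap);
    va_end(ap);
    return n;
}

/* Hypothetical reporting routine: every message goes to stdout, whole. */
static int
report(const char *fmt, ...)
{
    char line[256];                     /* assumed maximum line length */
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vformat_line(line, sizeof line, fmt, ap);
    va_end(ap);
    if (n < 0)
        return -1;
    return fputs(line, stdout) == EOF ? -1 : 0;
}
```

With every message passing through one routine like report(), redirecting
stdout captures each line intact, and nothing depends on how stderr is
buffered.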
BUG 4: When Milt's new code puts over 32768 files in lost+found it is
committing a grave error (di_nlink is a signed 16-bit quantity).
Milt better get his act together before he publishes this.
NOTE: there is no problem with allocation or extra passes here. fsck
has long been allocating disk pages as it extends lost+found.
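To make the limit in bug 4 concrete: a signed 16-bit link count tops out at
32767, so any code that keeps adding links has to check before it
increments. The following is a minimal sketch of such a guard. The names
LF_MAX_LINKS, lf_can_add_link and lf_add_link are hypothetical, and the
32760 threshold is simply the figure proposed later in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical threshold, a little below INT16_MAX (32767). */
#define LF_MAX_LINKS 32760

/* Returns 1 if one more link fits under the threshold, 0 otherwise. */
static int
lf_can_add_link(int16_t nlink)
{
    assert(nlink >= 0);
    return nlink < LF_MAX_LINKS;
}

/*
 * Adds one link if there is room.
 * Returns 0 on success, -1 if the directory must be treated as full.
 */
static int
lf_add_link(int16_t *nlink)
{
    if (!lf_can_add_link(*nlink))
        return -1;
    *nlink = (int16_t)(*nlink + 1);
    return 0;
}
```

A caller that gets -1 back would then either report lost+found as full or
switch to another directory.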
NEW FEATURE: a q switch which suppresses output for and questions about things
that would be(/are) fixed in preen mode. When q is in effect, preen
mode fixes just happen - no notification to the operator and no
questions.
This allows us to get a screen which shows only the interesting errors.
The preen mode problems get fixed quietly and only the serious stuff
ends up in the operator's face or on the redirected output file!
My original intention was to have preen mode keep running after some
errors, but I now understand why you thought that would be hard. This
new switch achieves my goal of seeing only the real problems and is lots
easier to implement.
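The decision the q switch makes can be sketched as a small helper. This is
only an illustration of the behaviour described above, under the assumption
that each problem is already classified as preen-fixable or not; the names
fix_action, FIX_QUIETLY, ASK_OPERATOR and choose_action are hypothetical.

```c
#include <assert.h>

enum fix_action {
    FIX_QUIETLY,    /* repair with no output and no question */
    ASK_OPERATOR    /* print the problem and ask before repairing */
};

/*
 * qflag is nonzero when the proposed q switch is in effect;
 * preen_fixable is nonzero when preen mode would repair this
 * problem on its own.
 */
static enum fix_action
choose_action(int qflag, int preen_fixable)
{
    if (qflag && preen_fixable)
        return FIX_QUIETLY;
    return ASK_OPERATOR;
}
```

Only the combination of q and a preen-fixable problem is silenced, so
serious errors still reach the operator or the redirected output.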
DISCUSSION:
As you can deduce from my discovery of bug 4, I really am having lots of fun
testing all this junk.
Current solution to bug 2 is to update the i_dotdot count even in USTATE
inodes during pass2. That causes lost+found to come out right but precludes
adding inodes in midstream, invalidating my previous patch for bug 1.
Currently, I am pre-allocating one extra inoinfo slot per cylinder group
(which prevents bug 1) and updating USTATE counts (which fixes bug 2).
I realized that bug 4 was out there only a few minutes ago. Two solutions
occur to me:
a. Switch to a new directory under a different name when lost+found has 32760
entries.
b. Bag it and claim lost+found is full when it has 32760 files in it.
With 5 to 8 million files/directories in a file system, 32760 isn't very many
so I am not enthused about b. On the other hand, I can't think of a fix for
bugs 1/2 which is compatible with solution a. So, I think I'll go to bed!
Hmmm, pondering and rereading this an interesting possibility occurs to me.
On a bad hardware day, it would help if we put each fsck run in a different
lost+found directory (lost+found.01, lost+found.02, etc.). fsck would ALWAYS
allocate a new lost+found and if you had multiple crashes on one day it would be
easier to tell which lost+found files should be recovered to where. (We
really do have tools to recover these beasties and are working on improving
them.) Which makes the unimplementable solution (a) more interesting!
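Picking the name for each run could look roughly like the sketch below. It
assumes the caller already has the list of names present in the root
directory and that two digits are enough; lf_name, lf_next_free and
LF_MAX_SEQ are hypothetical names, and nothing here touches the file system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LF_MAX_SEQ 99   /* two decimal digits, as in lost+found.01 */

/* Writes "lost+found.NN" into buf. Returns 0 on success, -1 on error. */
static int
lf_name(char *buf, size_t len, int seq)
{
    int w;

    assert(buf != NULL);
    if (seq < 1 || seq > LF_MAX_SEQ)
        return -1;
    w = snprintf(buf, len, "lost+found.%02d", seq);
    return (w < 0 || (size_t)w >= len) ? -1 : 0;
}

/*
 * Finds the lowest sequence number whose name is not in taken[].
 * On success the name is left in buf and the number is returned;
 * -1 means every name is in use or buf is too small.
 */
static int
lf_next_free(const char *const *taken, size_t ntaken, char *buf, size_t len)
{
    int seq;
    size_t i;

    for (seq = 1; seq <= LF_MAX_SEQ; seq++) {
        if (lf_name(buf, len, seq) != 0)
            return -1;
        for (i = 0; i < ntaken; i++)
            if (strcmp(taken[i], buf) == 0)
                break;
        if (i == ntaken)
            return seq;     /* no existing entry has this name */
    }
    return -1;
}
```

For example, with lost+found.01 and lost+found.02 already present, the
function leaves "lost+found.03" in buf and returns 3.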
