Date:      Fri, 20 Aug 1999 11:13:13 -0700 (PDT)
From:      Julian Elischer <julian@whistle.com>
To:        fs@freebsd.org
Subject:   Re: BUG in 3.2 fsck! (fwd)
Message-ID:  <Pine.BSF.3.95.990820111248.1212A-100000@current1.whistle.com>

further discussion...

---------- Forwarded message ----------
Date: Fri, 20 Aug 1999 02:57:25 -0700 (PDT)
From: milt <milt@vicor-nb.com>
To: julian@whistle.com, milt@vicor-nb.com
Cc: cayford@vicor-nb.com, conor@vicor-nb.com, davep@vicor-nb.com,
    daver@vicor-nb.com, jrh@vicor-nb.com
Subject: Re: BUG in 3.2 fsck!

HIYA

Well, soft updates sure sounds interesting.  One of our current problems is
that our damn raids don't preserve the disk write order requested by unix. 
I'm fighting that - maybe soft updates will give me enough ammunition to win
that fight.  If not, soft updates won't help us!

I haven't read the soft updates paper yet - I will, but not tonight.  One
question: I am not sure how soft updates are intended to interact with fsck.
Is it your intention that fsck behave differently only in preen mode? If so,
you have mis-spelled at least one if statement (the LINK COUNT INCREASING
test in dir.c::adjust).

I am writing this now because I want to let you see my current status for fsck
fixes before you install my previous patch.  I will repeat all of this in
later mail (with test instructions yet) once I figure out what the heck I want
to do.  What I have right now is:

BUG 1: When lost+found is allocated on a new highest block number for a
       cylinder group it ends up without an inoinfo entry and will be flagged
       available during pass 5.

       This is the one in my previous mail.

BUG 2: When an orphan directory happens to start with a parent pointer to
       an inode which will become a newly allocated lost+found, the loop in
       pass 2 will skip the i_dotdot update because it points to a USTATE
       inode, but pass 3 will unwind the update which wasn't done because it
       unwinds i_dotdot for everything it connects!  (The inode isn't USTATE
       anymore because it's now lost+found's inode.)

BUG 3: It has become virtually impossible to learn things from redirected
       output.  Some lines go partially to stderr and partially to stdout with
       disastrous results even when both stdout and stderr are redirected!

       What I am running right now is an fsck that does not mention stderr.
       (2.2.8's fsck mentions stderr only for fatal setup problems - that
       works too, but it requires less thought to just eliminate stderr.)

BUG 4: When Milt's new code puts over 32768 files in lost+found it is
       committing a grave error (di_nlinks is a signed, 16-bit quantity).
       Milt better get his act together before he publishes this.

       NOTE: there is no problem with allocation or extra passes here.  fsck
       has long been allocating disk pages as it extends lost+found.
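
       A minimal sketch of the kind of guard bug 4 calls for, assuming the
       link count is held in a signed 16-bit field and that each reconnected
       orphan directory costs lost+found one link (its ".." entry).  The
       names nlink16_t and add_link_checked are hypothetical, not fsck's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the on-disk link count (signed, 16 bit). */
typedef int16_t nlink16_t;

/*
 * Add one link if it still fits in the signed 16-bit field.
 * Returns true on success; returns false and leaves the count
 * untouched when the field is already at its maximum, so the
 * caller can switch directories or report "lost+found is full".
 */
static bool add_link_checked(nlink16_t *nlink)
{
    assert(nlink != NULL);
    if (*nlink >= INT16_MAX)
        return false;
    *nlink += 1;
    return true;
}
```

       A caller would test the return value before writing the new
       directory entry, which is the point where either solution a or
       solution b below would take over.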

NEW FEATURE: a q switch which suppresses output for and questions about things
      that would be(/are) fixed in preen mode.  When q is in effect, preen
      mode fixes just happen - no notification to the operator and no
      questions.

      This allows us to get a screen which shows only the interesting errors.
      The preen mode problems get fixed quietly and only the serious stuff
      ends up in the operator's face or on the redirected output file!

      My original intention was to have preen mode keep running after some
      errors, but I now understand why you thought that would be hard.  This
      new switch achieves my goal of seeing only the real problems and is lots
      easier to implement.
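
      Roughly, the decision the q switch makes could be sketched as below.
      This is only an illustration under my own assumptions: struct problem,
      enum handling and classify are hypothetical names, and the real fsck
      decides preen-fixability at each individual check rather than in one
      place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one inconsistency found by a pass. */
struct problem {
    bool preen_fixable;   /* would preen mode fix this on its own? */
};

enum handling {
    FIX_QUIETLY,          /* fix it, print nothing, ask nothing */
    REPORT_AND_ASK        /* normal behaviour: tell the operator */
};

/* With q in effect, preen-class problems are fixed without output. */
static enum handling classify(const struct problem *p, bool qflag)
{
    assert(p != NULL);
    if (qflag && p->preen_fixable)
        return FIX_QUIETLY;
    return REPORT_AND_ASK;
}
```

      Only problems that fall through to REPORT_AND_ASK would reach the
      screen or the redirected output file.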

DISCUSSION:

As you can deduce from my discovery of bug 4, I really am having lots of fun
testing all this junk.

Current solution to bug 2 is to update the i_dotdot count even in USTATE
inodes during pass2.  That causes lost+found to come out right but precludes
adding inodes in mid stream, invalidating my previous patch for bug 1.

Currently, I am pre-allocating one extra inoinfo slot per cylinder group
(which prevents bug 1) and updating USTATE counts (which fixes bug 2).

I realized that bug 4 was out there only a few minutes ago.  Two solutions
occur to me:

a. Switch to a new directory under a different name when lost+found has 32760
   entries.

b. Bag it and claim lost+found is full when it has 32760 files in it.

With 5 to 8 million files/directories in a file system, 32760 isn't very many
so I am not enthused about b.  On the other hand, I can't think of a fix for
bugs 1/2 which is compatible with solution a.  So, I think I'll go to bed!

Hmmm, pondering and rereading this, an interesting possibility occurs to me.
On a bad hardware day, it would help if we put each fsck run in a different
lost+found directory (lost+found.01, lost+found.02, etc.).  fsck would ALWAYS
allocate a new lost+found and if you had multiple crashes on one day it would be
easier to tell which lost+found files should be recovered to where.  (We
really do have tools to recover these beasties and are working on improving
them.)  Which makes the unimplementable solution a more interesting!
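
The per-run naming could be sketched like this.  It is only an illustration:
lf_name_for_run is a hypothetical helper, the two-digit suffix simply follows
the lost+found.01 / lost+found.02 examples above, and finding the next unused
run number on disk is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "lost+found.NN" for run number n (1..99).
 * Returns 0 on success, -1 if n is out of range or buf is too small.
 */
static int lf_name_for_run(char *buf, size_t len, int n)
{
    int w;

    assert(buf != NULL);
    if (n < 1 || n > 99)
        return -1;
    w = snprintf(buf, len, "lost+found.%02d", n);
    if (w < 0 || (size_t)w >= len)
        return -1;      /* output error or truncated name */
    return 0;
}
```

The zero-padded suffix keeps the directories sorted in run order in a plain
listing, which is the property that matters when sorting out several crashes
from the same day.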







