Date:      Thu, 22 Jul 2010 09:50:34 -0700
From:      Kirk McKusick <mckusick@mckusick.com>
To:        "Mikhail T." <mi+thun@aldan.algebra.com>
Cc:        fs@freebsd.org
Subject:   Re: background fsck considered harmful? (Re: panic: handle_written_inodeblock: bad size) 
Message-ID:  <201007221650.o6MGoY9V039222@chez.mckusick.com>
In-Reply-To: <4C486209.7050402@aldan.algebra.com> 

> Date: Thu, 22 Jul 2010 11:21:45 -0400
> From: "Mikhail T." <mi+thun@aldan.algebra.com>
> Organization: Virtual Estates, Inc.
> To: Kirk McKusick <mckusick@mckusick.com>
> CC: fs@freebsd.org
> Subject: Re: background fsck considered harmful?
> 
> 21.07.2010 23:35, Kirk McKusick
> > Foreground fsck checks all the disk
> > metadata every time, so hard disk errors are captured immediately
> > before they have had a chance to accumulate. But background fsck
> > users blame it because it has not found them.
>    
> I don't blame the program itself -- if it was deliberately /designed/ to 
> only do partial checking. However, I was under the impression, that the 
> background fsck was meant to do the same job as the "real" one, and 
> that, whenever it did not, it was simply a bug in the /implementation/.
> 
> I suspect, this misconception is shared by plenty of other users... 
> Indeed, even if an inquisitive admin wanted to find out, fsck(8) gives 
> absolutely no warning to that effect -- it simply states, that 
> background fsck will be attempted, whenever possible.
> 
> > If you have small disk systems, running foreground fsck is an
> > acceptable solution (and indeed I would recommend it). But when
> > you are running systems with 20Tb of disks, you are not willing
> > to have your system down for 10 hours after every crash.
> >
> > A reasonable intermediate solution is to use background fsck by
> > default, but schedule down time to run a full fsck once a month
> > or so to check for accumulated hard disk errors.
>    
> Maybe, filesystems less than, say, 100Gb (default threshold, subject to 
> admin's adjustment) in size should always be foreground fsck-ed? This 
> should, at least, cover the system file-systems (such as / and /var) on 
> typical installations...

If we did not have a better solution in the pipeline (journaled
soft updates), I would agree with you that always doing a full
check on small filesystems would be a useful enhancement. However,
since we do have a solution that will work well for all sizes of
filesystems in -current and expected out of the box with 9.0, I do
not think that it would be useful to add this extra complexity
at this time.
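
For illustration only, the size-threshold policy proposed above (and not adopted) could be sketched roughly as follows. Everything here is an assumption made for the example: the function name, the constant names, and the reading of "100Gb" as 100 GiB are hypothetical and do not correspond to anything in fsck(8) or fsck_ffs(8).

```python
# Hypothetical sketch of the proposed policy; none of these names exist in
# fsck(8) or fsck_ffs(8).
GIB = 1024 ** 3

# Assumption: the "100Gb" default from the proposal is read as 100 GiB.
DEFAULT_THRESHOLD_BYTES = 100 * GIB


def choose_fsck_mode(fs_size_bytes, threshold_bytes=DEFAULT_THRESHOLD_BYTES):
    """Pick a check mode for one filesystem based on its size.

    Filesystems strictly smaller than the threshold get a full foreground
    check; everything else is left to background fsck.
    """
    if fs_size_bytes < 0:
        raise ValueError("filesystem size must be non-negative")
    if fs_size_bytes < threshold_bytes:
        return "foreground"
    return "background"
```

Under these assumptions, a 2 GiB root filesystem would be checked in the foreground, while a 20 TiB array would still get background fsck; the threshold argument stands in for the admin-adjustable default mentioned in the proposal.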

> And a stern warning issued, when a background fsck is attempted -- for 
> whatever reason. Something like:
> 
>     background fsck, although faster, may be unable to detect certain
>     rare forms of filesystem corruption. You are advised to perform a
>     full fsck on %s on a regular basis. See fsck(8).
> 
> should go into the right place under fsck_ffs/ -- not sure, where exactly...

Since most folks do not look at the output from background fsck, and
given the upcoming change noted above (journaled soft updates), I do
not feel that adding this message would be all that helpful at this time.

> Below is a simple patch for the top-level fsck(8). Somebody more 
> knowledgeable of the details should augment fsck_ffs(8) -- it currently 
> gives the lists of inconsistencies checked for without mentioning the 
> difference in coverage between full and background modes...
> 
>     diff -U 2 -r1.38.2.1 fsck.8
>     --- fsck.8      3 Aug 2009 08:13:06 -0000       1.38.2.1
>     +++ fsck.8      22 Jul 2010 15:19:25 -0000
>     @@ -170,4 +170,12 @@
>       When running in background mode,
>       only one file system at a time will be checked.
>     +.Sy Warning:
>     +because background fsck is performed while the filesystem
>     +is in use, it is limited to checking for only the most commonly
>     +occurring filesystem abnormalities. Under certain circumstances,
>     +some errors can escape background fsck. It is recommended that you
>     +perform full fsck on your systems once in a while -- or whenever
>     +you encounter filesystem-related panics.
>       .It Fl t Ar fstype
>       Invoke
> 
> Yours,
> 
>     -mi

I concur that adding a note to fsck(8) would be a good idea, as best
practice is to run a full fsck after a disk-related panic. I would
be happy with your checking in:

    diff -U 2 -r1.38.2.1 fsck.8
    --- fsck.8      3 Aug 2009 08:13:06 -0000       1.38.2.1
    +++ fsck.8      22 Jul 2010 15:19:25 -0000
    @@ -170,4 +170,10 @@
      When running in background mode,
      only one file system at a time will be checked.
    +.Sy Warning:
    +background fsck is limited to checking for only the most commonly
    +occurring filesystem abnormalities. Under certain circumstances,
    +some errors can escape background fsck. It is recommended that you
    +perform full fsck on your systems once in a while -- or whenever
    +you encounter filesystem-related panics.
      .It Fl t Ar fstype
      Invoke

Does this work for you?

	Kirk McKusick


