Date: Thu, 22 Jul 2010 09:50:34 -0700
From: Kirk McKusick <mckusick@mckusick.com>
To: "Mikhail T." <mi+thun@aldan.algebra.com>
Cc: fs@freebsd.org
Subject: Re: background fsck considered harmful? (Re: panic: handle_written_inodeblock: bad size)
Message-ID: <201007221650.o6MGoY9V039222@chez.mckusick.com>
In-Reply-To: <4C486209.7050402@aldan.algebra.com>

> Date: Thu, 22 Jul 2010 11:21:45 -0400
> From: "Mikhail T." <mi+thun@aldan.algebra.com>
> Organization: Virtual Estates, Inc.
> To: Kirk McKusick <mckusick@mckusick.com>
> CC: fs@freebsd.org
> Subject: Re: background fsck considered harmful?
>
> 21.07.2010 23:35, Kirk McKusick
> > Foreground fsck checks all the disk
> > metadata every time, so hard disk errors are captured immediately
> > before they have had a chance to accumulate. But background fsck
> > users blame it because it has not found them.
>
> I don't blame the program itself -- if it was deliberately /designed/ to
> only do partial checking. However, I was under the impression, that the
> background fsck was meant to do the same job as the "real" one, and
> that, whenever it did not, it was simply a bug in the /implementation/.
>
> I suspect, this misconception is shared by plenty of other users...
> Indeed, even if a inquisitive admin wanted to find out, fsck(8) gives
> absolutely no warning to that effect -- it simply states, that
> background fsck will be attempted, whenever possible.
>
> > If you have small disk systems, running foreground fsck is an
> > acceptable solution (and indeed I would recommend it). But when
> > you are running systems with 20Tb of disks, you are not willing
> > to have your system down for 10 hours after every crash.
> >
> > A reasonable intermediate solution is to use background fsck by
> > default, but schedule down time to run a full fsck once a month
> > or so to check for accumulated hard disk errors.
>
> Maybe, filesystems less than, say, 100Gb (default threshold, subject to
> admin's adjustment) in size should always be foreground fsck-ed? This
> should, at least, cover the system file-systems (such as / and /var) on
> typical installations...

If we did not have a better solution in the pipeline (journaled
soft updates), I would agree with you that always doing a full
check on small filesystems would be a useful enhancement.
However, since we do have a solution that will work well for all
sizes of filesystems in -current and expected out of the box with
9.0, I do not think that it would be useful to add this extra
complexity at this time.

> And a stern warning issued, when a background fsck is attempted -- for
> whatever reason. Something like:
>
> background fsck, although faster, may be unable to detect certain
> rare forms of filesystem corruption. You are advised to perform a
> full fsck on %s on a regular basis. See fsck(8).
>
> should go into the right place under fsck_ffs/ -- not sure, where exactly...

Since most folks do not look at the output from background fsck
and with the changes noted above, I do not feel that adding this
message would be all that helpful at this time.

> Below is a simple patch for the top-level fsck(8). Somebody more
> knowledgeable of the details should augment fsck_ffs(8) -- it currently
> gives the lists of inconsistencies checked for without mentioning the
> difference in coverage between full and background modes...
>
> diff -U 2 -r1.38.2.1 fsck.8
> --- fsck.8 3 Aug 2009 08:13:06 -0000 1.38.2.1
> +++ fsck.8 22 Jul 2010 15:19:25 -0000
> @@ -170,4 +170,12 @@
>  When running in background mode,
>  only one file system at a time will be checked.
> +.Sy Warning:
> +because background fsck is performed while the filesystem
> +is in use, it is limited to checking for only the most commonly
> +occuring filesystem abnormalities. Under certain circumstances,
> +some errors can escape background fsck. It is recommended, that you
> +perform full fsck on your systems once in a while -- or whenever
> +you encounter filesystem-related panics.
>  .It Fl t Ar fstype
>  Invoke
>
> Yours,
>
> -mi

I concur that adding a note to fsck(8) would be a good idea as
best practice is to run a full fsck after a disk-related panic.
I would be happy with your checking in:

diff -U 2 -r1.38.2.1 fsck.8
--- fsck.8 3 Aug 2009 08:13:06 -0000 1.38.2.1
+++ fsck.8 22 Jul 2010 15:19:25 -0000
@@ -170,4 +170,12 @@
 When running in background mode,
 only one file system at a time will be checked.
+.Sy Warning:
+background fsck is limited to checking for only the most commonly
+occuring filesystem abnormalities. Under certain circumstances,
+some errors can escape background fsck. It is recommended, that you
+perform full fsck on your systems once in a while -- or whenever
+you encounter filesystem-related panics.
 .It Fl t Ar fstype
 Invoke

Does this work for you?

	Kirk McKusick