Date:      Thu, 23 Aug 2007 15:02:24 +1000 (EST)
From:      Ian Smith <smithi@nimnet.asn.au>
To:        Chris <chrcoluk@gmail.com>
Cc:        freebsd-questions@freebsd.org, Bill Moran <wmoran@potentialtech.com>
Subject:   Re: fsck strangeness
Message-ID:  <Pine.BSF.3.96.1070823144008.26941B-100000@gaia.nimnet.asn.au>
In-Reply-To: <3aaaa3a0708220716m5601bb4ewc8688225291ae7bd@mail.gmail.com>

On Wed, 22 Aug 2007, Chris wrote:
 > On 20/08/07, Ian Smith <smithi@nimnet.asn.au> wrote:
 > > Sorry for the repeat post folks, but I goofed last time, leaving out the
 > > subject line while replying to the digest.  Still curious .. Ian
 > > =======
 > >
 > > On Sat, 18 Aug 2007 21:32:28 +0200 Erik Trulsson <ertr1013@student.uu.se> wrote:
 > >  > On Sat, Aug 18, 2007 at 08:21:42PM +0100, Christopher Key wrote:
 > >  > > Hello,
 > >  > >
 > >  > > I'm having some rather strange behaviour with fsck.
 > >  > >
 > >  > > When I boot the system, it asserts that all the file systems are clean, but
 > >  > > subsequently running an fsck on /dev/ad8s1e (mounted as /var) detects
 > >  > > errors.  Even if this first check is run whilst the file system is mounted,
 > >  > > and is hence run in NO WRITE mode, a second check doesn't find block
 > >  > > errors.  If I then unmount the file system and check the disk, it's fine,
 > >  > > as indeed it is if I unmount, remount, then check.  However, if I then
 > >  > > reboot, the process repeats, and an fsck immediately after reboot will find
 > >  > > errors again.  If I bring the system up in single user mode, and run fsck
 > >  > > either before or after mounting /var, it finds no errors.
 > >  > >
 > >  > > I'm running 6.2-RELEASE with a custom kernel based upon generic-smp, but
 > >  > > with a lot of unnecessary bits removed, and geom_mirror compiled in.  I
 > >  > > don't think it's the drive that's at fault, all the other partitions in the
 > >  > > slice are fine, it's a fairly new drive, and it passes a self test quite
 > >  > > happily.  Included below is a transcript that attempts to show what's going
 > >  > > on in detail, is there anything else relevant?
 > >  > >
 > >  > > Can anyone suggest what might be going on and how to fix it, or suggest
 > >  > > some slightly better diagnostics?  Apologies if this is an RTFM issue, I
 > >  > > have had a good dig through the handbook, but can't seem to find anything
 > >  > > that helps.
 > >
 > >  > Running fsck on a file system that has been mounted read/write will almost
 > >  > always report spurious errors and can really screw up the disk if it tries
 > >  > to 'correct' those errors.
 > >
 > > I'm a bit confused by this.  I've been running 'fsck -n' over FreeBSD
 > > systems since 2.2.6, and modulo seeing the at-the-time inconsistencies
 > > on those filesystems in /etc/fstab that are mounted, as Chris reported
 > > and as are expected, I've never had a problem with it, nor seen the sort
 > > of inconsistent results between runs that Chris is reporting.
 > >
 > >  > You should normally not run fsck on a mounted filesystem and you should
 > >  > *NEVER* run fsck on a filesystem that has been mounted read/write.
 > >
 > > This seems to imply that using the -n switch may have different results
 > > than not using it and having fsck determine 'NO WRITE' itself from the
 > > fact that it's noticed that the fs is mounted?  Are you suggesting by
 > > "can really screw up the disk if it tries to 'correct' those errors"
 > > that fsck might WRITE to a mounted fs that it's showing as 'NO WRITE'?
 > >
 > > I've never had any screwups with it, but then I've always specified -n.
 > >
 > > Later Bill Moran said:
 > >
 > >  > Don't run fsck on mounted filesystems unless they're mounted read-only.
 > >  >
 > >  > Although, it's possible I misunderstood your description of the problem.
 > >
 > > so I'm still curious, and am wondering if Chris using SMP kernel and/or
 > > geom_mirror might have anything to do with this?  Or whether his use of
 > > 'umount -f' might be (or cause) the problem indicated by his results?
 > >
 > >  > > # umount -f /var
 > >  > >
 > >  > >
 > >  > > # mount /var

 > If it's bad to run fsck on a filesystem mounted read/write, then why does
 > background fsck do it? Or are you talking about foreground fsck only?

Well, I was referring to foreground fsck, and I still don't know why
running it on a mounted fs is 'bad' when fsck runs in 'NO WRITE' mode
anyway once it finds that a fs is mounted, hence my query above.

My knowledge of this is thin, despite reading McKusick's paper through
several times, but we're told that background fsck runs on a snapshot of
the fs concerned.  How any bg fsck corrections are woven back into the
live fs later is still a mystery to me, but that's because I have only
a superficial understanding of how snapshots work ..

I still feel that your 'umount -f /var' seems potentially hairy, but
can't say if that might explain the behaviour you were reporting.

Cheers, Ian
