Date:      Tue, 20 May 2003 08:54:05 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        "Daniel C. Sobral" <dcs@tcoip.com.br>
Cc:        mckusick@FreeBSD.ORG
Subject:   Re: panic: lockmgr: locking against myself
Message-ID:  <3ECA4F9D.26DFA3DE@mindspring.com>
References:  <3ECA15DB.7020107@tcoip.com.br>

"Daniel C. Sobral" wrote:
> This is an easily reproducible panic. Whenever my system panics for some
> reason, that panic will follow during the background fsck afterwards.
> 
> I'm inclined to disable background fsck, only that would be just masking
> the problem, not solving it. I *thought* this had been solved. :-(

No.  I suggested a counter to force a FG fsck; someone claimed it
was already implemented, I looked, it wasn't, and it would take a
superblock change (you would need a new "don't care about mismatch"
region; wish people had put in the space reservation API I had
suggested, when they were rewriting UFS to UFS2...).


> When this stuff happens, I always boot single user and run fsck by hand,
> so I can't see how it could be inconsistent state left by an early panic
> _during_ a background fsck.

If you crash with unwritten buffers because write caching is
enabled on an ATA drive, or you end up trashing a few sectors
or a track because of a genuine power loss, or because your
ACPI settings don't delay "shutdown -p" until after a drive
flush so that the write cache is guaranteed to be synced to
disk, soft updates will not guarantee drive consistency (it's
kind of hard to do if your power is off).

In that case, the FS can have data structures SPAM'med by the
poor design, and the only way to recover is a full fsck.  The
basic assumption of the BG fsck is that the only thing that
could be SPAM'med is the cylinder group bitmaps... and in the
above cases, that's not true.  Any panic you get will *usually*
be the result of taking the snapshot for the BG fsck, or as a
result of mounting and using the disk as if it were safe to do,
when it's not.

If you can adjust your ACPI settings to ensure that the power
doesn't actually go off until the disk's write cache is flushed,
and verify that there is a request being posted for the write
cache to be flushed prior to a power-down or sleep event, then
that should "resolve" your problem, unless you are getting a
genuine external power failure.

Really, the correct thing to do is set a counter in the FS
superblock at clean shutdown (e.g. a "tunefs" option would do
the trick), and decrement it each time you start a BG fsck.
When it gets to 0, you force a full fsck instead.  Basically,
with the counter set to 3, this would mean you panic at most
3 times before the FG fsck takes over and cleans up the BG
fsck-unrecoverable problem for you.
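
A minimal sketch of that counter logic, to make the idea concrete.
Everything here is hypothetical: the struct, the field name
bg_fsck_budget and both function names are made up for illustration,
and the real UFS superblock has no such field (which is exactly why
this would take a superblock change).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the superblock; not the real struct fs. */
struct sb_sketch {
    int bg_fsck_budget;   /* hypothetical: BG fsck starts left before forcing FG */
};

enum fsck_mode { FSCK_BACKGROUND, FSCK_FOREGROUND };

/* Re-arm the counter at clean shutdown (or from a tunefs-style tool). */
void sb_mark_clean(struct sb_sketch *sb, int budget)
{
    assert(sb != NULL && budget >= 0);
    sb->bg_fsck_budget = budget;
}

/*
 * Decide how to check the FS at boot after an unclean shutdown.
 * Each BG fsck start spends one unit of the budget; once the
 * budget is exhausted, a full foreground fsck is forced.
 */
enum fsck_mode sb_choose_fsck(struct sb_sketch *sb)
{
    assert(sb != NULL);
    if (sb->bg_fsck_budget <= 0)
        return FSCK_FOREGROUND;
    sb->bg_fsck_budget--;
    return FSCK_BACKGROUND;
}
```

With a budget of 3, three consecutive unclean boots get a BG fsck
and the fourth gets the full FG fsck.  The sketch leaves out the
re-arm after a successful FG fsck and the on-disk write of the
decremented value, which would have to hit the disk before the
snapshot is taken for the counter to be of any use.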

Short of writing that code, your only chance is to keep the
data on the disk from being bogofied in the first place (per
avoiding the situations in the first paragraph, above).

-- Terry


