Date:      Wed, 1 Mar 2006 15:10:38 -0500
From:      "David Rhodus" <drhodus@machdep.com>
To:        Yarema <yds@coolrat.org>
Cc:        Dennis Koegel <amf@hobbit.neveragain.de>, FreeBSD-current@freebsd.org, Martin Machacek <m@m3a.net>, Kris Kennaway <kris@obsecurity.org>, Pawel Jakub Dawidek <pjd@freebsd.org>, FreeBSD-gnats-submit@freebsd.org
Subject:   Re: kern/93942: panic: ufs_dirbad: bad dir
Message-ID:  <fe77c96b0603011210w439e1d11xb82e3498c1846e65@mail.gmail.com>
In-Reply-To: <3BD79FAD83E2122EC1644386@ramen.coolrat.org>
References:  <courier.44046DC8.000006A2@CoolRat.org> <20060228195343.GA85313@xor.obsecurity.org> <3BD79FAD83E2122EC1644386@ramen.coolrat.org>

On 2/28/06, Yarema <yds@coolrat.org> wrote:
>
>
> --On February 28, 2006 2:53:43 PM -0500 Kris Kennaway <kris@obsecurity.org>
> wrote:
>
> > On Tue, Feb 28, 2006 at 10:35:36AM -0500, Yarema wrote:
> >>
> >> > Number:         93942
> >> > Category:       kern
> >> > Synopsis:       panic: ufs_dirbad: bad dir
> >> > Confidential:   no
> >> > Severity:       critical
> >> > Priority:       high
> >> > Responsible:    freebsd-bugs
> >> > State:          open
> >> > Quarter:
> >> > Keywords:
> >> > Date-Required:
> >> > Class:          sw-bug
> >> > Submitter-Id:   current-users
> >> > Arrival-Date:   Tue Feb 28 15:40:06 GMT 2006
> >> > Closed-Date:
> >> > Last-Modified:
> >> > Originator:     Yarema <yds@CoolRat.org>
> >> > Release:        FreeBSD 6.1-PRERELEASE i386
> >> > Organization:
> >> > Environment:
> >> System: FreeBSD 6.1-PRERELEASE #0: Mon Feb 27 04:52:11 EST 2006 i386
> >>
> >> > Description:
> >>
> >> This is at least the third file system which got hosed for me by the
> >> ufs_dirbad bug on three different hard drives since 5.3 STABLE.
> >> I suspect this is related to the following PRs:
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=49079
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=51001
> >>
> >> In every case a process would lock up making the whole system
> >> unresponsive.  A reboot, fsck -y in single user mode and another
> >> reboot would produce the following during the mount of the corrupt
> >> fs in rw mode:
> >>
> >> bad dir ino 2 at  offset 16384: mangled entry
> >> panic: ufs_dirbad: bad dir
> >> cpuid = 0
> >>
> >> Another reboot, fsck -y in single user mode and reboot produces the
> >> same results repeatedly.  Previously I had recovered by mounting the
> >> corrupt fs in ro mode, backup, newfs, restore.
> >>
> >> Recently I noticed Matthew Dillon commit the following to the
> >> DragonFly src repository:
> >>
> >> http://leaf.DragonFlyBSD.org/mailarchive/commits/2006-02/msg00057.html
> >>
> >> dillon      2006/02/21 10:46:56 PST
> >>
> >> DragonFly src repository
> >>
> >>   Modified files:
> >>     sys/kern             vfs_cluster.c
> >>   Log:
> >>   bioops.io_start() was being called in a situation where the buffer
> >>   could be brelse()'d afterwords instead of I/O being initiated.  When
> >>   this occurs, the buffer may contain softupdates-modified data which is
> >>   never reverted, resulting in serious filesystem corruption.  When
> >>   io_start is called on a buffer, I/O MUST be initiated and terminated
> >>   with a biodone() or the buffer's data may not be properly reverted.
> >>
> >>   Solve the problem by moving the io_start() call a little further on in
> >>   the code, after the potential brelse().
> >>
> >>   There is a possibility that this bug is responsible for the 'dirbad'
> >>   panics often reported in DragonFly and FreeBSD circles.
> >>
> >>   Revision  Changes    Path
> >>   1.16      +7 -6      src/sys/kern/vfs_cluster.c
> >>
> >> http://www.DragonFlyBSD.org/cvsweb/src/sys/kern/vfs_cluster.c.diff?r1=1.15&r2=1.16&f=u
> >>
> >> Below is the equivalent patch to the FreeBSD RELENG_6 branch of
> >> src/sys/kern/vfs_cluster.c
> >>
> >> Hope this helps track down the problem.
> >
> > Does it work for you? :)
> >
> > Kris
>
> No way for me to know yet.  From what I gathered, mostly from this thread:
> <http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=331058+0+archive/2006/freebsd-current/20060108.freebsd-current>
>
> As per Matt Dillon
> <http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=217892+0+/usr/local/www/db/text/2006/freebsd-current/20060226.freebsd-current>,
> the corruption occurs much earlier than any consequences can be felt.
> The patch may prevent the corruption from occurring in the first place.
> But the patch does nothing for me now that I have a huge /home slice
> which cannot even be mounted as read-only in single user mode without
> triggering a page fault kernel panic in the mount process no matter
> how many times I run fsck -f on it.
>
> FWIW the page fault in the mount process is a different sort of kernel
> panic than what is described in this kern/93942 PR above.  The page fault
> occurs while attempting to mount read-only.  Attempting to mount read-write
> causes the panic: ufs_dirbad: bad dir
>
> One more note, hitting the power button when the machine is locked up
> before the reboot and mount attempt which causes the panic produces the
> following output every time the button is pressed:
>
> kernel: acpi: suspend request ignored (not ready yet)
>
> Seems like there are two separate problems:
> 1) the root cause of the bad dir corruption.
> 2) fsck -f doesn't fix it no matter how many times you run it.
>
> Any pointers on how to recover my /home slice will be greatly appreciated.
>
> --
> Yarema

I have been working with the bad dir problem for several months and I
have not had corruption which fsck would not correct.


-DR


