Date: Wed, 1 Mar 2006 15:10:38 -0500
From: "David Rhodus" <drhodus@machdep.com>
To: Yarema <yds@coolrat.org>
Cc: Dennis Koegel <amf@hobbit.neveragain.de>, FreeBSD-current@freebsd.org, Martin Machacek <m@m3a.net>, Kris Kennaway <kris@obsecurity.org>, Pawel Jakub Dawidek <pjd@freebsd.org>, FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/93942: panic: ufs_dirbad: bad dir
Message-ID: <fe77c96b0603011210w439e1d11xb82e3498c1846e65@mail.gmail.com>
In-Reply-To: <3BD79FAD83E2122EC1644386@ramen.coolrat.org>
References: <courier.44046DC8.000006A2@CoolRat.org> <20060228195343.GA85313@xor.obsecurity.org> <3BD79FAD83E2122EC1644386@ramen.coolrat.org>
On 2/28/06, Yarema <yds@coolrat.org> wrote:
>
>
> --On February 28, 2006 2:53:43 PM -0500 Kris Kennaway <kris@obsecurity.org>
> wrote:
>
> > On Tue, Feb 28, 2006 at 10:35:36AM -0500, Yarema wrote:
> >>
> >> > Number: 93942
> >> > Category: kern
> >> > Synopsis: panic: ufs_dirbad: bad dir
> >> > Confidential: no
> >> > Severity: critical
> >> > Priority: high
> >> > Responsible: freebsd-bugs
> >> > State: open
> >> > Quarter:
> >> > Keywords:
> >> > Date-Required:
> >> > Class: sw-bug
> >> > Submitter-Id: current-users
> >> > Arrival-Date: Tue Feb 28 15:40:06 GMT 2006
> >> > Closed-Date:
> >> > Last-Modified:
> >> > Originator: Yarema <yds@CoolRat.org>
> >> > Release: FreeBSD 6.1-PRERELEASE i386
> >> > Organization:
> >> > Environment:
> >> System: FreeBSD 6.1-PRERELEASE #0: Mon Feb 27 04:52:11 EST 2006 i386
> >>
> >> > Description:
> >>
> >> This is at least the third file system which got hosed for me by the
> >> ufs_dirbad bug on three different hard drives since 5.3 STABLE.
> >> I suspect this is related to the following PRs:
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=49079
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=51001
> >>
> >> In every case a process would lock up making the whole system
> >> unresponsive. A reboot, fsck -y in single user mode and another
> >> reboot would produce the following during the mount of the corrupt
> >> fs in rw mode:
> >>
> >> bad dir ino 2 at offset 16384: mangled entry
> >> panic: ufs_dirbad: bad dir
> >> cpuid = 0
> >>
> >> Another reboot, fsck -y in single user mode and reboot produces the
> >> same results repeatedly. Previously I had recovered by mounting the
> >> corrupt fs in ro mode, backup, newfs, restore.
> >>
> >> Recently I noticed Matthew Dillon commit the following to the
> >> DragonFly src repository:
> >>
> >> http://leaf.DragonFlyBSD.org/mailarchive/commits/2006-02/msg00057.html
> >>
> >> dillon 2006/02/21 10:46:56 PST
> >>
> >> DragonFly src repository
> >>
> >> Modified files:
> >>   sys/kern vfs_cluster.c
> >> Log:
> >>   bioops.io_start() was being called in a situation where the buffer
> >>   could be brelse()'d afterwords instead of I/O being initiated. When
> >>   this occurs, the buffer may contain softupdates-modified data which is
> >>   never reverted, resulting in serious filesystem corruption. When
> >>   io_start is called on a buffer, I/O MUST be initiated and terminated
> >>   with a biodone() or the buffer's data may not be properly reverted.
> >>
> >>   Solve the problem by moving the io_start() call a little further on in
> >>   the code, after the potential brelse().
> >>
> >>   There is a possibility that this bug is responsible for the 'dirbad'
> >>   panics often reported in DragonFly and FreeBSD circles.
> >>
> >> Revision Changes Path
> >> 1.16 +7 -6 src/sys/kern/vfs_cluster.c
> >>
> >> http://www.DragonFlyBSD.org/cvsweb/src/sys/kern/vfs_cluster.c.diff?r1=1.15&r2=1.16&f=u
> >>
> >> Below is the equivalent patch to the FreeBSD RELENG_6 branch of
> >> src/sys/kern/vfs_cluster.c
> >>
> >> Hope this helps track down the problem.
> >
> > Does it work for you? :)
> >
> > Kris
>
> No way for me to know yet. From what I gathered, mostly from this thread:
> <http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=331058+0+archive/2006/freebsd-current/20060108.freebsd-current>
>
> As per Matt Dillon
> <http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=217892+0+/usr/local/www/db/text/2006/freebsd-current/20060226.freebsd-current>,
> the corruption occurs much earlier than any consequences can be felt.
> The patch may prevent the corruption from occurring in the first place.
> But the patch does nothing for me now that I have a huge /home slice
> which cannot even be mounted as read-only in single user mode without
> triggering a page fault kernel panic in the mount process no matter
> how many times I run fsck -f on it.
>
> FWIW the page fault in the mount process is a different sort of kernel
> panic than what is described in this kern/93942 PR above. The page fault
> occurs while attempting to mount read-only. Attempting to mount read-write
> causes the panic: ufs_dirbad: bad dir
>
> One more note, hitting the power button when the machine is locked up
> before the reboot and mount attempt which causes the panic produces the
> following output every time the button is pressed:
>
> kernel: acpi: suspend request ignored (not ready yet)
>
> Seems like there's two separate problems:
> 1) the root cause of the bad dir corruption.
> 2) fsck -f doesn't fix it no matter how many times you run it.
>
> Any pointers on how to recover my /home slice will be greatly appreciated.
>
> --
> Yarema

I have been working with the bad dir problem for several months and I
have not had corruption which fsck would not correct.

-DR