From owner-freebsd-current@FreeBSD.ORG  Wed Mar  1 20:10:43 2006
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: FreeBSD-current@freebsd.org
Delivered-To: FreeBSD-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6355E16A420
	for <FreeBSD-current@freebsd.org>; Wed,  1 Mar 2006 20:10:43 +0000 (GMT)
	(envelope-from sdrhodus@gmail.com)
Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.201])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6C16143D45
	for <FreeBSD-current@freebsd.org>; Wed,  1 Mar 2006 20:10:40 +0000 (GMT)
	(envelope-from sdrhodus@gmail.com)
Received: by wproxy.gmail.com with SMTP id i23so230421wra
	for <FreeBSD-current@freebsd.org>; Wed, 01 Mar 2006 12:10:39 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com;
	h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=uP9lVzOqVKBFSme1hINBNHsM3ZRN+CwRL0/yw4vgn8yLZjEtxZ2XeqeVEqwtE95ksG9f0kt++rD51HyWGUDpvvPDHSrbwXvuosXQKiJcamaMkD2Pl9qgDdsVpME7fV3V1+jqKbRm3Xp8DQBr5+v/jtIhRacOTx3/s1on09JAw2c=
Received: by 10.65.43.11 with SMTP id v11mr297508qbj;
	Wed, 01 Mar 2006 12:10:38 -0800 (PST)
Received: by 10.64.178.5 with HTTP; Wed, 1 Mar 2006 12:10:38 -0800 (PST)
Message-ID: <fe77c96b0603011210w439e1d11xb82e3498c1846e65@mail.gmail.com>
Date: Wed, 1 Mar 2006 15:10:38 -0500
From: "David Rhodus" <drhodus@machdep.com>
Sender: sdrhodus@gmail.com
To: Yarema <yds@coolrat.org>
In-Reply-To: <3BD79FAD83E2122EC1644386@ramen.coolrat.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
References: <courier.44046DC8.000006A2@CoolRat.org>
	<20060228195343.GA85313@xor.obsecurity.org>
	<3BD79FAD83E2122EC1644386@ramen.coolrat.org>
X-Mailman-Approved-At: Wed, 01 Mar 2006 22:48:26 +0000
Cc: Dennis Koegel <amf@hobbit.neveragain.de>, FreeBSD-current@freebsd.org,
	Martin Machacek <m@m3a.net>, Kris Kennaway <kris@obsecurity.org>,
	Pawel Jakub Dawidek <pjd@freebsd.org>, FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/93942: panic: ufs_dirbad: bad dir
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Mar 2006 20:10:43 -0000

On 2/28/06, Yarema <yds@coolrat.org> wrote:
>
>
> --On February 28, 2006 2:53:43 PM -0500 Kris Kennaway <kris@obsecurity.org>
> wrote:
>
> > On Tue, Feb 28, 2006 at 10:35:36AM -0500, Yarema wrote:
> >>
> >> > Number:         93942
> >> > Category:       kern
> >> > Synopsis:       panic: ufs_dirbad: bad dir
> >> > Confidential:   no
> >> > Severity:       critical
> >> > Priority:       high
> >> > Responsible:    freebsd-bugs
> >> > State:          open
> >> > Quarter:
> >> > Keywords:
> >> > Date-Required:
> >> > Class:          sw-bug
> >> > Submitter-Id:   current-users
> >> > Arrival-Date:   Tue Feb 28 15:40:06 GMT 2006
> >> > Closed-Date:
> >> > Last-Modified:
> >> > Originator:     Yarema <yds@CoolRat.org>
> >> > Release:        FreeBSD 6.1-PRERELEASE i386
> >> > Organization:
> >> > Environment:
> >> System: FreeBSD 6.1-PRERELEASE #0: Mon Feb 27 04:52:11 EST 2006 i386
> >>
> >> > Description:
> >>
> >> This is at least the third file system which got hosed for me by the
> >> ufs_dirbad bug on three different hard drives since 5.3 STABLE.
> >> I suspect this is related to the following PRs:
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=49079
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=51001
> >>
> >> In every case a process would lock up making the whole system
> >> unresponsive.  A reboot, fsck -y in single user mode and another
> >> reboot would produce the following during the mount of the corrupt
> >> fs in rw mode:
> >>
> >> bad dir ino 2 at  offset 16384: mangled entry
> >> panic: ufs_dirbad: bad dir
> >> cpuid = 0
> >>
> >> Another reboot, fsck -y in single user mode and reboot produces the
> >> same results repeatedly.  Previously I had recovered by mounting the
> >> corrupt fs in ro mode, backup, newfs, restore.
> >>
> >> Recently I noticed Matthew Dillon commit the following to the
> >> DragonFly src repository:
> >>
> >> http://leaf.DragonFlyBSD.org/mailarchive/commits/2006-02/msg00057.html
> >>
> >> dillon      2006/02/21 10:46:56 PST
> >>
> >> DragonFly src repository
> >>
> >>   Modified files:
> >>     sys/kern             vfs_cluster.c
> >>   Log:
> >>   bioops.io_start() was being called in a situation where the buffer
> >>   could be brelse()'d afterwards instead of I/O being initiated.  When
> >>   this occurs, the buffer may contain softupdates-modified data which is
> >>   never reverted, resulting in serious filesystem corruption.  When
> >>   io_start is called on a buffer, I/O MUST be initiated and terminated
> >>   with a biodone() or the buffer's data may not be properly reverted.
> >>
> >>   Solve the problem by moving the io_start() call a little further on in
> >>   the code, after the potential brelse().
> >>
> >>   There is a possibility that this bug is responsible for the 'dirbad'
> >>   panics often reported in DragonFly and FreeBSD circles.
> >>
> >>   Revision  Changes    Path
> >>   1.16      +7 -6      src/sys/kern/vfs_cluster.c
> >>
> >> http://www.DragonFlyBSD.org/cvsweb/src/sys/kern/vfs_cluster.c.diff?r1=1.15&r2=1.16&f=u
> >>
> >> Below is the equivalent patch to the FreeBSD RELENG_6 branch of
> >> src/sys/kern/vfs_cluster.c
> >>
> >> Hope this helps track down the problem.
> >
> > Does it work for you? :)
> >
> > Kris
>
> No way for me to know yet.  From what I gathered, mostly from this thread:
> <http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=331058+0+archive/2006/freebsd-current/20060108.freebsd-current>
>
> As per Matt Dillon
> <http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=217892+0+/usr/local/www/db/text/2006/freebsd-current/20060226.freebsd-current>,
> the corruption occurs much earlier than any consequences can be felt.
> The patch may prevent the corruption from occurring in the first place.
> But the patch does nothing for me now that I have a huge /home slice
> which cannot even be mounted as read-only in single user mode without
> triggering a page fault kernel panic in the mount process no matter
> how many times I run fsck -f on it.
>
> FWIW the page fault in the mount process is a different sort of kernel
> panic than what is described in this kern/93942 PR above.  The page fault
> occurs while attempting to mount read-only.  Attempting to mount read-write
> causes the panic: ufs_dirbad: bad dir
>
> One more note, hitting the power button when the machine is locked up
> before the reboot and mount attempt which causes the panic produces the
> following output every time the button is pressed:
>
> kernel: acpi: suspend request ignored (not ready yet)
>
> Seems like there are two separate problems:
> 1) the root cause of the bad dir corruption.
> 2) fsck -f doesn't fix it no matter how many times you run it.
>
> Any pointers on how to recover my /home slice will be greatly appreciated.
>
> --
> Yarema

I have been working with the bad dir problem for several months and I
have not had corruption which fsck would not correct.


-DR
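
For readers following the quoted DragonFly commit log, the ordering
problem it describes can be illustrated with a toy model.  This is only
a sketch under stated assumptions: struct toy_buf, toy_io_start(),
toy_biodone(), toy_brelse() and both write paths are hypothetical
stand-ins, not the real bioops or vfs_cluster.c code, and the "modified"
flag merely stands in for softupdates-modified data that has to be
reverted when I/O completes.

```c
#include <stdbool.h>

/* Hypothetical toy buffer; this is NOT the kernel's struct buf. */
struct toy_buf {
    bool modified;   /* stands in for softupdates-modified data */
    bool released;   /* set when the buffer is handed back without I/O */
};

/* Toy io_start hook: alters the buffer contents ahead of a write. */
static void toy_io_start(struct toy_buf *bp) { bp->modified = true; }

/* Toy I/O completion: reverts the alteration made by the hook. */
static void toy_biodone(struct toy_buf *bp) { bp->modified = false; }

/* Toy release: returns the buffer without initiating any I/O. */
static void toy_brelse(struct toy_buf *bp) { bp->released = true; }

/*
 * Ordering the log describes as buggy: the hook runs before the
 * point where the buffer may be released instead of written.
 * Returns true if the buffer is left holding unreverted data.
 */
static bool write_hook_first(struct toy_buf *bp, bool release_early)
{
    toy_io_start(bp);
    if (release_early) {
        toy_brelse(bp);       /* no I/O, so no completion, no revert */
        return bp->modified;
    }
    toy_biodone(bp);
    return bp->modified;
}

/*
 * Ordering the log describes as the fix: the hook runs only after
 * the potential release, so every hook call is paired with completion.
 */
static bool write_hook_late(struct toy_buf *bp, bool release_early)
{
    if (release_early) {
        toy_brelse(bp);       /* hook never ran, nothing to revert */
        return bp->modified;
    }
    toy_io_start(bp);
    toy_biodone(bp);
    return bp->modified;
}
```

In this model, calling write_hook_first() with release_early set leaves
the buffer marked modified, while write_hook_late() does not.  It says
nothing about whether the real change prevents the ufs_dirbad panics.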