From owner-freebsd-current@FreeBSD.ORG  Sat Jan  7 18:10:23 2006
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2F02216A41F
	for <freebsd-current@freebsd.org>; Sat,  7 Jan 2006 18:10:23 +0000 (GMT)
	(envelope-from sdrhodus@gmail.com)
Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.192])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 98BD643D55
	for <freebsd-current@freebsd.org>; Sat,  7 Jan 2006 18:10:21 +0000 (GMT)
	(envelope-from sdrhodus@gmail.com)
Received: by wproxy.gmail.com with SMTP id i32so3023457wra
	for <freebsd-current@freebsd.org>; Sat, 07 Jan 2006 10:10:21 -0800 (PST)
Received: by 10.64.150.20 with SMTP id x20mr2070487qbd;
	Sat, 07 Jan 2006 10:10:20 -0800 (PST)
Received: by 10.64.178.4 with HTTP; Sat, 7 Jan 2006 10:10:20 -0800 (PST)
Message-ID: <fe77c96b0601071010g9c5827bj54bce94d45b37ea4@mail.gmail.com>
Date: Sat, 7 Jan 2006 13:10:20 -0500
From: David Rhodus <drhodus@machdep.com>
Sender: sdrhodus@gmail.com
To: Scott Long <scottl@samsco.org>
In-Reply-To: <43BFFE1D.4070502@samsco.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
References: <20060102222723.GA1754@dragon.NUXI.org>
	<43BA9C5C.9010307@samsco.org>
	<20060106200009.GA53067@garage.freebsd.pl>
	<43BFF041.8070300@samsco.org>
	<fe77c96b0601070904n57d00a21mdf94281bc812dc50@mail.gmail.com>
	<43BFFE1D.4070502@samsco.org>
Cc: freebsd-current@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: It still here... panic: ufs_dirbad: bad dir
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 07 Jan 2006 18:10:23 -0000

On 1/7/06, Scott Long <scottl@samsco.org> wrote:
> David Rhodus wrote:
> > On 1/7/06, Scott Long <scottl@samsco.org> wrote:
> >
> >>Pawel Jakub Dawidek wrote:
> >>
> >>
> >>>On Tue, Jan 03, 2006 at 08:46:36AM -0700, Scott Long wrote:
> >>>+> David O'Brien wrote:
> >>>+>
> >>>+> >Just in case anyone thought the bug had been fixed...
> >>>+> >FreeBSD 7.0-CURRENT #531: Mon Jan  2 11:32:17 PST 2006 i386
> >>>+> >panic: ufs_dirbad: bad dir
> >>>+> >cpuid = 1
> >>>+> >KDB: stack backtrace:
> >>>+> >kdb_backtrace(c06c9ba1,1,c06c03c6,eae718c8,c8a91480) at 0xc053657e = kdb_backtrace+0x2e
> >>>+> >panic(c06c03c6,c85bf1f8,dade11,580,c06c0380) at 0xc0516618 = panic+0x128
> >>>+> >ufs_dirbad(c9171bdc,580,c06c0380,0,eae7193c) at 0xc0616e4d = ufs_dirbad+0x4d
> >>>+> >ufs_lookup(eae719e8,c916c528,eae71bc4,c916c528,eae71a24) at 0xc06165cd = ufs_lookup+0x3ad
> >>>+> >VOP_CACHEDLOOKUP_APV(c06f2a80,eae719e8,eae71bc4,c8a91480,cac28d80) at 0xc068cd4e = VOP_CACHEDLOOKUP_APV+0x9e
> >>>+> >vfs_cache_lookup(eae71a90,eae71a90,c916c528,c916c528,eae71bc4) at 0xc057275a = vfs_cache_lookup+0xca
> >>>+> >VOP_LOOKUP_APV(c06f2a80,eae71a90,c8a91480,c106fc88,0) at 0xc068cc66 = VOP_LOOKUP_APV+0xa6
> >>>+> >lookup(eae71b9c,0,c06b5c8e,b6,c057f7ed) at 0xc057760e = lookup+0x44e
> >>>+> >namei(eae71b9c,eae71b3c,60,0,c8a91480) at 0xc0576ecf = namei+0x44f
> >>>+> >kern_stat(c8a91480,8106f20,0,eae71c10,e0) at 0xc05863dd = kern_stat+0x3d
> >>>+> >stat(c8a91480,eae71d04,8,43c,c8a91480) at 0xc058636f = stat+0x2f
> >>>+> >syscall(3b,3b,3b,80dbe80,8106f20) at 0xc0682b43 = syscall+0x323
> >>>+> >Xint0x80_syscall() at 0xc066d33f = Xint0x80_syscall+0x1f
> >>>+>
> >>>+> Please include the console printf that is right above the panic message.
> >>>+> It will say either something about a mangled entry or an isize too
> >>>+> small.  Since this problem is happening consistently for you, but there
> >>>+> seem to be no other problem reports from others, I'd highly suspect that
> >>>+> you have filesystem damage that isn't getting detected by fsck.  I
> >>>+> assume that you are running fsck in the foreground and not in the
> >>>+> background, yes?  The easiest solution here might be to figure out which
> >>>+> directory is causing the problem, and just clri its inode and then clean
> >>>+> up the mess.
> >>>
> >>>I'm able to reproduce it with a newly newfs(8)ed file system:
> >>>
> >>>/mnt: bad dir ino 17382405 at offset 0: mangled entry
> >>>panic: ufs_dirbad: bad dir
> >>>KDB: enter: panic
> >>>[...]
> >>>db> tr
> >>>Tracing pid 427 tid 100057 td 0xc7ccaa80
> >>>kdb_enter(c060029a,c065c020,c0610849,f6b228c0,100) at kdb_enter+0x30
> >>>panic(c0610849,c7914210,1093c05,0,c0610803) at panic+0xce
> >>>ufs_dirbad(cb2b4b58,0,c0610803,0,f6b22934) at ufs_dirbad+0x4e
> >>>ufs_lookup(f6b229e4,c061b519,cb092c60,cb092c60,f6b22b64) at ufs_lookup+0x39f
> >>>VOP_CACHEDLOOKUP_APV(c063a7e0,f6b229e4,f6b22b64,c7ccaa80,c7d52b80) at VOP_CACHEDLOOKUP_APV+0xc4
> >>>vfs_cache_lookup(f6b22a8c,f6b22a8c,0,cb092c60,0) at vfs_cache_lookup+0xc8
> >>>VOP_LOOKUP_APV(c063a7e0,f6b22a8c,c7ccaa80,38,0) at VOP_LOOKUP_APV+0xa6
> >>>lookup(f6b22b3c,0,c060880c,b5,c0511d45) at lookup+0x454
> >>>namei(f6b22b3c,f6b22b8c,60,0,c7ccaa80) at namei+0x441
> >>>kern_lstat(c7ccaa80,8059800,0,f6b22c10,2) at kern_lstat+0x5b
> >>>lstat(c7ccaa80,f6b22d04,8,43c,c065c740) at lstat+0x2f
> >>>syscall(805003b,807003b,bfbf003b,805f19c,bfbfeba0) at syscall+0x325
> >>>Xint0x80_syscall() at Xint0x80_syscall+0x1f
> >>>--- syscall (190, FreeBSD ELF32, lstat), eip = 0x28176efb, esp = 0xbfbfe90c, ebp = 0xbfbfea48 ---
> >>>
> >>
> >>Since you can reproduce it, can you find out which test it is failing?
> >>At the very least we need to add the test to fsck.
> >>
> >>Scott
> >
> >
> > The main problem with dirbad panics is that the corruption occurred a
> > long time ago, so a backtrace usually doesn't provide enough
> > information to find out what went wrong.
> >
> > Doing a fsck _should_ fix the filesystem corruption, but only after
> > the problem has already occurred.  There are a few cases in which fsck
> > needs to restart its current scan level or it can leave corruption
> > inside the filesystem while marking the partition clean.
> >
> > -DR
>
> Yes, I'm well aware of all of this, that's why I'm asking Pawel to
> determine which test is failing so we can find out why fsck isn't
> catching it.
>
> Scott

I think the problem in Pawel's case is that the filesystem itself is
writing out corrupt data, and later he's hitting an assertion when the
filesystem tries to read the corrupt entry back.  This seems to be a
problem with UFS itself.
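
For reference, the "mangled entry" and "isize too small" messages Scott mentions appear to come from sanity checks that ufs_lookup() applies to on-disk directory entries before calling ufs_dirbad().  Below is a rough user-space approximation of that kind of check, not the kernel code: the struct reuses the field names of struct direct, but sketch_direct, sketch_dirsiz(), sketch_check_entry() and the assumed 512-byte directory block are hypothetical, and for simplicity both kinds of check are applied to every entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed constants for this sketch (hypothetical names). */
#define SKETCH_MAXNAMLEN	255
#define SKETCH_DIRBLKSIZ	512	/* assumed directory block size */

/* Simplified user-space mirror of an on-disk UFS directory entry. */
struct sketch_direct {
	uint32_t d_ino;				/* inode number, 0 = free slot */
	uint16_t d_reclen;			/* length of this record */
	uint8_t  d_type;			/* file type */
	uint8_t  d_namlen;			/* length of d_name */
	char     d_name[SKETCH_MAXNAMLEN + 1];	/* NUL-terminated name */
};

/* Smallest record that can hold a name of namlen bytes, rounded to 4. */
static size_t
sketch_dirsiz(size_t namlen)
{
	return ((offsetof(struct sketch_direct, d_name) + namlen + 1 + 3) &
	    ~(size_t)3);
}

/*
 * Return NULL if the entry at byte offset `off` of a directory whose
 * size is `isize` looks sane, otherwise a short reason string.
 */
static const char *
sketch_check_entry(const struct sketch_direct *ep, size_t off, size_t isize)
{
	size_t room;

	assert(ep != NULL);
	/* Bytes left in the directory block that contains this entry. */
	room = SKETCH_DIRBLKSIZ - (off & (SKETCH_DIRBLKSIZ - 1));
	if (ep->d_reclen == 0 || ep->d_reclen > room ||
	    (ep->d_reclen & 3) != 0 ||
	    ep->d_reclen < sketch_dirsiz(ep->d_namlen))
		return ("mangled entry");
	/* A live entry's name must be exactly d_namlen bytes plus a NUL. */
	if (ep->d_ino != 0 &&
	    (memchr(ep->d_name, '\0', ep->d_namlen) != NULL ||
	    ep->d_name[ep->d_namlen] != '\0'))
		return ("mangled entry");
	/* The entry must fit inside the directory's recorded size. */
	if (off + sketch_dirsiz(ep->d_namlen) > isize)
		return ("i_size too small");
	return (NULL);
}
```

Splitting the test into separate conditions like this is one way to see which check is actually tripping, which is what Scott is asking Pawel to find out.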

As for fsck, it should fix this problem, but only after it has already
happened, and it may take two fsck scans.

I'm not sure what the current state of fsck in fbsd is, but one
problem I've noticed in the past while working on fbsd is that if fsck
has to create a lost+found directory, it doesn't restart the current
scan level.  This can lead to the dirbad panic.
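
To illustrate the restart point, here is a toy model assuming a flat inode table.  toy_fs, toy_pass() and toy_fsck() are hypothetical and are not how fsck_ffs is actually structured.  The pass computes its loop bound up front; creating lost+found allocates an inode past that bound, so the sketch restarts the scan level instead of finishing with a stale view of the table.  toy_fsck() just reruns passes until one makes no changes, the same idea as running fsck again until it comes back clean.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAXINO	64

/* Hypothetical toy filesystem model, not fsck_ffs's real structures. */
struct toy_fs {
	int	ninodes;		/* inodes currently allocated */
	bool	linked[TOY_MAXINO];	/* reachable from the root? */
	bool	checked[TOY_MAXINO];	/* examined during the current pass? */
	int	lostfound;		/* inode number of lost+found, or -1 */
};

/* Allocate a new inode for lost+found; returns -1 if the table is full. */
static int
toy_create_lostfound(struct toy_fs *fs)
{
	int ino;

	assert(fs->lostfound < 0);
	if (fs->ninodes >= TOY_MAXINO)
		return (-1);
	ino = fs->ninodes++;
	fs->linked[ino] = true;
	fs->checked[ino] = false;
	fs->lostfound = ino;
	return (ino);
}

/*
 * One scan level: reconnect every unreachable inode to lost+found.
 * If lost+found has to be created, the loop bound `n` is stale, so
 * the scan starts over.  Returns true if the pass changed anything.
 */
static bool
toy_pass(struct toy_fs *fs)
{
	bool modified = false;
	int ino, n;

restart:
	n = fs->ninodes;
	for (ino = 0; ino < n; ino++)
		fs->checked[ino] = false;
	for (ino = 0; ino < n; ino++) {
		fs->checked[ino] = true;
		if (fs->linked[ino])
			continue;
		if (fs->lostfound < 0) {
			if (toy_create_lostfound(fs) < 0)
				return (modified);	/* no room: give up */
			modified = true;
			goto restart;		/* rescan with the new inode */
		}
		fs->linked[ino] = true;		/* "reconnect" to lost+found */
		modified = true;
	}
	return (modified);
}

/* Rerun whole passes until one makes no changes; returns passes run. */
static int
toy_fsck(struct toy_fs *fs)
{
	int passes = 1;

	while (toy_pass(fs))
		passes++;
	return (passes);
}
```

With three inodes where only inode 2 is unreachable, the first pass creates lost+found as inode 3, restarts, and reconnects inode 2; the second pass finds nothing to do, so toy_fsck() returns 2 and every allocated inode, including lost+found, has been examined.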

-DR