Date:      Sat, 07 Jan 2006 10:45:01 -0700
From:      Scott Long <scottl@samsco.org>
To:        David Rhodus <drhodus@machdep.com>
Cc:        freebsd-current@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject:   Re: It still here... panic: ufs_dirbad: bad dir
Message-ID:  <43BFFE1D.4070502@samsco.org>
In-Reply-To: <fe77c96b0601070904n57d00a21mdf94281bc812dc50@mail.gmail.com>
References:  <20060102222723.GA1754@dragon.NUXI.org>	 <43BA9C5C.9010307@samsco.org>	 <20060106200009.GA53067@garage.freebsd.pl>	 <43BFF041.8070300@samsco.org> <fe77c96b0601070904n57d00a21mdf94281bc812dc50@mail.gmail.com>

David Rhodus wrote:
> On 1/7/06, Scott Long <scottl@samsco.org> wrote:
> 
>>Pawel Jakub Dawidek wrote:
>>
>>
>>>On Tue, Jan 03, 2006 at 08:46:36AM -0700, Scott Long wrote:
>>>+> David O'Brien wrote:
>>>+>
>>>+> >Just in case anyone thought the bug had been fixed...
>>>+> >FreeBSD 7.0-CURRENT #531: Mon Jan  2 11:32:17 PST 2006 i386
>>>+> >panic: ufs_dirbad: bad dir
>>>+> >cpuid = 1
>>>+> >KDB: stack backtrace:
>>>+> >kdb_backtrace(c06c9ba1,1,c06c03c6,eae718c8,c8a91480) at 0xc053657e = kdb_backtrace+0x2e
>>>+> >panic(c06c03c6,c85bf1f8,dade11,580,c06c0380) at 0xc0516618 = panic+0x128
>>>+> >ufs_dirbad(c9171bdc,580,c06c0380,0,eae7193c) at 0xc0616e4d = ufs_dirbad+0x4d
>>>+> >ufs_lookup(eae719e8,c916c528,eae71bc4,c916c528,eae71a24) at 0xc06165cd = ufs_lookup+0x3ad
>>>+> >VOP_CACHEDLOOKUP_APV(c06f2a80,eae719e8,eae71bc4,c8a91480,cac28d80) at 0xc068cd4e = VOP_CACHEDLOOKUP_APV+0x9e
>>>+> >vfs_cache_lookup(eae71a90,eae71a90,c916c528,c916c528,eae71bc4) at 0xc057275a = vfs_cache_lookup+0xca
>>>+> >VOP_LOOKUP_APV(c06f2a80,eae71a90,c8a91480,c106fc88,0) at 0xc068cc66 = VOP_LOOKUP_APV+0xa6
>>>+> >lookup(eae71b9c,0,c06b5c8e,b6,c057f7ed) at 0xc057760e = lookup+0x44e
>>>+> >namei(eae71b9c,eae71b3c,60,0,c8a91480) at 0xc0576ecf = namei+0x44f
>>>+> >kern_stat(c8a91480,8106f20,0,eae71c10,e0) at 0xc05863dd = kern_stat+0x3d
>>>+> >stat(c8a91480,eae71d04,8,43c,c8a91480) at 0xc058636f = stat+0x2f
>>>+> >syscall(3b,3b,3b,80dbe80,8106f20) at 0xc0682b43 = syscall+0x323
>>>+> >Xint0x80_syscall() at 0xc066d33f = Xint0x80_syscall+0x1f
>>>+>
>>>+> Please include the console printf that is right above the panic message.
>>>+> It will say either something about a mangled entry or an isize too
>>>+> small.  Since this problem is happening consistently for you, but there
>>>+> seem to be no other problem reports from others, I'd highly suspect that
>>>+> you have filesystem damage that isn't getting detected by fsck.  I assume
>>>+> that you are running fsck in the foreground and not in the background,
>>>+> yes?  The easiest solution here might be to figure out which
>>>+> directory is causing the problem, and just clri its inode and then clean
>>>+> up the mess.
>>>
>>>I'm able to reproduce it with a newly newfs(8)ed file system:
>>>
>>>/mnt: bad dir ino 17382405 at offset 0: mangled entry
>>>panic: ufs_dirbad: bad dir
>>>KDB: enter: panic
>>>[...]
>>>db> tr
>>>Tracing pid 427 tid 100057 td 0xc7ccaa80
>>>kdb_enter(c060029a,c065c020,c0610849,f6b228c0,100) at kdb_enter+0x30
>>>panic(c0610849,c7914210,1093c05,0,c0610803) at panic+0xce
>>>ufs_dirbad(cb2b4b58,0,c0610803,0,f6b22934) at ufs_dirbad+0x4e
>>>ufs_lookup(f6b229e4,c061b519,cb092c60,cb092c60,f6b22b64) at ufs_lookup+0x39f
>>>VOP_CACHEDLOOKUP_APV(c063a7e0,f6b229e4,f6b22b64,c7ccaa80,c7d52b80) at VOP_CACHEDLOOKUP_APV+0xc4
>>>vfs_cache_lookup(f6b22a8c,f6b22a8c,0,cb092c60,0) at vfs_cache_lookup+0xc8
>>>VOP_LOOKUP_APV(c063a7e0,f6b22a8c,c7ccaa80,38,0) at VOP_LOOKUP_APV+0xa6
>>>lookup(f6b22b3c,0,c060880c,b5,c0511d45) at lookup+0x454
>>>namei(f6b22b3c,f6b22b8c,60,0,c7ccaa80) at namei+0x441
>>>kern_lstat(c7ccaa80,8059800,0,f6b22c10,2) at kern_lstat+0x5b
>>>lstat(c7ccaa80,f6b22d04,8,43c,c065c740) at lstat+0x2f
>>>syscall(805003b,807003b,bfbf003b,805f19c,bfbfeba0) at syscall+0x325
>>>Xint0x80_syscall() at Xint0x80_syscall+0x1f
>>>--- syscall (190, FreeBSD ELF32, lstat), eip = 0x28176efb, esp = 0xbfbfe90c, ebp = 0xbfbfea48 ---
>>>
>>
>>Since you can reproduce it, can you find out which test it is failing?
>>At the very least we need to add the test to fsck.
>>
>>Scott
> 
> 
> The main problem with dirbad panics is that the corruption occurred a
> long time ago, so a backtrace usually doesn't provide enough
> information to find out what went wrong.
> 
> Doing a fsck _should_ fix the filesystem corruption, but only after
> the problem has already occurred.  There are a few cases in which fsck
> needs to restart its current scan level or it can leave corruption
> inside the filesystem while marking the partition clean.
> 
> -DR

Yes, I'm well aware of all of this; that's why I'm asking Pawel to
determine which test is failing so we can find out why fsck isn't
catching it.
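
For reference, the "test" in question is one of a handful of sanity
checks applied to a directory entry.  Below is a minimal userland
sketch of that kind of check, modeled loosely on what the UFS lookup
path verifies.  Everything named sk_* or SK_* is hypothetical and
invented for illustration, the struct is a simplified new-format entry,
and the 512-byte directory block size is an assumption.  It is not the
kernel source.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_DIRBLKSIZ 512        /* assumed directory block size */
#define SK_MAXNAMLEN 255

/* Hypothetical, simplified directory entry (new-format layout). */
struct sk_direct {
    uint32_t d_ino;             /* inode number, 0 means unused slot */
    uint16_t d_reclen;          /* length of this record in bytes */
    uint8_t  d_type;
    uint8_t  d_namlen;          /* length of d_name, excluding the NUL */
    char     d_name[SK_MAXNAMLEN + 1];
};

/* One value per individual sub-test, so a failure can be attributed. */
enum sk_dirtest {
    SK_OK = 0,
    SK_RECLEN_ALIGN,            /* d_reclen is not a multiple of 4 */
    SK_RECLEN_OVERRUN,          /* record runs past the directory block */
    SK_RECLEN_SHORT,            /* record too small to hold its own name */
    SK_NAME_NUL,                /* NUL byte inside the first d_namlen bytes */
    SK_NAME_UNTERMINATED        /* no NUL right after the name */
};

/* Minimum record size: 8-byte header + name + NUL, rounded up to 4. */
static size_t
sk_dirsiz(const struct sk_direct *ep)
{
    return (8 + (size_t)ep->d_namlen + 1 + 3) & ~(size_t)3;
}

/* Return the first sub-test that fails, or SK_OK if the entry looks sane. */
enum sk_dirtest
sk_check_entry(const struct sk_direct *ep, size_t off_in_block)
{
    size_t room = SK_DIRBLKSIZ - (off_in_block & (SK_DIRBLKSIZ - 1));
    size_t i;

    if ((ep->d_reclen & 0x3) != 0)
        return SK_RECLEN_ALIGN;
    if (ep->d_reclen > room)
        return SK_RECLEN_OVERRUN;
    if (ep->d_reclen < sk_dirsiz(ep))
        return SK_RECLEN_SHORT;
    if (ep->d_ino == 0)
        return SK_OK;           /* unused slot: name is not examined */
    for (i = 0; i < ep->d_namlen; i++)
        if (ep->d_name[i] == '\0')
            return SK_NAME_NUL;
    if (ep->d_name[ep->d_namlen] != '\0')
        return SK_NAME_UNTERMINATED;
    return SK_OK;
}
```

In this sketch an all-zero entry at offset 0 comes back as
SK_RECLEN_SHORT.  Knowing which individual check trips is the kind of
detail that tells you what an equivalent fsck test would have to look
for.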

Scott