Date: Sat, 7 Jan 2006 13:10:20 -0500
From: David Rhodus
Sender: sdrhodus@gmail.com
To: Scott Long
In-Reply-To: <43BFFE1D.4070502@samsco.org>
References: <20060102222723.GA1754@dragon.NUXI.org> <43BA9C5C.9010307@samsco.org> <20060106200009.GA53067@garage.freebsd.pl> <43BFF041.8070300@samsco.org> <43BFFE1D.4070502@samsco.org>
Cc: freebsd-current@freebsd.org, Pawel Jakub Dawidek
Subject: Re: It still here...
 panic: ufs_dirbad: bad dir
List-Id: Discussions about the use of FreeBSD-current

On 1/7/06, Scott Long wrote:
> David Rhodus wrote:
> > On 1/7/06, Scott Long wrote:
> >
> >>Pawel Jakub Dawidek wrote:
> >>
> >>
> >>>On Tue, Jan 03, 2006 at 08:46:36AM -0700, Scott Long wrote:
> >>>+> David O'Brien wrote:
> >>>+>
> >>>+> >Just in case anyone thought the bug had been fixed...
> >>>+> >FreeBSD 7.0-CURRENT #531: Mon Jan 2 11:32:17 PST 2006 i386
> >>>+> >panic: ufs_dirbad: bad dir
> >>>+> >cpuid = 1
> >>>+> >KDB: stack backtrace:
> >>>+> >kdb_backtrace(c06c9ba1,1,c06c03c6,eae718c8,c8a91480) at 0xc053657e = kdb_backtrace+0x2e
> >>>+> >panic(c06c03c6,c85bf1f8,dade11,580,c06c0380) at 0xc0516618 = panic+0x128
> >>>+> >ufs_dirbad(c9171bdc,580,c06c0380,0,eae7193c) at 0xc0616e4d = ufs_dirbad+0x4d
> >>>+> >ufs_lookup(eae719e8,c916c528,eae71bc4,c916c528,eae71a24) at 0xc06165cd = ufs_lookup+0x3ad
> >>>+> >VOP_CACHEDLOOKUP_APV(c06f2a80,eae719e8,eae71bc4,c8a91480,cac28d80) at 0xc068cd4e = VOP_CACHEDLOOKUP_APV+0x9e
> >>>+> >vfs_cache_lookup(eae71a90,eae71a90,c916c528,c916c528,eae71bc4) at 0xc057275a = vfs_cache_lookup+0xca
> >>>+> >VOP_LOOKUP_APV(c06f2a80,eae71a90,c8a91480,c106fc88,0) at 0xc068cc66 = VOP_LOOKUP_APV+0xa6
> >>>+> >lookup(eae71b9c,0,c06b5c8e,b6,c057f7ed) at 0xc057760e = lookup+0x44e
> >>>+> >namei(eae71b9c,eae71b3c,60,0,c8a91480) at 0xc0576ecf = namei+0x44f
> >>>+> >kern_stat(c8a91480,8106f20,0,eae71c10,e0) at 0xc05863dd = kern_stat+0x3d
> >>>+> >stat(c8a91480,eae71d04,8,43c,c8a91480) at 0xc058636f = stat+0x2f
> >>>+> >syscall(3b,3b,3b,80dbe80,8106f20) at 0xc0682b43 = syscall+0x323
> >>>+> >Xint0x80_syscall() at 0xc066d33f = Xint0x80_syscall+0x1f
> >>>+>
> >>>+> Please include the console printf that is right above the panic
> >>>+> message.  It will say either something about a mangled entry or an
> >>>+> isize too small.  Since this problem is happening consistently for
> >>>+> you, but there seem to be no other problem reports from others, I'd
> >>>+> highly suspect that you have filesystem damage that isn't getting
> >>>+> detected by fsck.  I assume that you are running fsck in the
> >>>+> foreground and not in the background, yes?  The easiest solution
> >>>+> here might be to figure out which directory is causing the problem,
> >>>+> and just clri its inode and then clean up the mess.
> >>>
> >>>I'm able to reproduce it with a newly newfs(8)ed file system:
> >>>
> >>>/mnt: bad dir ino 17382405 at offset 0: mangled entry
> >>>panic: ufs_dirbad: bad dir
> >>>KDB: enter: panic
> >>>[...]
> >>>db> tr
> >>>Tracing pid 427 tid 100057 td 0xc7ccaa80
> >>>kdb_enter(c060029a,c065c020,c0610849,f6b228c0,100) at kdb_enter+0x30
> >>>panic(c0610849,c7914210,1093c05,0,c0610803) at panic+0xce
> >>>ufs_dirbad(cb2b4b58,0,c0610803,0,f6b22934) at ufs_dirbad+0x4e
> >>>ufs_lookup(f6b229e4,c061b519,cb092c60,cb092c60,f6b22b64) at ufs_lookup+0x39f
> >>>VOP_CACHEDLOOKUP_APV(c063a7e0,f6b229e4,f6b22b64,c7ccaa80,c7d52b80) at VOP_CACHEDLOOKUP_APV+0xc4
> >>>vfs_cache_lookup(f6b22a8c,f6b22a8c,0,cb092c60,0) at vfs_cache_lookup+0xc8
> >>>VOP_LOOKUP_APV(c063a7e0,f6b22a8c,c7ccaa80,38,0) at VOP_LOOKUP_APV+0xa6
> >>>lookup(f6b22b3c,0,c060880c,b5,c0511d45) at lookup+0x454
> >>>namei(f6b22b3c,f6b22b8c,60,0,c7ccaa80) at namei+0x441
> >>>kern_lstat(c7ccaa80,8059800,0,f6b22c10,2) at kern_lstat+0x5b
> >>>lstat(c7ccaa80,f6b22d04,8,43c,c065c740) at lstat+0x2f
> >>>syscall(805003b,807003b,bfbf003b,805f19c,bfbfeba0) at syscall+0x325
> >>>Xint0x80_syscall() at Xint0x80_syscall+0x1f
> >>>--- syscall (190, FreeBSD ELF32, lstat), eip = 0x28176efb, esp = 0xbfbfe90c, ebp = 0xbfbfea48 ---
> >>>
> >>
> >>Since you can reproduce it, can you find out which test it is failing?
> >>At the very least we need to add the test to fsck.
> >>
> >>Scott
> >
> >
> > The main problem with dirbad panics is that the corruption occurred a
> > long time ago, so a backtrace usually doesn't provide enough
> > information to find out what went wrong.
> >
> > Doing a fsck _should_ fix the filesystem corruption, but only after
> > the problem has already occurred.  There are a few cases in which fsck
> > needs to restart its current scan level or it can leave corruption
> > inside the filesystem while marking the partition clean.
> >
> > -DR
>
> Yes, I'm well aware of all of this, that's why I'm asking Pawel to
> determine which test is failing so we can find out why fsck isn't
> catching it.
>
> Scott

I think the problem in Pawel's case is that the filesystem itself is
writing out corrupt data, and later he hits an assertion when the
filesystem tries to read the corrupt entry.  This seems to be a problem
with UFS itself.

As for fsck, it should fix this problem, but only after it has already
happened, and it may take two fsck scans.  I'm not sure what the current
state of fsck is in FreeBSD, but one problem I've noticed in the past
while working on FreeBSD is that if fsck has to create a lost+found
directory, it doesn't restart the current scan level.  This can lead to
the dirbad panic.

-DR
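
For anyone trying to narrow down which test is failing, here is a rough
userland sketch of the kind of per-entry sanity checks that sit behind a
"mangled entry" report.  It is only loosely modeled on the UFS directory
entry layout and is not the kernel code: the struct, the sk_* names, and
the 512-byte block size are all assumptions made for illustration.  The
point is that returning a distinct code per check shows which test tripped.

```c
#include <stdint.h>

#define SK_DIRBLKSIZ 512u  /* assumed directory block size (hypothetical) */
#define SK_MAXNAMLEN 255u  /* assumed maximum name length (hypothetical) */

/* Hypothetical, simplified directory entry: 8-byte header plus name. */
struct sk_dirent {
    uint32_t d_ino;     /* inode number, 0 means unused slot */
    uint16_t d_reclen;  /* total length of this record in bytes */
    uint8_t  d_type;    /* file type */
    uint8_t  d_namlen;  /* length of the name, excluding the NUL */
    char     d_name[SK_MAXNAMLEN + 1];
};

/* One result code per check, so the caller can tell which test failed. */
enum sk_result {
    SK_OK,
    SK_BAD_ALIGN,    /* record length is not a multiple of 4 */
    SK_BAD_OVERRUN,  /* record runs past the end of its directory block */
    SK_BAD_SHORT,    /* record is too short to hold its own name */
    SK_BAD_NAME      /* name has an embedded NUL or is not NUL-terminated */
};

/* Minimum record size: 8-byte header + name + NUL, rounded up to 4. */
unsigned sk_dirsiz(const struct sk_dirent *ep)
{
    return (8u + ep->d_namlen + 1u + 3u) & ~3u;
}

/* Check one entry located at byte `offset` within the directory. */
enum sk_result sk_check_entry(const struct sk_dirent *ep, unsigned offset)
{
    /* Bytes remaining in the directory block that contains this entry. */
    unsigned left = SK_DIRBLKSIZ - (offset & (SK_DIRBLKSIZ - 1u));
    unsigned i;

    if ((ep->d_reclen & 0x3u) != 0)
        return SK_BAD_ALIGN;
    if (ep->d_reclen > left)
        return SK_BAD_OVERRUN;
    if (ep->d_reclen < sk_dirsiz(ep))
        return SK_BAD_SHORT;
    if (ep->d_ino == 0)
        return SK_OK;  /* unused slot, name is not inspected */
    for (i = 0; i < ep->d_namlen; i++)
        if (ep->d_name[i] == '\0')
            return SK_BAD_NAME;
    if (ep->d_name[ep->d_namlen] != '\0')
        return SK_BAD_NAME;
    return SK_OK;
}
```

In this sketch an all-zero entry at offset 0 fails the "too short" test,
because a record length of 0 cannot hold even the 8-byte header.  Whether
that matches what Pawel is seeing is a guess until the real check is
instrumented.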