From owner-freebsd-bugs@FreeBSD.ORG Wed Mar 1 20:20:14 2006
Return-Path:
X-Original-To: freebsd-bugs@hub.freebsd.org
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 45F8816A427
	for ; Wed, 1 Mar 2006 20:20:14 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 059AC43D45
	for ; Wed, 1 Mar 2006 20:20:14 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id k21KKDgl082700
	for ; Wed, 1 Mar 2006 20:20:13 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.13.4/8.13.4/Submit) id k21KKDFn082699;
	Wed, 1 Mar 2006 20:20:13 GMT
	(envelope-from gnats)
Date: Wed, 1 Mar 2006 20:20:13 GMT
Message-Id: <200603012020.k21KKDFn082699@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: "David Rhodus"
Cc:
Subject: Re: kern/93942: panic: ufs_dirbad: bad dir
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: David Rhodus
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 01 Mar 2006 20:20:14 -0000

The following reply was made to PR kern/93942; it has been noted by GNATS.

From: "David Rhodus"
To: Yarema
Cc: FreeBSD-gnats-submit@freebsd.org, FreeBSD-current@freebsd.org,
	"Kris Kennaway", "Dennis Koegel", "Doug White", "Martin Machacek",
	"David O'Brien", "Scott Long", "Pawel Jakub Dawidek"
Subject: Re: kern/93942: panic: ufs_dirbad: bad dir
Date: Wed, 1 Mar 2006 15:10:38 -0500

On 2/28/06, Yarema wrote:
>
>
> --On February 28, 2006 2:53:43 PM -0500 Kris Kennaway wrote:
>
> > On Tue, Feb 28, 2006 at 10:35:36AM -0500, Yarema wrote:
> >>
> >> > Number:         93942
> >> > Category:       kern
> >> > Synopsis:       panic: ufs_dirbad: bad dir
> >> > Confidential:   no
> >> > Severity:       critical
> >> > Priority:       high
> >> > Responsible:    freebsd-bugs
> >> > State:          open
> >> > Quarter:
> >> > Keywords:
> >> > Date-Required:
> >> > Class:          sw-bug
> >> > Submitter-Id:   current-users
> >> > Arrival-Date:   Tue Feb 28 15:40:06 GMT 2006
> >> > Closed-Date:
> >> > Last-Modified:
> >> > Originator:     Yarema
> >> > Release:        FreeBSD 6.1-PRERELEASE i386
> >> > Organization:
> >> > Environment:
> >> System: FreeBSD 6.1-PRERELEASE #0: Mon Feb 27 04:52:11 EST 2006 i386
> >>
> >> > Description:
> >>
> >> This is at least the third file system which got hosed for me by the
> >> ufs_dirbad bug on three different hard drives since 5.3 STABLE.
> >> I suspect this is related to the following PRs:
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=49079
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=51001
> >>
> >> In every case a process would lock up making the whole system
> >> unresponsive.  A reboot, fsck -y in single user mode and another
> >> reboot would produce the following during the mount of the corrupt
> >> fs in rw mode:
> >>
> >> bad dir ino 2 at offset 16384: mangled entry
> >> panic: ufs_dirbad: bad dir
> >> cpuid = 0
> >>
> >> Another reboot, fsck -y in single user mode and reboot produces the
> >> same results repeatedly.  Previously I had recovered by mounting the
> >> corrupt fs in ro mode, backup, newfs, restore.
> >>
> >> Recently I noticed Matthew Dillon commit the following to the
> >> DragonFly src repository:
> >>
> >> http://leaf.DragonFlyBSD.org/mailarchive/commits/2006-02/msg00057.html
> >>
> >> dillon      2006/02/21 10:46:56 PST
> >>
> >> DragonFly src repository
> >>
> >>   Modified files:
> >>     sys/kern             vfs_cluster.c
> >>   Log:
> >>   bioops.io_start() was being called in a situation where the buffer
> >>   could be brelse()'d afterwords instead of I/O being initiated.  When
> >>   this occurs, the buffer may contain softupdates-modified data which is
> >>   never reverted, resulting in serious filesystem corruption.  When
> >>   io_start is called on a buffer, I/O MUST be initiated and terminated
> >>   with a biodone() or the buffer's data may not be properly reverted.
> >>
> >>   Solve the problem by moving the io_start() call a little further on in
> >>   the code, after the potential brelse().
> >>
> >>   There is a possibility that this bug is responsible for the 'dirbad'
> >>   panics often reported in DragonFly and FreeBSD circles.
> >>
> >>   Revision  Changes    Path
> >>   1.16      +7 -6      src/sys/kern/vfs_cluster.c
> >>
> >> http://www.DragonFlyBSD.org/cvsweb/src/sys/kern/vfs_cluster.c.diff?r1=1.15&r2=1.16&f=u
> >>
> >> Below is the equivalent patch to the FreeBSD RELENG_6 branch of
> >> src/sys/kern/vfs_cluster.c
> >>
> >> Hope this helps track down the problem.
> >
> > Does it work for you? :)
> >
> > Kris
>
> No way for me to know yet.  From what I gathered, mostly from this thread:
>
>
> As per Matt Dillon,
> the corruption occurs much earlier than any consequences can be felt.
> The patch may prevent the corruption from occurring in the first place.
> But the patch does nothing for me now that I have a huge /home slice
> which cannot even be mounted as read-only in single user mode without
> triggering a page fault kernel panic in the mount process no matter
> how many times I run fsck -f on it.
>
> FWIW the page fault in the mount process is a different sort of kernel
> panic than what is described in this kern/93942 PR above.  The page fault
> occurs while attempting to mount read-only.  Attempting to mount read-write
> causes the panic: ufs_dirbad: bad dir
>
> One more note, hitting the power button when the machine is locked up
> before the reboot and mount attempt which causes the panic produces the
> following output every time the button is pressed:
>
> kernel: acpi: suspend request ignored (not ready yet)
>
> Seems like there's two separate problems:
> 1) the root cause of the bad dir corruption.
> 2) fsck -f doesn't fix it no matter how many times you run it.
>
> Any pointers on how to recover my /home slice will be greatly appreciated.
>
> --
> Yarema

I have been working with the bad dir problem for several months and I
have not had corruption which fsck would not correct.

-DR
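
The ordering problem described in the quoted DragonFly commit log can be
pictured with a toy model.  This is only a sketch under loud assumptions: it
is not the vfs_cluster.c code, and ToyBuf, toy_io_start, toy_biodone,
toy_brelse, buggy_order and fixed_order are all hypothetical names invented
for illustration.  The "roll back to 0" step stands in for whatever
temporary rewrite io_start performs on the buffer; the only point is that
releasing the buffer without completing I/O leaves that rewrite in place.

```python
from dataclasses import dataclass


@dataclass
class ToyBuf:
    """Hypothetical stand-in for a kernel buffer; not the real struct buf."""
    data: int                  # contents the rest of the system sees in memory
    saved: int = 0             # copy stashed while the contents are rewritten
    rolled_back: bool = False  # True between toy_io_start() and toy_biodone()


def toy_io_start(bp: ToyBuf) -> None:
    """Models io_start(): temporarily rewrite the buffer ahead of a write."""
    bp.saved = bp.data
    bp.data = 0                # temporary image, only valid while I/O is in flight
    bp.rolled_back = True


def toy_biodone(bp: ToyBuf) -> None:
    """Models I/O completion: put the in-memory contents back."""
    if bp.rolled_back:
        bp.data = bp.saved
        bp.rolled_back = False


def toy_brelse(bp: ToyBuf) -> None:
    """Models brelse(): release the buffer with no I/O, so nothing is restored."""


def buggy_order(bp: ToyBuf, skip_io: bool) -> int:
    """io_start runs before the point where the buffer may be released."""
    toy_io_start(bp)
    if skip_io:
        toy_brelse(bp)         # released while still holding the temporary image
        return bp.data
    toy_biodone(bp)
    return bp.data


def fixed_order(bp: ToyBuf, skip_io: bool) -> int:
    """io_start runs only after the potential release, as the log describes."""
    if skip_io:
        toy_brelse(bp)         # buffer was never rewritten, so nothing to undo
        return bp.data
    toy_io_start(bp)
    toy_biodone(bp)
    return bp.data


if __name__ == "__main__":
    print("buggy:", buggy_order(ToyBuf(42), skip_io=True))
    print("fixed:", fixed_order(ToyBuf(42), skip_io=True))
```

In this model the buggy ordering hands back a buffer whose contents are
still the temporary image whenever the release path is taken, while the
reordered version leaves the contents untouched.  When I/O actually runs to
completion, both orderings behave the same.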