From owner-freebsd-current@FreeBSD.ORG Tue Feb 28 23:44:03 2006
Return-Path:
X-Original-To: FreeBSD-current@freebsd.org
Delivered-To: FreeBSD-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 676DC16A420;
	Tue, 28 Feb 2006 23:44:03 +0000 (GMT)
	(envelope-from yds@CoolRat.org)
Received: from dppl.com (sapas.dppl.net [216.182.10.231])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 08DEF43D45;
	Tue, 28 Feb 2006 23:44:02 +0000 (GMT)
	(envelope-from yds@CoolRat.org)
Received: from [192.168.1.73] (c-69-242-5-144.hsd1.pa.comcast.net [69.242.5.144])
	(AUTH: LOGIN yds)
	by dppl.com with esmtp; Tue, 28 Feb 2006 18:43:58 -0500
Date: Tue, 28 Feb 2006 18:43:58 -0500
From: Yarema
To: FreeBSD-gnats-submit@FreeBSD.org, FreeBSD-current@FreeBSD.org
Message-ID: <3BD79FAD83E2122EC1644386@ramen.coolrat.org>
In-Reply-To: <20060228195343.GA85313@xor.obsecurity.org>
References: <20060228195343.GA85313@xor.obsecurity.org>
X-Mailer: Mulberry/4.0.3 (Mac OS X)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Mailman-Approved-At: Tue, 28 Feb 2006 23:48:42 +0000
Cc: David Rhodus, Dennis Koegel, Martin Machacek, Kris Kennaway, Pawel Jakub Dawidek
Subject: Re: kern/93942: panic: ufs_dirbad: bad dir
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 28 Feb 2006 23:44:03 -0000

--On February 28, 2006 2:53:43 PM -0500 Kris Kennaway wrote:

> On Tue, Feb 28, 2006 at 10:35:36AM -0500, Yarema wrote:
>>
>> > Number: 93942
>> > Category: kern
>> > Synopsis: panic: ufs_dirbad: bad dir
>> > Confidential: no
>> > Severity: critical
>> > Priority: high
>> > Responsible: freebsd-bugs
>> > State: open
>> > Quarter:
>> > Keywords:
>> > Date-Required:
>> > Class: sw-bug
>> > Submitter-Id: current-users
>> > Arrival-Date: Tue Feb 28 15:40:06 GMT 2006
>> > Closed-Date:
>> > Last-Modified:
>> > Originator: Yarema
>> > Release: FreeBSD 6.1-PRERELEASE i386
>> > Organization:
>> > Environment:
>> System: FreeBSD 6.1-PRERELEASE #0: Mon Feb 27 04:52:11 EST 2006 i386
>>
>> > Description:
>>
>> This is at least the third file system which got hosed for me by the
>> ufs_dirbad bug on three different hard drives since 5.3 STABLE.
>> I suspect this is related to the following PRs:
>> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=49079
>> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=51001
>>
>> In every case a process would lock up making the whole system
>> unresponsive. A reboot, fsck -y in single user mode and another
>> reboot would produce the following during the mount of the corrupt
>> fs in rw mode:
>>
>> bad dir ino 2 at offset 16384: mangled entry
>> panic: ufs_dirbad: bad dir
>> cpuid = 0
>>
>> Another reboot, fsck -y in single user mode and reboot produces the
>> same results repeatedly. Previously I had recovered by mounting the
>> corrupt fs in ro mode, backup, newfs, restore.
>>
>> Recently I noticed Matthew Dillon commit the following to the
>> DragonFly src repository:
>>
>> http://leaf.DragonFlyBSD.org/mailarchive/commits/2006-02/msg00057.html
>>
>> dillon 2006/02/21 10:46:56 PST
>>
>> DragonFly src repository
>>
>> Modified files:
>> sys/kern vfs_cluster.c
>> Log:
>> bioops.io_start() was being called in a situation where the buffer
>> could be brelse()'d afterwords instead of I/O being initiated. When
>> this occurs, the buffer may contain softupdates-modified data which is
>> never reverted, resulting in serious filesystem corruption. When
>> io_start is called on a buffer, I/O MUST be initiated and terminated
>> with a biodone() or the buffer's data may not be properly reverted.
>>
>> Solve the problem by moving the io_start() call a little further on in
>> the code, after the potential brelse().
>>
>> There is a possibility that this bug is responsible for the 'dirbad'
>> panics often reported in DragonFly and FreeBSD circles.
>>
>> Revision Changes Path
>> 1.16 +7 -6 src/sys/kern/vfs_cluster.c
>>
>> http://www.DragonFlyBSD.org/cvsweb/src/sys/kern/vfs_cluster.c.diff?r1=1.15&r2=1.16&f=u
>>
>> Below is the equivalent patch to the FreeBSD RELENG_6 branch of
>> src/sys/kern/vfs_cluster.c
>>
>> Hope this helps track down the problem.
>
> Does it work for you? :)
>
> Kris

No way for me to know yet. From what I gathered, mostly from this thread:

As per Matt Dillon, the corruption occurs much earlier than any
consequences can be felt. The patch may prevent the corruption from
occurring in the first place. But the patch does nothing for me now that I
have a huge /home slice which cannot even be mounted read-only in single
user mode without triggering a page fault kernel panic in the mount
process, no matter how many times I run fsck -f on it.

FWIW the page fault in the mount process is a different sort of kernel
panic from the one described in the kern/93942 PR above. The page fault
occurs while attempting to mount read-only. Attempting to mount read-write
causes the

panic: ufs_dirbad: bad dir

One more note: hitting the power button while the machine is locked up
(before the reboot and mount attempt that causes the panic) produces the
following output every time the button is pressed:

kernel: acpi: suspend request ignored (not ready yet)

Seems like there are two separate problems:

1) the root cause of the bad dir corruption.
2) fsck -f doesn't fix it no matter how many times you run it.

Any pointers on how to recover my /home slice will be greatly appreciated.

-- 
Yarema