From: "David Rhodus"
Sender: sdrhodus@gmail.com
Date: Wed, 1 Mar 2006 15:10:38 -0500
To: Yarema
Cc: Dennis Koegel, FreeBSD-current@freebsd.org, Martin Machacek, Kris Kennaway, Pawel Jakub Dawidek, FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/93942: panic: ufs_dirbad: bad dir
In-Reply-To: <3BD79FAD83E2122EC1644386@ramen.coolrat.org>
References: <20060228195343.GA85313@xor.obsecurity.org> <3BD79FAD83E2122EC1644386@ramen.coolrat.org>
List-Id: Discussions about the use of FreeBSD-current

On 2/28/06, Yarema wrote:
>
>
> --On February 28, 2006 2:53:43 PM -0500 Kris Kennaway wrote:
>
> > On Tue, Feb 28, 2006 at 10:35:36AM -0500, Yarema wrote:
> >>
> >> > Number: 93942
> >> > Category: kern
> >> > Synopsis: panic: ufs_dirbad: bad dir
> >> > Confidential: no
> >> > Severity: critical
> >> > Priority: high
> >> > Responsible: freebsd-bugs
> >> > State: open
> >> > Quarter:
> >> > Keywords:
> >> > Date-Required:
> >> > Class: sw-bug
> >> > Submitter-Id: current-users
> >> > Arrival-Date: Tue Feb 28 15:40:06 GMT 2006
> >> > Closed-Date:
> >> > Last-Modified:
> >> > Originator: Yarema
> >> > Release: FreeBSD 6.1-PRERELEASE i386
> >> > Organization:
> >> > Environment:
> >> System: FreeBSD 6.1-PRERELEASE #0: Mon Feb 27 04:52:11 EST 2006 i386
> >>
> >> > Description:
> >>
> >> This is at least the third file system which got hosed for me by the
> >> ufs_dirbad bug on three different hard drives since 5.3 STABLE.
> >> I suspect this is related to the following PRs:
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=49079
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=51001
> >>
> >> In every case a process would lock up making the whole system
> >> unresponsive. A reboot, fsck -y in single user mode and another
> >> reboot would produce the following during the mount of the corrupt
> >> fs in rw mode:
> >>
> >> bad dir ino 2 at offset 16384: mangled entry
> >> panic: ufs_dirbad: bad dir
> >> cpuid = 0
> >>
> >> Another reboot, fsck -y in single user mode and reboot produces the
> >> same results repeatedly. Previously I had recovered by mounting the
> >> corrupt fs in ro mode, backup, newfs, restore.
> >>
> >> Recently I noticed Matthew Dillon commit the following to the
> >> DragonFly src repository:
> >>
> >> http://leaf.DragonFlyBSD.org/mailarchive/commits/2006-02/msg00057.html
> >>
> >> dillon 2006/02/21 10:46:56 PST
> >>
> >> DragonFly src repository
> >>
> >> Modified files:
> >> sys/kern vfs_cluster.c
> >> Log:
> >> bioops.io_start() was being called in a situation where the buffer
> >> could be brelse()'d afterwards instead of I/O being initiated. When
> >> this occurs, the buffer may contain softupdates-modified data which is
> >> never reverted, resulting in serious filesystem corruption. When
> >> io_start is called on a buffer, I/O MUST be initiated and terminated
> >> with a biodone() or the buffer's data may not be properly reverted.
> >>
> >> Solve the problem by moving the io_start() call a little further on in
> >> the code, after the potential brelse().
> >>
> >> There is a possibility that this bug is responsible for the 'dirbad'
> >> panics often reported in DragonFly and FreeBSD circles.
> >>
> >> Revision Changes Path
> >> 1.16 +7 -6 src/sys/kern/vfs_cluster.c
> >>
> >> http://www.DragonFlyBSD.org/cvsweb/src/sys/kern/vfs_cluster.c.diff?r1=1.15&r2=1.16&f=u
> >>
> >> Below is the equivalent patch to the FreeBSD RELENG_6 branch of
> >> src/sys/kern/vfs_cluster.c
> >>
> >> Hope this helps track down the problem.
> >
> > Does it work for you? :)
> >
> > Kris
>
> No way for me to know yet. From what I gathered, mostly from this thread:
>
>
> As per Matt Dillon,
> the corruption occurs much earlier than any consequences can be felt.
> The patch may prevent the corruption from occurring in the first place.
> But the patch does nothing for me now that I have a huge /home slice
> which cannot even be mounted as read-only in single user mode without
> triggering a page fault kernel panic in the mount process no matter
> how many times I run fsck -f on it.
>
> FWIW the page fault in the mount process is a different sort of kernel
> panic than what is described in this kern/93942 PR above. The page fault
> occurs while attempting to mount read-only. Attempting to mount read-write
> causes the panic: ufs_dirbad: bad dir
>
> One more note, hitting the power button when the machine is locked up
> before the reboot and mount attempt which causes the panic produces the
> following output every time the button is pressed:
>
> kernel: acpi: suspend request ignored (not ready yet)
>
> Seems like there's two separate problems:
> 1) the root cause of the bad dir corruption.
> 2) fsck -f doesn't fix it no matter how many times you run it.
>
> Any pointers on how to recover my /home slice will be greatly appreciated.
>
> --
> Yarema

I have been working with the bad dir problem for several months and I
have not had corruption which fsck would not correct.

-DR
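
For readers following the quoted DragonFly commit log, here is a rough toy model of the ordering problem it describes. It is only a sketch under stated assumptions: the class Buf, the functions io_start, biodone and brelse, and the two write paths are hypothetical stand-ins that borrow names from the log, not the DragonFly or FreeBSD kernel API. It also assumes that io_start temporarily alters the in-memory buffer and that only biodone puts it back, which is how the log's "softupdates-modified data which is never reverted" reads.

```python
"""Toy model of the io_start()/brelse() ordering issue in the quoted log.

All names are hypothetical and only mimic the commit log; this is not
kernel code and does not reproduce the real vfs_cluster.c logic.
"""


class Buf:
    """A pretend buffer holding one block of filesystem data."""

    def __init__(self, data):
        self.data = data    # contents the filesystem expects to see in memory
        self.saved = None   # original contents stashed while I/O is in flight


def io_start(buf):
    # Assumption: starting I/O temporarily modifies the buffer contents.
    buf.saved = buf.data
    buf.data = "modified:" + buf.data


def biodone(buf):
    # Assumption: I/O completion restores the original contents.
    if buf.saved is not None:
        buf.data = buf.saved
        buf.saved = None


def brelse(buf):
    # Releases the buffer without any I/O; it does not undo io_start().
    pass


def write_buggy(buf, can_cluster):
    """Ordering described as broken: io_start() before a possible brelse()."""
    io_start(buf)
    if not can_cluster:
        brelse(buf)         # no I/O, no biodone(): the modification stays
        return
    biodone(buf)            # stand-in for I/O being issued and completing


def write_fixed(buf, can_cluster):
    """Ordering described as the fix: io_start() after the possible brelse()."""
    if not can_cluster:
        brelse(buf)         # buffer was never touched by io_start()
        return
    io_start(buf)
    biodone(buf)
```

In this model, write_buggy(buf, can_cluster=False) leaves the buffer holding the modified contents, while write_fixed leaves it untouched, which matches the log's point that the damage is done well before any panic shows up.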