From: Phil Homewood
To: stable@freebsd.org
Subject: FS corruption w/softupdates on 4.5-RC ?
Date: Sun, 27 Jan 2002 16:42:50 +1000
Message-ID: <20020127064250.GA333@moreton.com.au>

A system recently upgraded from a 6 month old 4.3-STABLE image
appears to be suddenly experiencing massive FS corruption (or fsck
is very confused when checking a readonly-mounted FS.)

I've been following what seem to be a lot of dead-ends on this, but
seem to have tracked it down to the following specifics:

* Only softupdate filesystems appear to have the problem. I first
  thought I saw it on a non-softupdate FS, but may have been
  mistaken.

* 4.5-RC kernel (as of Jan 26, also reupdated today) breaks,
  4.3-STABLE does not. (Tried 4.4-REL, no breakage, but the kernel
  I had didn't have softupdates, so inconclusive.)

* GENERIC kernel is sufficient to reproduce

* Unmounting the FS and fscking it doesn't *seem* to show the
  problem up. fscking the filesystem after remounting readonly
  does cause breakage. Unmounting, remounting readonly, and fscking
  seems safe.

* dd'ing a mounted fs to another identical device (I've been
  backing up the root fs to a twin partition like this forever)
  and then fscking the backup device exhibits the breakage.
  (Maybe I should change that to dump|restore, like I thought
  I'd been doing all along :-)

* The bug seems to be most easily tickled using MAKEDEV. I can
  reproduce the problem reliably by doing:

    # newfs /dev/da0s2g
    # tunefs -n enable /dev/da0s2g
    # mount /dev/da0s2g /tmp
    # cd /tmp
    # mkdir dev
    # cd dev
    # cp /dev/MAKEDEV .
    # sh MAKEDEV all
    # cd /
    # mount -u -r /tmp
    # fsck /tmp

* Occasionally the fsck or (if fsck comes up clean) the subsequent
  mount will panic. So far I've seen:

    panic: handle_workitem_remove: bad file delta

    softdep_deallocate_dependencies: dangling deps
    (that one was interspersed with fsck output, so I could have
    got it wrong)

  and one that may have involved a dup alloc; unfortunately I
  didn't copy it down.

Attached is a copy of my dmesg and kernel config. Any clues
gratefully appreciated...
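
For anyone wanting to try this on a scratch partition, the reproduction steps above could be collected into a small script. This is only a sketch under some assumptions: the DEV, MNT and RUN variables and the run/repro helper names are made up here for illustration and are not part of any FreeBSD tool. The commands themselves are the ones listed above. By default the script only prints what it would do, since newfs wipes whatever is on DEV.

```shell
#!/bin/sh
# Sketch of the reproduction sequence from the message above.
# DEV, MNT and RUN are hypothetical knobs added for illustration.
# WARNING: with RUN=1, newfs destroys the contents of DEV.
DEV="${DEV:-/dev/da0s2g}"   # scratch partition (example device from the report)
MNT="${MNT:-/tmp}"          # mount point used in the report

# Execute the command when RUN=1, otherwise just print it (dry run).
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

repro() {
    run newfs "$DEV" &&
    run tunefs -n enable "$DEV" &&           # turn on softupdates
    run mount "$DEV" "$MNT" &&
    run mkdir "$MNT/dev" &&
    run cp /dev/MAKEDEV "$MNT/dev/" &&
    # MAKEDEV runs in a child shell, so this script never changes its own
    # working directory and the "cd /" step from the report is not needed.
    run sh -c "cd '$MNT/dev' && sh MAKEDEV all" &&
    run mount -u -r "$MNT" &&                # downgrade to read-only
    run fsck "$MNT"
}

repro
```

Run with nothing set, it prints the eight commands in order. Setting RUN=1 (as root, on a machine where DEV really is disposable) executes them, stopping at the first command that fails.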