Message-Id: <201012020030.oB20Ubno068176@chez.mckusick.com>
To: Garrett Cooper
In-reply-to: <20101201142748.GN2392@deviant.kiev.zoral.com.ua>
Date: Wed, 01 Dec 2010 16:30:37 -0800
From: Kirk McKusick
Cc: Kostik Belousov, Peter Holm, current@freebsd.org, Jeff Roberson
Subject: Re: How a full fsck screwed up my SU+J filesystem
List-Id: Discussions about the use of FreeBSD-current

> Date: Wed, 1 Dec 2010 16:27:48 +0200
> From: Kostik Belousov
> To: Peter Holm
> Cc: Garrett Cooper,
>     Marshall Kirk McKusick, current@freebsd.org
> Subject: Re: How a full fsck screwed up my SU+J filesystem
>
> On Wed, Dec 01, 2010 at 12:00:08PM +0100, Peter Holm wrote:
> > On Wed, Dec 01, 2010 at 01:28:06AM -0800, Garrett Cooper wrote:
> > >
> > > So... I was doing a portmaster -af today because vlc stopped playing
> > > audio (for some reason ... I kind of went on a pkg_cutleaves rampage
> > > and probably deinstalled too much stuff), and the machine hardlocked
> > > during an upgrade.
> > > I did a soft reboot and saw messages along the
> > > lines of "your journal and filesystem mount time mismatched; running
> > > a full fsck". I figured "ok, sure..." and let it do its thing.
> > > Problem was that it pruned a lot of stuff from my /usr partition --
> > > including the .sujournal !!! So now it's stuck at Mounting local
> > > file systems: stating:
> > >
> > >     Failed to find journal. Use tunefs to create one
> > >     Failed to start journal: 2
> > >
> > > (I assume the 2 means ENOENT). All of the above were printf(9)'s
> > > from the kernel.
> > >
> > > Now the machine won't continue in multiuser mode (doesn't respond
> > > to interrupts, no panic, etc). Going into ddb, I don't see anything
> > > in info_threads (just a bunch of references to sched_switch, a few
> > > to fork_trampoline, cpustop_handler, and kdb_enter). I'm going to
> > > try and massage the machine back to life from single user mode, but
> > > the fact that this died in this way (i.e. .sujournal getting nuked
> > > by a full fsck) is a bit disheartening for SU+J :(... It would be
> > > nice if at least the fsck aborted before going and nuking the
> > > journal :/... (or at the very least if the file wasn't removable --
> > > i.e. SF_NOUNLINK).
> > >
> > > Here's to hoping I can resuscitate the filesystem...
> > >
> > > Thanks,
> > > -Garrett
> >
> > Thank you for reporting this.
> >
> > I was able to reproduce the problem by:
> >
> > tunefs -j enable /dev/md5a
> > mount /dev/md5a /mnt
> > chflags 0 /mnt/.sujournal
> > rm -f /mnt/.sujournal
> > umount /mnt
> > mount /dev/md5a /mnt
> >
> > The mount(1) is now stuck in mntref.
> >
> > http://people.freebsd.org/~pho/stress/log/kostik404.txt
> >
> > A sequence of "tunefs -j disable" + "tunefs -j enable" should get
> > you going.
>
> The action is of the category "do not do it then" for sure.
>
> The problem in kostik404 is that ffs_mount() did not clean up
> the vnodes instantiated during the mount.
> Activating softdep journal
> instantiates at least root vnode, and a journal vnode, if found. The
> following patch fixed it for me.
>
> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> index 94951e4..72f40da 100644
> --- a/sys/ufs/ffs/ffs_vfsops.c
> +++ b/sys/ufs/ffs/ffs_vfsops.c
> @@ -928,6 +928,7 @@ ffs_mountfs(devvp, mp, td)
>          if ((fs->fs_flags & FS_DOSOFTDEP) &&
>              (error = softdep_mount(devvp, mp, fs, cred)) != 0) {
>                  free(fs->fs_csp, M_UFSMNT);
> +                ffs_flushfiles(mp, FORCECLOSE, td);
>                  goto out;
>          }
>          if (fs->fs_snapinum[0] != 0)
>

Thanks all: Garrett for the report, Peter for the way to reproduce
the problem, and Kostik for a fix. I have copied Jeff so that he can
confirm that Kostik's fix is the appropriate thing to do. And I will
take a look at fsck to see if I can make it a bit more paranoid about
removing .sujournal.

        Kirk McKusick
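
For readers following the quoted patch, here is a rough user-space sketch
of the pattern it applies: release whatever was instantiated before a
failing step, then bail out. This is a toy model, not kernel code. Every
name in it (toy_mount, toy_journal_start, toy_flush, live_vnodes,
has_journal, cleanup_on_error) is a hypothetical stand-in and not a
FreeBSD interface; only the ENOENT return mirrors the "Failed to start
journal: 2" message quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a mount point; not a FreeBSD structure. */
struct toy_mount {
        int live_vnodes;        /* vnodes instantiated and not yet released */
        int has_journal;        /* nonzero if the journal file exists */
};

/*
 * Stand-in for starting the journal: it instantiates the root vnode
 * first, then fails with ENOENT if no journal file is present.
 */
int
toy_journal_start(struct toy_mount *mp)
{
        assert(mp != NULL);
        mp->live_vnodes++;              /* root vnode */
        if (!mp->has_journal)
                return (ENOENT);        /* journal file is missing */
        mp->live_vnodes++;              /* journal vnode */
        return (0);
}

/* Stand-in for flushing every vnode still attached to the mount. */
void
toy_flush(struct toy_mount *mp)
{
        assert(mp != NULL);
        mp->live_vnodes = 0;
}

/*
 * Toy mount routine. With cleanup_on_error == 0 it models the old
 * behavior (vnodes left behind on failure); with a nonzero value it
 * models the patched behavior (flush before returning the error).
 */
int
toy_mount(struct toy_mount *mp, int cleanup_on_error)
{
        int error;

        error = toy_journal_start(mp);
        if (error != 0) {
                if (cleanup_on_error)
                        toy_flush(mp);
                return (error);
        }
        return (0);
}
```

With has_journal set to 0, toy_mount(&mp, 0) returns ENOENT and leaves
live_vnodes at 1, while toy_mount(&mp, 1) returns ENOENT with
live_vnodes back at 0. The real fix is the one-line ffs_flushfiles()
call in the diff; this sketch only illustrates why that call sits on
the error path.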