From owner-freebsd-current@FreeBSD.ORG  Thu Jan 22 04:08:44 2004
Date: Thu, 22 Jan 2004 23:08:23 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Matthias Andree <ma@dt.e-technik.uni-dortmund.de>
In-Reply-To: <m37jzmdnhf.fsf@merlin.emma.line.org>
Message-ID: <20040122215703.E8399@gamplex.bde.org>
References: <m37jzmdnhf.fsf@merlin.emma.line.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-current@freebsd.org
Subject: Re: How to fsck -CURRENT on next reboot [ext2fs]

On Wed, 21 Jan 2004, Matthias Andree wrote:

> To have your FreeBSD -CURRENT fsck all ufs file systems at next reboot, do:
>
> 1. mount_ext2fs an ext2 or ext3 file system for read/write
> 2. change a file (or touch or rm one)
> 3. umount that ext2fs again
>
> 4. See how it complains about giving up on fsync first for the ext2fs, then
>    for devfs.

This hopefully only happens after mount -u to convert from ro to rw.
It happens because GEOM has enforced the open mode on the disk device
for a long time, but ext2fs is still missing the hack to always open
rw in case mount -u is used to convert from ro to rw.  This gives lots
of unwriteable buffers whose write is retried endlessly every second
or two but never gets as far as the driver, and strange results for
subsequent accesses to the file since the buffers are not invalid.
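
A userland analogy may make the failure mode concrete.  This is only a
sketch, not kernel code: a scratch file stands in for the disk device, a
descriptor's open mode stands in for the mode that GEOM enforces, and
open_dev(), flush_buf() and remount_demo() are hypothetical names invented
for the illustration.

```c
#define _POSIX_C_SOURCE 200809L	/* for pwrite() and mkstemp() */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Open the backing "device".  always_rw models the hack of opening rw
 * even for a read-only mount, in case of a later ro -> rw upgrade.
 */
int
open_dev(const char *path, bool ronly, bool always_rw)
{
	int flags = (ronly && !always_rw) ? O_RDONLY : O_RDWR;

	return (open(path, flags));
}

/* Try to write one dirty "buffer"; returns 0 or an errno value. */
int
flush_buf(int fd, const char *data, size_t len)
{
	errno = 0;
	if (pwrite(fd, data, len, 0) != (ssize_t)len)
		return (errno != 0 ? errno : EIO);
	return (0);
}

/*
 * Mount "read-only", pretend that mount -u upgraded the mount to rw,
 * then flush.  Returns the result of the flush.
 */
int
remount_demo(bool always_rw)
{
	char path[] = "/tmp/openmodeXXXXXX";
	int error, fd;

	fd = mkstemp(path);	/* scratch file stands in for the disk */
	assert(fd >= 0);
	close(fd);

	fd = open_dev(path, true, always_rw);	/* initial ro mount */
	assert(fd >= 0);
	/* The ro -> rw upgrade happens here; the descriptor is NOT reopened. */
	error = flush_buf(fd, "dirty", 5);
	close(fd);
	unlink(path);
	return (error);
}
```

On a POSIX system remount_demo(false) should return EBADF, which is the
analogue of the writes that never get as far as the driver, while
remount_demo(true) returns 0 because the descriptor was opened rw from
the start.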

This was fixed in msdosfs about 6 months ago.  It was apparently more
serious there because there was no delay between the retries.

The following patch doesn't merge the comments about this from ffs because
they have some style bugs.

%%%
Index: ext2_vfsops.c
===================================================================
RCS file: /home/ncvs/src/sys/gnu/ext2fs/ext2_vfsops.c,v
retrieving revision 1.114
diff -u -2 -r1.114 ext2_vfsops.c
--- ext2_vfsops.c	5 Nov 2003 11:56:58 -0000	1.114
+++ ext2_vfsops.c	22 Jan 2004 10:39:38 -0000
@@ -656,5 +656,9 @@
 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
 	vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, td);
+#ifdef notyet
 	error = VOP_OPEN(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, td, -1);
+#else
+	error = VOP_OPEN(devvp, FREAD|FWRITE, FSCRED, td, -1);
+#endif
 	VOP_UNLOCK(devvp, 0, td);
 	if (error)
@@ -736,5 +740,9 @@
 	if (bp)
 		brelse(bp);
+#ifdef notyet
 	(void)VOP_CLOSE(devvp, ronly ? FREAD : FREAD|FWRITE, NOCRED, td);
+#else
+	(void)VOP_CLOSE(devvp, FREAD|FWRITE, NOCRED, td);
+#endif
 	if (ump) {
 		bsd_free(ump->um_e2fs->s_es, M_EXT2MNT);
@@ -791,6 +799,10 @@

 	ump->um_devvp->v_rdev->si_mountpoint = NULL;
+#ifdef notyet
 	error = VOP_CLOSE(ump->um_devvp, ronly ? FREAD : FREAD|FWRITE,
 		NOCRED, td);
+#else
+	error = VOP_CLOSE(ump->um_devvp, FREAD|FWRITE, NOCRED, td);
+#endif
 	vrele(ump->um_devvp);
 	bsd_free(fs->s_es, M_EXT2MNT);
%%%

> 5. Now reboot.
>
> 6. syncher gives up on 3 buffers (I have / /var and /usr - coincidence?)
>
> 7. At boot, fsck complains about non-cleanly umounted file systems

Syncing on reboot was broken in rev.1.14 of ext2fs/fs.h and associated
changes.  This undoes the quick fix for the previous implementation of
the bug:
bug: rev.1.1: never release the buffers before ext2_umount()
fix: rev.1.3: half release the buffers using B_LOCKED
     rev.1.4: release the buffers some more by not setting B_DELWRI on them
     rev.1.6: be more careful about setting B_DELWRI
bug: rev.1.14: half release the buffers using BUF_KERNPROC() instead of
     B_LOCKED.  This has the disadvantage of not actually working.

> For added fun, instead of 5., try:
>
> 5B. umount -f on the ext2fs that you modified instead of reboot
> 6B. See the kernel panic
> 7B. savecore logs "reboot after panic: vinvalbuf: dirty bufs"

I didn't test this.  Do you get the "fsync: giving up on dirty" message
in this case?  I don't see how vinvalbuf can panic exactly.  However,
it seems to have a race:

% 		while (vp->v_numoutput) {
% 			vp->v_iflag |= VI_BWAIT;
% 			error = msleep(&vp->v_numoutput, VI_MTX(vp),
% 			    slpflag | (PRIBIO + 1), "vinvlbuf", slptimeo);
% 			if (error) {
% 				VI_UNLOCK(vp);
% 				return (error);
% 			}
% 		}

This seems OK, except it may fail, and both ffs and ext2fs panic in reload
if it fails there.

% 		if (!TAILQ_EMPTY(&vp->v_dirtyblkhd)) {
% 			VI_UNLOCK(vp);
% 			if ((error = VOP_FSYNC(vp, cred, MNT_WAIT, td)) != 0)
% 				return (error);

This is very likely to fail, since we have bogusly unwriteable blocks, but
if VOP_FSYNC() (= vop_stdfsync() + an update for ext2fs) fails, we just
return the error, so at least we don't panic here.

% 			/*
% 			 * XXX We could save a lock/unlock if this was only
% 			 * enabled under INVARIANTS
% 			 */
% 			VI_LOCK(vp);

There seems to be a window here, while the interlock is dropped around
VOP_FSYNC(), in which new i/o can be started.  This may be more likely
because we have lots of bogusly unwriteable blocks to retry.

% 			if (vp->v_numoutput > 0 ||
% 			    !TAILQ_EMPTY(&vp->v_dirtyblkhd))
% 				panic("vinvalbuf: dirty bufs");

This panic somehow occurred.

% 		}
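
The suspected window can be modelled in a few lines.  This is a
single-threaded sketch under loud assumptions: struct vnode_model, its
fields, flush_then_check() and redirty_one() are all made up for the
illustration, and a callback stands in for whatever other thread starts
new i/o while the interlock is not held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vnode fields that the check looks at. */
struct vnode_model {
	int ndirty;	/* models the length of v_dirtyblkhd */
	int numoutput;	/* models v_numoutput */
};

/* Hook that runs while the lock is dropped; models another thread. */
typedef void (*window_fn)(struct vnode_model *);

/*
 * Model of the flush-then-recheck sequence.  Returns true where the
 * kernel code would reach panic("vinvalbuf: dirty bufs").
 */
bool
flush_then_check(struct vnode_model *vp, window_fn in_window)
{
	assert(vp != NULL);

	/* (interlock held here) */
	if (vp->ndirty > 0) {
		/* (interlock dropped, as before VOP_FSYNC()) */
		vp->ndirty = 0;		/* pretend the fsync wrote everything */
		if (in_window != NULL)
			in_window(vp);	/* new i/o sneaks in */
		/* (interlock taken again) */
		if (vp->numoutput > 0 || vp->ndirty > 0)
			return (true);
	}
	return (false);
}

/* Example intruder: redirty a single buffer inside the window. */
void
redirty_one(struct vnode_model *vp)
{
	vp->ndirty++;
}
```

With no intruder the recheck passes; with redirty_one() as the hook the
model reaches the panic condition even though the fsync itself succeeded.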

There is still the fundamental problem that there is no way to discard
unwriteable blocks (either bogus ones or ones with actual i/o errors).
Even vinvalbuf() doesn't really understand them.  In the umount -f case,
it should do something like directing VOP_FSYNC() to retry only a limited
number of times, then forcibly discard them if they are still unwriteable.
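
The policy suggested above might look roughly like this.  It is a
userland sketch and not a proposed kernel change: struct buf_model,
sync_pass() and force_flush() are hypothetical names, and a flag on each
buffer stands in for "the write can never succeed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer on the dirty list. */
struct buf_model {
	bool dirty;
	bool writable;	/* false models an unwriteable or failing block */
};

/* One sync pass: write what can be written; return how many stay dirty. */
size_t
sync_pass(struct buf_model *bufs, size_t n)
{
	size_t i, left = 0;

	for (i = 0; i < n; i++) {
		if (!bufs[i].dirty)
			continue;
		if (bufs[i].writable)
			bufs[i].dirty = false;	/* write succeeded */
		else
			left++;			/* still stuck */
	}
	return (left);
}

/*
 * Forced flush: retry at most max_tries passes, then throw away whatever
 * is still dirty.  Returns the number of buffers that were discarded.
 */
size_t
force_flush(struct buf_model *bufs, size_t n, int max_tries)
{
	size_t discarded = 0, i;
	int try;

	assert(bufs != NULL || n == 0);

	for (try = 0; try < max_tries; try++)
		if (sync_pass(bufs, n) == 0)
			return (0);	/* everything got written */

	for (i = 0; i < n; i++) {
		if (bufs[i].dirty) {
			bufs[i].dirty = false;	/* forcibly discard */
			discarded++;
		}
	}
	return (discarded);
}
```

The point of the bounded loop is that a forced unmount always terminates
with an empty dirty list, at the price of losing the data in the blocks
that could not be written.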

> e2fsprogs (with mke2fs, the ext2fs newfs tool) are available as port
> from http://mandree.home.pages.de/freebsd/e2fsprogs/ while Tytso
> (e2fsprogs maintainer) reviews my changes - feedback on my port is
> appreciated. The e2fsprogs stuff from official ports doesn't work
> because it assumes that block devices are buffered.

About time this was fixed :-).

Bruce