Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 30 Sep 2007 22:50:58 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        fs@freebsd.org
Subject:   fix for ffs_fsync() always writing to the disk
Message-ID:  <20070930213402.Y1820@besplex.bde.org>

next in thread | raw e-mail | index | archive | help
Please review this fix.

Sync updates of inodes were apparently broken for about 15 years until
they were fixed accidentally on 1998/03/08 in ffs_inode.c rev.1.36.

ffs_fsync() asks for a sync update, but ffs_update() used to return
immediately if IN_MODIFIED (or a timestamp flag) is not set.  This
early return is wrong in the sync case, since quite often, a previous
call to ffs_fsync() will have cleared IN_MODIFIED and all timestamp
flags when it copied the changes to the buffer; it wrote the buffer
using a delayed write, but that write might not have completed.

Now ffs_update() never returns early in the sync case (except for ro
file systems and errors), so it always falls through to the code that
writes the buffer synchronously.  Thus fsync(2) on an ffs file system
always does a synchronous write even if the file is only open ro and
has not been written to since the file system was mounted.  (It's
interesting that fsync(1) only opens the file ro and that write
permission is not required for fsync(2)).

The early return was removed in ffs_inode.c rev.1.36, apparently just
to support complications in the soft updates case, and this old bug
got fixed accidentally for both the soft-updates case and the
non-soft-updates cases as a side effect.  In the soft updates case,
IN_MODIFIED (or a timestamp flag) doesn't indicate all modifications,
and a bwrite() is necessary for cleaning up the dependencies.  The
log message for 1.36 is null :-(.

The bug has lived for another 9 years in file systems that were cloned
from ffs before 1998.

Detection of null changes (mainly for the sync case) gives a free
optimization for the delayed-sync.  Null changes can happen quite often
because of limited resolution in timestamps.  This is only a minor
opimization because delayed sync is usually fast enough.  Optimizing
away null fsync()s is also a minor optimization because fsync() is
rarely used.

%%%
Index: ffs_inode.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_inode.c,v
retrieving revision 1.92
diff -u -0 -r1.92 ffs_inode.c
--- ffs_inode.c	7 Apr 2004 03:47:20 -0000	1.92
+++ ffs_inode.c	30 Sep 2007 12:41:03 -0000
@@ -79,0 +81,2 @@
+	struct ufs1_dinode *din1p;
+	struct ufs2_dinode *din2p;
@@ -114,7 +126,40 @@
-	if (ip->i_ump->um_fstype == UFS1)
-		*((struct ufs1_dinode *)bp->b_data +
-		    ino_to_fsbo(fs, ip->i_number)) = *ip->i_din1;
-	else
-		*((struct ufs2_dinode *)bp->b_data +
-		    ino_to_fsbo(fs, ip->i_number)) = *ip->i_din2;
-	if (waitfor && !DOINGASYNC(vp)) {
+	/*
+	 * Avoid physical writes of null changes if possible.  Determining
+	 * when this is possible is nontrivial.  When (waitfor == 0), the
+	 * simple check of IN_MODIFIED above suffices.  Otherwise, we have
+	 * to check for changes in both the inode and the buffer, since a
+	 * previous (waitfor == 0) call here may have moved all changes to
+	 * the buffer but the buffer might not have been written yet.
+	 *
+	 * Soft updates gives the additional complications that
+	 * softdep_update_inodeblock() may have caused changes to the inode
+	 * that are not reflected in IN_MODIFIED, and that a bwrite() is
+	 * unavoidable for null changes if the buffer has dependencies,
+	 * because soft updates is depending on us to call bwrite() so that
+	 * it can remove the dependencies in its callback from bdone().
+	 */
+	if (ip->i_ump->um_fstype == UFS1) {
+		din1p = (struct ufs1_dinode *)bp->b_data +
+		    ino_to_fsbo(fs, ip->i_number);
+		if (bcmp(din1p, ip->i_din1, sizeof(*din1p)) == 0) {
+			if (waitfor == 0 ||
+			    ((bp->b_flags & B_DELWRI) == 0 &&
+			    LIST_FIRST(&bp->b_dep) == NULL)) {
+				bqrelse(bp);
+				return (0);
+			}
+		} else
+			*din1p = *ip->i_din1;
+	} else {
+		din2p = (struct ufs2_dinode *)bp->b_data +
+		    ino_to_fsbo(fs, ip->i_number);
+		if (bcmp(din2p, ip->i_din2, sizeof(*din2p)) == 0) {
+			if (waitfor == 0 ||
+			    ((bp->b_flags & B_DELWRI) == 0 &&
+			    LIST_FIRST(&bp->b_dep) == NULL)) {
+				bqrelse(bp);
+				return (0);
+			}
+		} else
+			*din2p = *ip->i_din2;
+	}
%%%

This is ugly due to its large comment and complications for ffs1/2 and
soft updates.  Are the complications for soft updates complete?

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20070930213402.Y1820>