Date: Sun, 30 Sep 2007 22:50:58 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: fs@freebsd.org Subject: fix for ffs_fsync() always writing to the disk Message-ID: <20070930213402.Y1820@besplex.bde.org>
next in thread | raw e-mail | index | archive | help
Please review this fix. Sync updates of inodes were apparently broken for about 15 years until they were fixed accidentally on 1998/03/08 in ffs_inode.c rev.1.36. ffs_fsync() asks for a sync update, but ffs_update() used to return immediately if IN_MODIFIED (or a timestamp flag) is not set. This early return is wrong in the sync case, since quite often, a previous call to ffs_fsync() will have cleared IN_MODIFIED and all timestamp flags when it copied the changes to the buffer; it wrote the buffer using a delayed write, but that write might not have completed. Now ffs_update() never returns early in the sync case (except for ro file systems and errors), so it always falls through to the code that writes the buffer synchronously. Thus fsync(2) on an ffs file system always does a synchronous write even if the file is only open ro and has not been written to since the file system was mounted. (It's interesting that fsync(1) only opens the file ro and that write permission is not required for fsync(2)). The early return was removed in ffs_inode.c rev.1.36, apparently just to support complications in the soft updates case, and this old bug got fixed accidentally for both the soft-updates case and the non-soft-updates cases as a side effect. In the soft updates case, IN_MODIFIED (or a timestamp flag) doesn't indicate all modifications, and a bwrite() is necessary for cleaning up the dependencies. The log message for 1.36 is null :-(. The bug has lived for another 9 years in file systems that were cloned from ffs before 1998. Detection of null changes (mainly for the sync case) gives a free optimization for the delayed-sync. Null changes can happen quite often because of limited resolution in timestamps. This is only a minor opimization because delayed sync is usually fast enough. Optimizing away null fsync()s is also a minor optimization because fsync() is rarely used. %%% Index: ffs_inode.c =================================================================== RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_inode.c,v retrieving revision 1.92 diff -u -0 -r1.92 ffs_inode.c --- ffs_inode.c 7 Apr 2004 03:47:20 -0000 1.92 +++ ffs_inode.c 30 Sep 2007 12:41:03 -0000 @@ -79,0 +81,2 @@ + struct ufs1_dinode *din1p; + struct ufs2_dinode *din2p; @@ -114,7 +126,40 @@ - if (ip->i_ump->um_fstype == UFS1) - *((struct ufs1_dinode *)bp->b_data + - ino_to_fsbo(fs, ip->i_number)) = *ip->i_din1; - else - *((struct ufs2_dinode *)bp->b_data + - ino_to_fsbo(fs, ip->i_number)) = *ip->i_din2; - if (waitfor && !DOINGASYNC(vp)) { + /* + * Avoid physical writes of null changes if possible. Determining + * when this is possible is nontrivial. When (waitfor == 0), the + * simple check of IN_MODIFIED above suffices. Otherwise, we have + * to check for changes in both the inode and the buffer, since a + * previous (waitfor == 0) call here may have moved all changes to + * the buffer but the buffer might not have been written yet. + * + * Soft updates gives the additional complications that + * softdep_update_inodeblock() may have caused changes to the inode + * that are not reflected in IN_MODIFIED, and that a bwrite() is + * unavoidable for null changes if the buffer has dependencies, + * because soft updates is depending on us to call bwrite() so that + * it can remove the dependencies in its callback from bdone(). + */ + if (ip->i_ump->um_fstype == UFS1) { + din1p = (struct ufs1_dinode *)bp->b_data + + ino_to_fsbo(fs, ip->i_number); + if (bcmp(din1p, ip->i_din1, sizeof(*din1p)) == 0) { + if (waitfor == 0 || + ((bp->b_flags & B_DELWRI) == 0 && + LIST_FIRST(&bp->b_dep) == NULL)) { + bqrelse(bp); + return (0); + } + } else + *din1p = *ip->i_din1; + } else { + din2p = (struct ufs2_dinode *)bp->b_data + + ino_to_fsbo(fs, ip->i_number); + if (bcmp(din2p, ip->i_din2, sizeof(*din2p)) == 0) { + if (waitfor == 0 || + ((bp->b_flags & B_DELWRI) == 0 && + LIST_FIRST(&bp->b_dep) == NULL)) { + bqrelse(bp); + return (0); + } + } else + *din2p = *ip->i_din2; + } %%% This is ugly due to its large comment and complications for ffs1/2 and soft updates. Are the complications for soft updates complete? Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20070930213402.Y1820>