Date:      Fri, 16 Jun 2000 17:19:23 -0700
From:      Kirk McKusick <mckusick@mckusick.com>
To:        Poul-Henning Kamp <phk@FreeBSD.ORG>
Cc:        current@FreeBSD.ORG, dg@root.com
Subject:   Re: bioops 
Message-ID:  <200006170019.RAA07992@beastie.mckusick.com>
In-Reply-To: Your message of "Wed, 14 Jun 2000 13:30:42 PDT." <200006142030.NAA06621@implode.root.com> 

	To: current@FreeBSD.ORG
	Subject: HEADSUP: bioops patch.
	From: Poul-Henning Kamp <phk@FreeBSD.ORG>
	Date: Wed, 14 Jun 2000 22:29:32 +0200

	This patch virtualizes & untangles the bioops operations vector.

	Background:

	The bioops operation vector is a list of OO-like operations which can
	be performed on struct buf.  They are used by the softupdates code
	to handle dependencies.

	Ideally struct buf should have a real OO-like operations vector,
	as vnodes do, and struct bioops is the first step towards that.

	One of the reasons we should have an OO-like struct buf is that
	as long as bwrite(bp) "knows" that the buffer is backed by a disk
	device, we cannot use the UFS layer on top of a storage manager
	which isn't based on disk devices:  when UFS modifies a directory
	inode, it will call bwrite(bp) on the buffer with the data.  This
	would not work if the backing were based on malloc(9) or anonymous
	swap-backed VM objects, for instance.

	In other words: this is the main reason why we don't have a decent
	tmpfs in FreeBSD.

	Instead of just assuming that it works on a disk, bwrite(bp) should
	do a "bp->b_ops->bwrite(bp)" so that each buffer could have its own
	private idea of how to write itself, depending on what backing it has.

	So in order to move bioops closer to becoming a bp->b_ops, this
	patch takes two entries out of bioops, the "sync" and "fsync"
	items, and virtualizes the rest of the elements a bit.

	The "sync" item is called only from the syncer, and is a call to the
	softupdates code to do what it needs to do for periodic syncing.

	The real way of doing that would be to have an event-handler for
	this, since other code could need to be part of the sync traffic;
	RAID5 private parity caches could be one example.  I have not done
	this yet, since currently softupdates is the only client.

	The fsync item really doesn't belong in the fsync system call;
	it belongs in ffs_fsync, and has been moved there.

If it had been possible to put the fsync call in ffs_fsync, I would
have done that. Unfortunately, it is not possible: putting it there
will hang or panic the kernel. The problem is that ffs_fsync syncs
out the data blocks of the associated file, while the bioops call
into soft updates requests that any names associated with the file
being sync'ed be sync'ed to disk as well. That is a necessary
semantic of the fsync system call; however, it is not needed by
most clients of VOP_FSYNC.

Because sync'ing the name requires a walk up the filesystem tree
from the inode in question to the root of the filesystem, and the
locking protocol requires that nodes lower in the tree be unlocked
before locking nodes that are higher, the vnode being fsync'ed must
be briefly unlocked while its containing parent is locked. If the
vnode being fsync'ed is a directory, this creates a window where
another process can sneak in and make changes, which leads to the
panics, two entries with the same name, etc. This window is not a
problem for the fsync call, because it is not creating a new name,
but it is a problem if VOP_FSYNC is called in the open, link, mkdir,
etc. paths (as it will be, for example, if a block allocation fails
because the filesystem is full). Thus there are two choices: put the
code back as it was, or change the VOP_FSYNC call interface to add a
flags value that indicates whether the name needs to be sync'ed out
as well as the data. I chose the former, as I did not want to
disrupt a widely used interface.

	To give the right behaviour when SOFTUPDATES is not compiled in,
	stubs for both of these functions have been added to ffs_softdep_stub.c

	Finally, all the checks to see if the bioops vector is populated
	have been centralized in in-line functions in <sys/buf.h>, thereby
	paving the road for the global bioops to become bp->b_ops.

	Comments, reviews, and tests, please.

	Poul-Henning

Index: contrib/softupdates/ffs_softdep.c
===================================================================
RCS file: /home/ncvs/src/sys/contrib/softupdates/ffs_softdep.c,v
retrieving revision 1.64
diff -u -r1.64 ffs_softdep.c
--- contrib/softupdates/ffs_softdep.c	2000/05/26 02:01:59	1.64
+++ contrib/softupdates/ffs_softdep.c	2000/06/14 19:26:46
@@ -222,8 +222,6 @@
	softdep_disk_io_initiation,		/* io_start */
	softdep_disk_write_complete,		/* io_complete */
	softdep_deallocate_dependencies,	/* io_deallocate */
-	softdep_fsync,				/* io_fsync */
-	softdep_process_worklist,		/* io_sync */
	softdep_move_dependencies,		/* io_movedeps */
	softdep_count_dependencies,		/* io_countdeps */
 };
Index: kern/vfs_bio.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.258
diff -u -r1.258 vfs_bio.c
--- kern/vfs_bio.c	2000/05/26 02:04:39	1.258
+++ kern/vfs_bio.c	2000/06/14 19:00:56
@@ -616,8 +616,8 @@
		newbp->b_flags &= ~B_INVAL;
 
		/* move over the dependencies */
-		if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_movedeps)
-			(*bioops.io_movedeps)(bp, newbp);
+		if (LIST_FIRST(&bp->b_dep) != NULL)
+			buf_movedeps(bp, newbp);
 
		/*
		 * Initiate write on the copy, release the original to
@@ -673,10 +673,10 @@
	/*
	 * Process dependencies then return any unfinished ones.
	 */
-	if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_complete)
-		(*bioops.io_complete)(bp);
-	if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_movedeps)
-		(*bioops.io_movedeps)(bp, origbp);
+	if (LIST_FIRST(&bp->b_dep) != NULL)
+		buf_complete(bp);
+	if (LIST_FIRST(&bp->b_dep) != NULL)
+		buf_movedeps(bp, origbp);
	/*
	 * Clear the BX_BKGRDINPROG flag in the original buffer
	 * and awaken it if it is waiting for the write to complete.
@@ -939,8 +939,8 @@
		 * cache the buffer.
		 */
		bp->b_flags |= B_INVAL;
-		if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_deallocate)
-			(*bioops.io_deallocate)(bp);
+		if (LIST_FIRST(&bp->b_dep) != NULL)
+			buf_deallocate(bp);
		if (bp->b_flags & B_DELWRI) {
			--numdirtybuffers;
			numdirtywakeup();
@@ -1570,8 +1570,8 @@
			crfree(bp->b_wcred);
			bp->b_wcred = NOCRED;
		}
-		if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_deallocate)
-			(*bioops.io_deallocate)(bp);
+		if (LIST_FIRST(&bp->b_dep) != NULL)
+			buf_deallocate(bp);
		if (bp->b_xflags & BX_BKGRDINPROG)
			panic("losing buffer 3");
		LIST_REMOVE(bp, b_hash);
@@ -1848,9 +1848,8 @@
				break;
			}
			if (LIST_FIRST(&bp->b_dep) != NULL &&
-			    bioops.io_countdeps &&
			    (bp->b_flags & B_DEFERRED) == 0 &&
-			    (*bioops.io_countdeps)(bp, 0)) {
+			    buf_countdeps(bp, 0)) {
				TAILQ_REMOVE(&bufqueues[QUEUE_DIRTY],
				    bp, b_freelist);
				TAILQ_INSERT_TAIL(&bufqueues[QUEUE_DIRTY],
@@ -2664,8 +2663,8 @@
		splx(s);
		return;
	}
-	if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_complete)
-		(*bioops.io_complete)(bp);
+	if (LIST_FIRST(&bp->b_dep) != NULL)
+		buf_complete(bp);
 
	if (bp->b_flags & B_VMIO) {
		int i, resid;
Index: kern/vfs_cluster.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cluster.c,v
retrieving revision 1.98
diff -u -r1.98 vfs_cluster.c
--- kern/vfs_cluster.c	2000/05/05 09:58:25	1.98
+++ kern/vfs_cluster.c	2000/06/14 19:01:13
@@ -805,9 +805,8 @@
				splx(s);
			} /* end of code for non-first buffers only */
			/* check for latent dependencies to be handled */
-			if ((LIST_FIRST(&tbp->b_dep)) != NULL &&
-			    bioops.io_start)
-				(*bioops.io_start)(tbp);
+			if ((LIST_FIRST(&tbp->b_dep)) != NULL)
+				buf_start(tbp);
			/*
			 * If the IO is via the VM then we do some
			 * special VM hackery. (yuck)
Index: kern/vfs_subr.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.257
diff -u -r1.257 vfs_subr.c
--- kern/vfs_subr.c	2000/05/26 02:04:39	1.257
+++ kern/vfs_subr.c	2000/06/14 19:56:33
@@ -1029,8 +1029,7 @@
		/*
		 * Do soft update processing.
		 */
-		if (bioops.io_sync)
-			(*bioops.io_sync)(NULL);
+		softdep_process_worklist(NULL);
 
		/*
		 * The variable rushjob allows the kernel to speed up the
Index: kern/vfs_syscalls.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.153
diff -u -r1.153 vfs_syscalls.c
--- kern/vfs_syscalls.c	2000/05/05 09:58:27	1.153
+++ kern/vfs_syscalls.c	2000/06/14 19:38:24
@@ -2545,10 +2545,7 @@
	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, p);
	if (vp->v_object)
		vm_object_page_clean(vp->v_object, 0, 0, 0);
-	if ((error = VOP_FSYNC(vp, fp->f_cred, MNT_WAIT, p)) == 0 &&
-	    vp->v_mount && (vp->v_mount->mnt_flag & MNT_SOFTDEP) &&
-	    bioops.io_fsync)
-		error = (*bioops.io_fsync)(vp);
+	error = VOP_FSYNC(vp, fp->f_cred, MNT_WAIT, p);
	VOP_UNLOCK(vp, 0, p);
	return (error);
 }
Index: miscfs/devfs/devfs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/miscfs/devfs/devfs_vnops.c,v
retrieving revision 1.96
diff -u -r1.96 devfs_vnops.c
--- miscfs/devfs/devfs_vnops.c	2000/05/05 09:58:29	1.96
+++ miscfs/devfs/devfs_vnops.c	2000/06/14 19:02:35
@@ -1570,9 +1570,8 @@
		return error;
 
 
-	if ((bp->b_iocmd == BIO_WRITE) &&
-		(LIST_FIRST(&bp->b_dep)) != NULL && bioops.io_start)
-		(*bioops.io_start)(bp);
+	if ((bp->b_iocmd == BIO_WRITE) && (LIST_FIRST(&bp->b_dep)) != NULL)
+		buf_start(bp);
	switch (vp->v_type) {
	case VCHR:
		(*vp->v_rdev->si_devsw->d_strategy)(&bp->b_io);
Index: miscfs/specfs/spec_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/miscfs/specfs/spec_vnops.c,v
retrieving revision 1.138
diff -u -r1.138 spec_vnops.c
--- miscfs/specfs/spec_vnops.c	2000/06/12 10:20:18	1.138
+++ miscfs/specfs/spec_vnops.c	2000/06/14 19:02:49
@@ -417,9 +417,8 @@
	struct mount *mp;
 
	bp = ap->a_bp;
-	if ((bp->b_iocmd == BIO_WRITE) &&
-		(LIST_FIRST(&bp->b_dep)) != NULL && bioops.io_start)
-		(*bioops.io_start)(bp);
+	if ((bp->b_iocmd == BIO_WRITE) && (LIST_FIRST(&bp->b_dep)) != NULL)
+		buf_start(bp);
 
	/*
	 * Collect statistics on synchronous and asynchronous read
Index: sys/buf.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/buf.h,v
retrieving revision 1.103
diff -u -r1.103 buf.h
--- sys/buf.h	2000/05/26 02:06:53	1.103
+++ sys/buf.h	2000/06/14 19:26:59
@@ -64,8 +64,6 @@
	void	(*io_start) __P((struct buf *));
	void	(*io_complete) __P((struct buf *));
	void	(*io_deallocate) __P((struct buf *));
-	int	(*io_fsync) __P((struct vnode *));
-	int	(*io_sync) __P((struct mount *));
	void	(*io_movedeps) __P((struct buf *, struct buf *));
	int	(*io_countdeps) __P((struct buf *, int));
 } bioops;
@@ -405,6 +403,43 @@
 
 #define BUF_WRITE(bp)		VOP_BWRITE((bp)->b_vp, (bp))
 #define BUF_STRATEGY(bp)	VOP_STRATEGY((bp)->b_vp, (bp))
+
+static __inline void
+buf_start(struct buf *bp)
+{
+	if (bioops.io_start)
+		(*bioops.io_start)(bp);
+}
+
+static __inline void
+buf_complete(struct buf *bp)
+{
+	if (bioops.io_complete)
+		(*bioops.io_complete)(bp);
+}
+
+static __inline void
+buf_deallocate(struct buf *bp)
+{
+	if (bioops.io_deallocate)
+		(*bioops.io_deallocate)(bp);
+}
+
+static __inline void
+buf_movedeps(struct buf *bp, struct buf *bp2)
+{
+	if (bioops.io_movedeps)
+		(*bioops.io_movedeps)(bp, bp2);
+}
+
+static __inline int
+buf_countdeps(struct buf *bp, int i)
+{
+	if (bioops.io_countdeps)
+		return ((*bioops.io_countdeps)(bp, i));
+	else
+		return (0);
+}
 
 #endif /* _KERNEL */
 
Index: sys/mount.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/mount.h,v
retrieving revision 1.91
diff -u -r1.91 mount.h
--- sys/mount.h	2000/05/26 02:06:55	1.91
+++ sys/mount.h	2000/06/14 19:58:22
@@ -447,6 +447,7 @@
 int	vfs_stdextattrctl __P((struct mount *mp, int cmd, const char *attrname,
		caddr_t arg, struct proc *p));
 
+void	softdep_process_worklist __P((struct mount *));
 #else /* !_KERNEL */
 
 #include <sys/cdefs.h>
Index: ufs/ffs/ffs_softdep_stub.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_softdep_stub.c,v
retrieving revision 1.8
diff -u -r1.8 ffs_softdep_stub.c
--- ufs/ffs/ffs_softdep_stub.c	2000/04/19 14:58:25	1.8
+++ ufs/ffs/ffs_softdep_stub.c	2000/06/14 19:33:11
@@ -254,4 +254,20 @@
 
	return (0);
 }
+
+int
+softdep_fsync(vp)
+	struct vnode *vp;	/* the "in_core" copy of the inode */
+{
+
+	return (0);
+}
+
+int
+softdep_process_worklist(matchmnt)
+	struct mount *matchmnt;
+{
+	return (0);
+}
+
 #endif	/* SOFTUPDATES not configured in */
Index: ufs/ffs/ffs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vnops.c,v
retrieving revision 1.66
diff -u -r1.66 ffs_vnops.c
--- ufs/ffs/ffs_vnops.c	2000/05/05 09:59:05	1.66
+++ ufs/ffs/ffs_vnops.c	2000/06/14 19:38:13
@@ -175,7 +175,7 @@
			continue;
		if (!wait && LIST_FIRST(&bp->b_dep) != NULL &&
		    (bp->b_flags & B_DEFERRED) == 0 &&
-		    bioops.io_countdeps && (*bioops.io_countdeps)(bp, 0)) {
+		    buf_countdeps(bp, 0)) {
			bp->b_flags |= B_DEFERRED;
			continue;
		}
@@ -278,5 +278,8 @@
		}
	}
	splx(s);
-	return (UFS_UPDATE(vp, wait));
+	error = UFS_UPDATE(vp, wait);
+	if (error == 0 && vp->v_mount && (vp->v_mount->mnt_flag & MNT_SOFTDEP))
+		error = softdep_fsync(vp);
+	return (error);
 }

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD coreteam member | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message

------- End of Forwarded Message


