Date: Fri, 16 Jun 2000 17:19:23 -0700
From: Kirk McKusick <mckusick@mckusick.com>
To: Poul-Henning Kamp <phk@FreeBSD.ORG>
Cc: current@FreeBSD.ORG, dg@root.com
Subject: Re: bioops
Message-ID: <200006170019.RAA07992@beastie.mckusick.com>
In-Reply-To: Your message of "Wed, 14 Jun 2000 13:30:42 PDT." <200006142030.NAA06621@implode.root.com>
To: current@FreeBSD.ORG
Subject: HEADSUP: bioops patch.
From: Poul-Henning Kamp <phk@FreeBSD.ORG>
Date: Wed, 14 Jun 2000 22:29:32 +0200

This patch virtualizes & untangles the bioops operations vector.

Background:

The bioops operation vector is a list of OO-like operations which can be performed on struct buf.  They are used by the softupdates code to handle dependencies.

Ideally struct buf should have a real OO-like operations vector, the way vnodes do, and struct bioops is the first step towards that.

One of the reasons we should have an OO-like struct buf is that as long as bwrite(bp) "knows" that the buffer is backed by a disk device, we cannot use the UFS layer on top of a storage manager which isn't based on disk devices: when UFS modifies a directory inode, it will call bwrite(bp) on the buffer with the data.  This would not work if the backing were based on malloc(9) or anonymous swap-backed VM objects, for instance.  In other words: this is the main reason why we don't have a decent tmpfs in FreeBSD.

Instead of just assuming that it works on a disk, bwrite(bp) should do a "bp->b_ops->bwrite(bp)" so that each buffer could have its own private idea of how to write itself, depending on what backing it has.
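To make that direction concrete, here is a minimal sketch of what such a per-buffer operations vector could look like.  Note that the b_ops field and the bufops/bop_write names are invented for illustration; nothing in the patch below adds them, and today the single global bioops is all there is:

	/*
	 * Sketch only: a per-buffer operations vector, so that bwrite()
	 * makes no disk assumption.  All names here are hypothetical.
	 */
	struct buf;				/* forward declaration */

	struct bufops {
		/* write the buffer to whatever backs it: disk, malloc(9), ... */
		int	(*bop_write)(struct buf *);
	};

	struct buf {
		/* ...existing fields elided... */
		struct bufops	*b_ops;		/* set by the backing store */
	};

	static int
	bwrite(struct buf *bp)
	{
		/* dispatch to the buffer's own private idea of writing itself */
		return (bp->b_ops->bop_write(bp));
	}

A tmpfs-style backing could then plug in a bop_write that merely dirties its malloc(9) or VM-object pages instead of queueing disk I/O, and the UFS layer above would never know the difference.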
So in order to move bioops closer to becoming a bp->b_ops, this patch takes two entries out of bioops: the "sync" and the "fsync" items, and virtualizes the rest of the elements a bit.

The "sync" item is called only from the syncer, and is a call to the softupdates code to do what it needs to do for periodic syncing.  The real way of doing that would be to have an event handler for this, since other code could need to be part of the sync traffic; raid5 private parity caches could be one example.  I have not done this yet, since currently softupdates is the only client.

The fsync item really doesn't belong in the fsync system call; it belongs in ffs_fsync, and has been moved there.

If it had been possible to put the fsync call in ffs_fsync, I would have done that.  Unfortunately, it is not possible and will hang or panic the kernel if you put it there.  The problem is that ffs_fsync syncs out the data blocks of the associated file.  The bioops call to soft updates requests that any names associated with the file being fsync'ed be synced to disk as well.  That is a necessary semantic of the fsync system call.  However, it is not needed by most clients of VOP_FSYNC.  Because syncing the name requires a walk up the filesystem tree from the inode in question to the root of the filesystem, the locking protocol requires that nodes lower in the tree be unlocked before locking nodes that are higher.  This means that the vnode being fsync'ed must be briefly unlocked while its containing parent is locked.  If the vnode being fsync'ed is a directory, this creates a window where another process can sneak in and make changes, which leads to panics, two entries with the same name, etc.  This window is not a problem for the fsync system call, because it is not creating a new name, but it is a problem if VOP_FSYNC is called in the open, link, mkdir, etc. paths (as it will be, for example, if a block allocation fails because the filesystem is full).  Thus there are two choices: put the code back as it was, or change the VOP_FSYNC call interface to add a flags value that indicates whether the name needs to be synced out as well as the data.  I chose the former, as I did not want to disrupt a widely used interface.

To give the right behaviour when SOFTUPDATES is not compiled in, stubs for both of these functions have been added to ffs_softdep_stub.c.

Finally, all the checks to see whether the bioops vector is populated have been centralized in inline functions in <sys/buf.h>, thereby paving the road for the global bioops to become bp->b_ops.

Comments, reviews, tests please.

Poul-Henning

Index: contrib/softupdates/ffs_softdep.c
===================================================================
RCS file: /home/ncvs/src/sys/contrib/softupdates/ffs_softdep.c,v
retrieving revision 1.64
diff -u -r1.64 ffs_softdep.c
--- contrib/softupdates/ffs_softdep.c	2000/05/26 02:01:59	1.64
+++ contrib/softupdates/ffs_softdep.c	2000/06/14 19:26:46
@@ -222,8 +222,6 @@
 	softdep_disk_io_initiation,		/* io_start */
 	softdep_disk_write_complete,		/* io_complete */
 	softdep_deallocate_dependencies,	/* io_deallocate */
-	softdep_fsync,				/* io_fsync */
-	softdep_process_worklist,		/* io_sync */
 	softdep_move_dependencies,		/* io_movedeps */
 	softdep_count_dependencies,		/* io_countdeps */
 };
Index: kern/vfs_bio.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.258
diff -u -r1.258 vfs_bio.c
--- kern/vfs_bio.c	2000/05/26 02:04:39	1.258
+++ kern/vfs_bio.c	2000/06/14 19:00:56
@@ -616,8 +616,8 @@
 		newbp->b_flags &= ~B_INVAL;

 		/* move over the dependencies */
-		if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_movedeps)
-			(*bioops.io_movedeps)(bp, newbp);
+		if (LIST_FIRST(&bp->b_dep) != NULL)
+			buf_movedeps(bp, newbp);

 		/*
 		 * Initiate write on the copy, release the original to
@@ -673,10 +673,10 @@
 		/*
 		 * Process dependencies then return any unfinished ones.
 		 */
-		if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_complete)
-			(*bioops.io_complete)(bp);
-		if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_movedeps)
-			(*bioops.io_movedeps)(bp, origbp);
+		if (LIST_FIRST(&bp->b_dep) != NULL)
+			buf_complete(bp);
+		if (LIST_FIRST(&bp->b_dep) != NULL)
+			buf_movedeps(bp, origbp);
 		/*
 		 * Clear the BX_BKGRDINPROG flag in the original buffer
 		 * and awaken it if it is waiting for the write to complete.
@@ -939,8 +939,8 @@
 		 * cache the buffer.
 		 */
 		bp->b_flags |= B_INVAL;
-		if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_deallocate)
-			(*bioops.io_deallocate)(bp);
+		if (LIST_FIRST(&bp->b_dep) != NULL)
+			buf_deallocate(bp);
 		if (bp->b_flags & B_DELWRI) {
 			--numdirtybuffers;
 			numdirtywakeup();
@@ -1570,8 +1570,8 @@
 			crfree(bp->b_wcred);
 			bp->b_wcred = NOCRED;
 		}
-		if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_deallocate)
-			(*bioops.io_deallocate)(bp);
+		if (LIST_FIRST(&bp->b_dep) != NULL)
+			buf_deallocate(bp);
 		if (bp->b_xflags & BX_BKGRDINPROG)
 			panic("losing buffer 3");
 		LIST_REMOVE(bp, b_hash);
@@ -1848,9 +1848,8 @@
 			break;
 		}
 		if (LIST_FIRST(&bp->b_dep) != NULL &&
-		    bioops.io_countdeps &&
 		    (bp->b_flags & B_DEFERRED) == 0 &&
-		    (*bioops.io_countdeps)(bp, 0)) {
+		    buf_countdeps(bp, 0)) {
 			TAILQ_REMOVE(&bufqueues[QUEUE_DIRTY], bp, b_freelist);
 			TAILQ_INSERT_TAIL(&bufqueues[QUEUE_DIRTY],
 			    bp, b_freelist);
@@ -2664,8 +2663,8 @@
 		splx(s);
 		return;
 	}

-	if (LIST_FIRST(&bp->b_dep) != NULL && bioops.io_complete)
-		(*bioops.io_complete)(bp);
+	if (LIST_FIRST(&bp->b_dep) != NULL)
+		buf_complete(bp);
 	if (bp->b_flags & B_VMIO) {
 		int i, resid;
Index: kern/vfs_cluster.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cluster.c,v
retrieving revision 1.98
diff -u -r1.98 vfs_cluster.c
--- kern/vfs_cluster.c	2000/05/05 09:58:25	1.98
+++ kern/vfs_cluster.c	2000/06/14 19:01:13
@@ -805,9 +805,8 @@
 			splx(s);
 		} /* end of code for non-first buffers only */
 		/* check for latent dependencies to be handled */
-		if ((LIST_FIRST(&tbp->b_dep)) != NULL &&
-		    bioops.io_start)
-			(*bioops.io_start)(tbp);
+		if ((LIST_FIRST(&tbp->b_dep)) != NULL)
+			buf_start(tbp);
 		/*
 		 * If the IO is via the VM then we do some
 		 * special VM hackery. (yuck)
Index: kern/vfs_subr.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.257
diff -u -r1.257 vfs_subr.c
--- kern/vfs_subr.c	2000/05/26 02:04:39	1.257
+++ kern/vfs_subr.c	2000/06/14 19:56:33
@@ -1029,8 +1029,7 @@
 		/*
 		 * Do soft update processing.
 		 */
-		if (bioops.io_sync)
-			(*bioops.io_sync)(NULL);
+		softdep_process_worklist(NULL);

 		/*
 		 * The variable rushjob allows the kernel to speed up the
Index: kern/vfs_syscalls.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.153
diff -u -r1.153 vfs_syscalls.c
--- kern/vfs_syscalls.c	2000/05/05 09:58:27	1.153
+++ kern/vfs_syscalls.c	2000/06/14 19:38:24
@@ -2545,10 +2545,7 @@
 	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, p);
 	if (vp->v_object)
 		vm_object_page_clean(vp->v_object, 0, 0, 0);
-	if ((error = VOP_FSYNC(vp, fp->f_cred, MNT_WAIT, p)) == 0 &&
-	    vp->v_mount && (vp->v_mount->mnt_flag & MNT_SOFTDEP) &&
-	    bioops.io_fsync)
-		error = (*bioops.io_fsync)(vp);
+	error = VOP_FSYNC(vp, fp->f_cred, MNT_WAIT, p);
 	VOP_UNLOCK(vp, 0, p);
 	return (error);
 }
Index: miscfs/devfs/devfs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/miscfs/devfs/devfs_vnops.c,v
retrieving revision 1.96
diff -u -r1.96 devfs_vnops.c
--- miscfs/devfs/devfs_vnops.c	2000/05/05 09:58:29	1.96
+++ miscfs/devfs/devfs_vnops.c	2000/06/14 19:02:35
@@ -1570,9 +1570,8 @@
 		return error;

-	if ((bp->b_iocmd == BIO_WRITE) &&
-	    (LIST_FIRST(&bp->b_dep)) != NULL && bioops.io_start)
-		(*bioops.io_start)(bp);
+	if ((bp->b_iocmd == BIO_WRITE) && (LIST_FIRST(&bp->b_dep)) != NULL)
+		buf_start(bp);

 	switch (vp->v_type) {
 	case VCHR:
 		(*vp->v_rdev->si_devsw->d_strategy)(&bp->b_io);
Index: miscfs/specfs/spec_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/miscfs/specfs/spec_vnops.c,v
retrieving revision 1.138
diff -u -r1.138 spec_vnops.c
--- miscfs/specfs/spec_vnops.c	2000/06/12 10:20:18	1.138
+++ miscfs/specfs/spec_vnops.c	2000/06/14 19:02:49
@@ -417,9 +417,8 @@
 	struct mount *mp;

 	bp = ap->a_bp;
-	if ((bp->b_iocmd == BIO_WRITE) &&
-	    (LIST_FIRST(&bp->b_dep)) != NULL && bioops.io_start)
-		(*bioops.io_start)(bp);
+	if ((bp->b_iocmd == BIO_WRITE) && (LIST_FIRST(&bp->b_dep)) != NULL)
+		buf_start(bp);

 	/*
 	 * Collect statistics on synchronous and asynchronous read
Index: sys/buf.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/buf.h,v
retrieving revision 1.103
diff -u -r1.103 buf.h
--- sys/buf.h	2000/05/26 02:06:53	1.103
+++ sys/buf.h	2000/06/14 19:26:59
@@ -64,8 +64,6 @@
 	void	(*io_start) __P((struct buf *));
 	void	(*io_complete) __P((struct buf *));
 	void	(*io_deallocate) __P((struct buf *));
-	int	(*io_fsync) __P((struct vnode *));
-	int	(*io_sync) __P((struct mount *));
 	void	(*io_movedeps) __P((struct buf *, struct buf *));
 	int	(*io_countdeps) __P((struct buf *, int));
 } bioops;
@@ -405,6 +403,43 @@
 #define BUF_WRITE(bp)	VOP_BWRITE((bp)->b_vp, (bp))
 #define BUF_STRATEGY(bp)	VOP_STRATEGY((bp)->b_vp, (bp))
+
+static __inline void
+buf_start(struct buf *bp)
+{
+	if (bioops.io_start)
+		(*bioops.io_start)(bp);
+}
+
+static __inline void
+buf_complete(struct buf *bp)
+{
+	if (bioops.io_complete)
+		(*bioops.io_complete)(bp);
+}
+
+static __inline void
+buf_deallocate(struct buf *bp)
+{
+	if (bioops.io_deallocate)
+		(*bioops.io_deallocate)(bp);
+}
+
+static __inline void
+buf_movedeps(struct buf *bp, struct buf *bp2)
+{
+	if (bioops.io_movedeps)
+		(*bioops.io_movedeps)(bp, bp2);
+}
+
+static __inline int
+buf_countdeps(struct buf *bp, int i)
+{
+	if (bioops.io_countdeps)
+		return ((*bioops.io_countdeps)(bp, i));
+	else
+		return (0);
+}
 #endif /* _KERNEL */
Index: sys/mount.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/mount.h,v
retrieving revision 1.91
diff -u -r1.91 mount.h
--- sys/mount.h	2000/05/26 02:06:55	1.91
+++ sys/mount.h	2000/06/14 19:58:22
@@ -447,6 +447,7 @@
 int	vfs_stdextattrctl __P((struct mount *mp, int cmd,
 	    const char *attrname, caddr_t arg, struct proc *p));
+void	softdep_process_worklist __P((struct mount *));

 #else /* !_KERNEL */

 #include <sys/cdefs.h>
Index: ufs/ffs/ffs_softdep_stub.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_softdep_stub.c,v
retrieving revision 1.8
diff -u -r1.8 ffs_softdep_stub.c
--- ufs/ffs/ffs_softdep_stub.c	2000/04/19 14:58:25	1.8
+++ ufs/ffs/ffs_softdep_stub.c	2000/06/14 19:33:11
@@ -254,4 +254,20 @@
 	return (0);
 }
+
+int
+softdep_fsync(vp)
+	struct vnode *vp;	/* the "in_core" copy of the inode */
+{
+
+	return (0);
+}
+
+int
+softdep_process_worklist(matchmnt)
+	struct mount *matchmnt;
+{
+	return (0);
+}
+
 #endif /* SOFTUPDATES not configured in */
Index: ufs/ffs/ffs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vnops.c,v
retrieving revision 1.66
diff -u -r1.66 ffs_vnops.c
--- ufs/ffs/ffs_vnops.c	2000/05/05 09:59:05	1.66
+++ ufs/ffs/ffs_vnops.c	2000/06/14 19:38:13
@@ -175,7 +175,7 @@
 			continue;
 		if (!wait && LIST_FIRST(&bp->b_dep) != NULL &&
 		    (bp->b_flags & B_DEFERRED) == 0 &&
-		    bioops.io_countdeps && (*bioops.io_countdeps)(bp, 0)) {
+		    buf_countdeps(bp, 0)) {
 			bp->b_flags |= B_DEFERRED;
 			continue;
 		}
@@ -278,5 +278,8 @@
 		}
 	}
 	splx(s);
-	return (UFS_UPDATE(vp, wait));
+	error = UFS_UPDATE(vp, wait);
+	if (error == 0 && vp->v_mount && (vp->v_mount->mnt_flag & MNT_SOFTDEP))
+		error = softdep_fsync(vp);
+	return (error);
 }

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD coreteam member | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

------- End of Forwarded Message

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message