Date: Tue, 23 Jul 1996 14:13:45 +0100 (BST)
From: Doug Rabson <dfr@render.com>
To: current@freebsd.org
Subject: Using clustered writes for NFSv3
Message-ID: <Pine.BSI.3.95.960723135805.12996K-100000@minnow.render.com>
In NFSv3, when a buffer is written to a server, the server normally doesn't write the data to stable storage immediately. Instead, it informs the client that the data is written but 'unstable'. The client can later commit the data to stable storage in a separate call.

In FreeBSD's implementation of NFSv3 (and in any other system using Rick Macklem's code), this is done by turning the buffer into a delayed write buffer, marked with B_DELWRI|B_NEEDCOMMIT. When the buffer is later recycled by the buffer cache or explicitly synced with VOP_FSYNC, the nfs client code notices the B_NEEDCOMMIT flag and performs the appropriate commit call.

If a lot of unstable writes are made with NFSv3, this commit operation tends to be done via vfs_bio_awrite when the buffer is recycled. This leads to a large number of more-or-less sequential commit operations which could be combined into one request. Against an IRIX 5.3 server in particular, this causes very poor performance when copying large files over NFSv3.

By a stroke of luck, the existing code for clustered writes in vfs_bio.c and vfs_cluster.c can be made to work for NFSv3 as well. I needed to tweak cluster_wbuild a little to copy over the B_NEEDCOMMIT flag and to set the b_dirtyoff and b_dirtyend fields of the cluster appropriately.

Is there any chance I can get this change into -current? The only possible problem with the patch that I can see is the part which copies the write credentials (required for NFS) over to the new buffer. It seems to improve the write performance against an SGI fileserver by a factor of two (with some other NFS patches not included here).
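To illustrate why clustering helps here, the following is a toy user-space model, not the kernel code. The struct, the SK_* flag values and the sketch_* function names are all hypothetical and exist only for this example. It assumes buffers are sorted by logical block number and counts how many commit requests are needed when adjacent buffers that both need a commit, and that carry the same write credential, are committed together.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only; the kernel's real
 * B_DELWRI and B_NEEDCOMMIT definitions are different. */
#define SK_DELWRI     0x1u
#define SK_NEEDCOMMIT 0x2u

/* Simplified stand-in for struct buf: only the fields the argument needs. */
struct sketch_buf {
	long     lblkno;  /* logical block number */
	unsigned flags;   /* SK_* flags */
	int      cred;    /* stand-in for the write credential (b_wcred) */
};

/* Two buffers can share one commit only if both are delayed writes that
 * still need a commit, b directly follows a, and the credentials match. */
static int
sketch_can_merge(const struct sketch_buf *a, const struct sketch_buf *b)
{
	const unsigned need = SK_DELWRI | SK_NEEDCOMMIT;

	return (a->flags & need) == need &&
	    (b->flags & need) == need &&
	    a->cred == b->cred &&
	    b->lblkno == a->lblkno + 1;
}

/* Count the commit requests needed for n buffers, sorted by lblkno, when
 * each run of mergeable neighbours is committed with a single request. */
static size_t
sketch_count_commits(const struct sketch_buf *bufs, size_t n)
{
	const unsigned need = SK_DELWRI | SK_NEEDCOMMIT;
	size_t commits = 0;
	size_t i;

	assert(bufs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if ((bufs[i].flags & need) != need)
			continue;	/* nothing to commit for this buffer */
		/* A buffer starts a new request unless it extends the run
		 * that ends at its predecessor. */
		if (i == 0 || !sketch_can_merge(&bufs[i - 1], &bufs[i]))
			commits++;
	}
	return commits;
}
```

In this model, a run of sequential buffers written with one credential costs a single commit instead of one per buffer, while a change of credential starts a new request. That mirrors the reason the patch refuses to cluster buffers whose b_wcred differs.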
Index: vfs_cluster.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cluster.c,v
retrieving revision 1.36
diff -c -r1.36 vfs_cluster.c
*** vfs_cluster.c 1996/06/03 04:40:35 1.36
--- vfs_cluster.c 1996/07/23 11:34:27
***************
*** 616,626 ****
          bp->b_bcount = 0;
          bp->b_bufsize = 0;
          bp->b_npages = 0;
          bp->b_blkno = tbp->b_blkno;
          bp->b_lblkno = tbp->b_lblkno;
          (vm_offset_t) bp->b_data |= ((vm_offset_t) tbp->b_data) & PAGE_MASK;
!         bp->b_flags |= B_CALL | B_BUSY | B_CLUSTER | (tbp->b_flags & B_VMIO);
          bp->b_iodone = cluster_callback;
          pbgetvp(vp, bp);
--- 616,630 ----
          bp->b_bcount = 0;
          bp->b_bufsize = 0;
          bp->b_npages = 0;
+         if (tbp->b_wcred != NOCRED) {
+             bp->b_wcred = tbp->b_wcred;
+             crhold(bp->b_wcred);
+         }
          bp->b_blkno = tbp->b_blkno;
          bp->b_lblkno = tbp->b_lblkno;
          (vm_offset_t) bp->b_data |= ((vm_offset_t) tbp->b_data) & PAGE_MASK;
!         bp->b_flags |= B_CALL | B_BUSY | B_CLUSTER | (tbp->b_flags & (B_VMIO|B_NEEDCOMMIT));
          bp->b_iodone = cluster_callback;
          pbgetvp(vp, bp);
***************
*** 632,638 ****
                  break;
              }
!             if ((tbp->b_flags & (B_VMIO|B_CLUSTEROK|B_INVAL|B_BUSY|B_DELWRI)) != (B_DELWRI|B_CLUSTEROK|(bp->b_flags & B_VMIO))) {
                  splx(s);
                  break;
              }
--- 636,647 ----
                  break;
              }
!             if ((tbp->b_flags & (B_VMIO|B_CLUSTEROK|B_INVAL|B_BUSY|B_DELWRI|B_NEEDCOMMIT)) != (B_DELWRI|B_CLUSTEROK|(bp->b_flags & (B_VMIO|B_NEEDCOMMIT)))) {
!                 splx(s);
!                 break;
!             }
!
!             if (tbp->b_wcred != bp->b_wcred) {
                  splx(s);
                  break;
              }
***************
*** 676,681 ****
--- 685,692 ----
          pmap_qenter(trunc_page((vm_offset_t) bp->b_data), (vm_page_t *) bp->b_pages, bp->b_npages);
          totalwritten += bp->b_bufsize;
+         bp->b_dirtyoff = 0;
+         bp->b_dirtyend = bp->b_bufsize;
          bawrite(bp);
          len -= i;

--
Doug Rabson, Microsoft RenderMorphics Ltd.
Mail: dfr@render.com
Phone: +44 171 251 4411
FAX: +44 171 251 0939