From owner-freebsd-current Tue Jul 23 09:42:34 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id JAA25315 for current-outgoing; Tue, 23 Jul 1996 09:42:34 -0700 (PDT)
Received: from minnow.render.com (render.demon.co.uk [158.152.30.118]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id JAA25308 for ; Tue, 23 Jul 1996 09:42:30 -0700 (PDT)
Received: from localhost (dfr@localhost) by minnow.render.com (8.6.12/8.6.9) with SMTP id OAA16735 for ; Tue, 23 Jul 1996 14:13:46 +0100
Date: Tue, 23 Jul 1996 14:13:45 +0100 (BST)
From: Doug Rabson
To: current@freebsd.org
Subject: Using clustered writes for NFSv3
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In NFSv3, when a buffer is written to a server, the server normally
doesn't write the data to stable storage immediately. Instead, it informs
the client that the data is written but 'unstable'. The client can later
commit the data to stable storage in a separate call.

In FreeBSD's implementation of NFSv3 (and that of anyone else who is using
Rick Macklem's code), this is implemented by turning the buffer into a
delayed write buffer, marked with B_DELWRI|B_NEEDCOMMIT. When the buffer
is later recycled by the buffer cache or explicitly synced with VOP_FSYNC,
the nfs client code notices the B_NEEDCOMMIT flag and performs the
appropriate commit call.

If a lot of unstable writes are made with NFSv3, this commit operation
tends to be done via vfs_bio_awrite when the buffer is recycled. This
leads to a large number of more-or-less sequential commit operations which
could be combined into one request. In particular, against an IRIX 5.3
server, this causes very poor performance when copying large files over
NFSv3.

By a stroke of luck, the existing code for clustered writes in vfs_bio.c
and vfs_cluster.c can be made to work for NFSv3 as well.
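
To illustrate why combining the commits helps, here is a minimal userland sketch in C. It is not the kernel code and not part of the patch: the struct names, the SK_* flag values and sk_coalesce_commits() are all hypothetical, made up for this example. It models a list of buffers sorted by file offset and merges runs of adjacent buffers that are waiting for a commit into single ranges, so that each range would need only one commit request instead of one per buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only; the real B_* values
 * are defined by the kernel and are not reproduced here. */
#define SK_DELWRI     0x1   /* buffer holds a delayed write */
#define SK_NEEDCOMMIT 0x2   /* data was written 'unstable', commit pending */

/* Simplified stand-in for a buffer: a byte range of a file plus flags. */
struct sk_buf {
    long off;
    long len;
    int  flags;
};

/* One commit request covering a contiguous byte range. */
struct sk_range {
    long off;
    long len;
};

/*
 * Merge adjacent buffers that need a commit into contiguous ranges.
 * 'bufs' must be sorted by offset; 'out' must have room for n entries.
 * Returns the number of ranges written to 'out'.
 */
size_t
sk_coalesce_commits(const struct sk_buf *bufs, size_t n, struct sk_range *out)
{
    const int want = SK_DELWRI | SK_NEEDCOMMIT;
    size_t nout = 0;

    assert(out != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct sk_buf *b = &bufs[i];

        /* Skip buffers that are not waiting for a commit. */
        if ((b->flags & want) != want)
            continue;

        if (nout > 0 && out[nout - 1].off + out[nout - 1].len == b->off) {
            /* Directly follows the previous range: extend it. */
            out[nout - 1].len += b->len;
        } else {
            /* Gap or first buffer: start a new range. */
            out[nout].off = b->off;
            out[nout].len = b->len;
            nout++;
        }
    }
    return nout;
}
```

With four 8K buffers of which the third is not marked for commit, the sketch yields two ranges (one of 16K, one of 8K) rather than three separate commits. A long sequential copy would collapse much further.
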
I needed to tweak cluster_wbuild a little to copy over the B_NEEDCOMMIT
flag and to set the b_dirtyoff and b_dirtyend fields of the cluster
appropriately.

Is there any chance I can get this change into -current? The only
possible problem with the patch that I can see is the part which copies
over the write credentials (required for NFS) to the new buffer. It seems
to improve the write performance against an SGI fileserver by a factor of
two (with some other NFS patches not included here).

Index: vfs_cluster.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cluster.c,v
retrieving revision 1.36
diff -c -r1.36 vfs_cluster.c
*** vfs_cluster.c	1996/06/03 04:40:35	1.36
--- vfs_cluster.c	1996/07/23 11:34:27
***************
*** 616,626 ****
              bp->b_bcount = 0;
              bp->b_bufsize = 0;
              bp->b_npages = 0;
              bp->b_blkno = tbp->b_blkno;
              bp->b_lblkno = tbp->b_lblkno;
              (vm_offset_t) bp->b_data |= ((vm_offset_t) tbp->b_data) & PAGE_MASK;
!             bp->b_flags |= B_CALL | B_BUSY | B_CLUSTER | (tbp->b_flags & B_VMIO);
              bp->b_iodone = cluster_callback;
              pbgetvp(vp, bp);
--- 616,630 ----
              bp->b_bcount = 0;
              bp->b_bufsize = 0;
              bp->b_npages = 0;
+             if (tbp->b_wcred != NOCRED) {
+                 bp->b_wcred = tbp->b_wcred;
+                 crhold(bp->b_wcred);
+             }
              bp->b_blkno = tbp->b_blkno;
              bp->b_lblkno = tbp->b_lblkno;
              (vm_offset_t) bp->b_data |= ((vm_offset_t) tbp->b_data) & PAGE_MASK;
!             bp->b_flags |= B_CALL | B_BUSY | B_CLUSTER | (tbp->b_flags & (B_VMIO|B_NEEDCOMMIT));
              bp->b_iodone = cluster_callback;
              pbgetvp(vp, bp);
***************
*** 632,638 ****
                      break;
                  }
!                 if ((tbp->b_flags & (B_VMIO|B_CLUSTEROK|B_INVAL|B_BUSY|B_DELWRI)) != (B_DELWRI|B_CLUSTEROK|(bp->b_flags & B_VMIO))) {
                      splx(s);
                      break;
                  }
--- 636,647 ----
                      break;
                  }
!                 if ((tbp->b_flags & (B_VMIO|B_CLUSTEROK|B_INVAL|B_BUSY|B_DELWRI|B_NEEDCOMMIT)) != (B_DELWRI|B_CLUSTEROK|(bp->b_flags & (B_VMIO|B_NEEDCOMMIT)))) {
!                     splx(s);
!                     break;
!                 }
!
!                 if (tbp->b_wcred != bp->b_wcred) {
                      splx(s);
                      break;
                  }
***************
*** 676,681 ****
--- 685,692 ----
          pmap_qenter(trunc_page((vm_offset_t) bp->b_data), (vm_page_t *) bp->b_pages, bp->b_npages);
          totalwritten += bp->b_bufsize;
+         bp->b_dirtyoff = 0;
+         bp->b_dirtyend = bp->b_bufsize;
          bawrite(bp);
          len -= i;

--
Doug Rabson, Microsoft RenderMorphics Ltd.
Mail:  dfr@render.com
Phone: +44 171 251 4411
FAX:   +44 171 251 0939