Date:      Tue, 23 Jul 1996 14:13:45 +0100 (BST)
From:      Doug Rabson <dfr@render.com>
To:        current@freebsd.org
Subject:   Using clustered writes for NFSv3
Message-ID:  <Pine.BSI.3.95.960723135805.12996K-100000@minnow.render.com>

In NFSv3, when a buffer is written to a server, normally the server
doesn't write the data to stable storage immediately.  Instead, it informs
the client that the data is written but 'unstable'.  The client can later
commit the data to stable storage in a separate call.

In FreeBSD's implementation of NFSv3 (and in any other that uses Rick
Macklem's code), the client handles this by turning the buffer into a
delayed write buffer, marked with B_DELWRI|B_NEEDCOMMIT.  When the buffer
is later recycled by the buffer cache or explicitly synced with VOP_FSYNC,
the nfs client code notices the B_NEEDCOMMIT flag and performs the
appropriate commit call.
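
To make the flag handling above concrete, here is a minimal sketch of the
buffer state transitions.  It is a model only, not the kernel code: the
sk_ names and the flag values are invented for illustration (the real
B_DELWRI and B_NEEDCOMMIT definitions live in sys/buf.h), and the real
client has more cases to handle than this.

```c
#include <assert.h>

/* Illustrative flag bits only; the real values in sys/buf.h differ. */
enum {
    SK_B_DELWRI     = 1 << 0,   /* buffer is dirty, write is delayed */
    SK_B_NEEDCOMMIT = 1 << 1    /* server has the data, but only unstably */
};

/* Hypothetical stand-in for struct buf, reduced to its flags. */
struct sk_buf {
    unsigned flags;
};

/* What the flush path has to do for a given buffer. */
enum sk_action { SK_NOTHING, SK_WRITE, SK_COMMIT };

/* The server replied "written but unstable": keep the buffer around as a
 * delayed write and remember that a commit is still owed. */
static void sk_unstable_write_done(struct sk_buf *bp)
{
    bp->flags |= SK_B_DELWRI | SK_B_NEEDCOMMIT;
}

/* Called when the buffer is recycled or synced. */
static enum sk_action sk_flush_action(const struct sk_buf *bp)
{
    if (!(bp->flags & SK_B_DELWRI))
        return SK_NOTHING;
    return (bp->flags & SK_B_NEEDCOMMIT) ? SK_COMMIT : SK_WRITE;
}

/* The commit call succeeded: the data is stable, the buffer is clean. */
static void sk_commit_done(struct sk_buf *bp)
{
    assert(bp->flags & SK_B_NEEDCOMMIT);
    bp->flags &= ~(unsigned)(SK_B_DELWRI | SK_B_NEEDCOMMIT);
}
```

A buffer that starts clean, goes through sk_unstable_write_done(), and is
then flushed yields SK_COMMIT rather than a second write.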

If a lot of unstable writes are made with NFSv3, this commit operation
tends to be done via vfs_bio_awrite when the buffer is recycled.  This
leads to a large number of more-or-less sequential commit operations which
could be combined into one request.  Against an IRIX 5.3 server in
particular, this causes very poor performance when copying large files
over NFSv3.
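
As a rough illustration of what "combined into one request" means, the
sketch below merges byte ranges that follow each other exactly, so that
each surviving range would need only one commit call.  The struct and
function names are hypothetical and this is not how the patch does it
(the patch reuses the existing cluster code instead); it only shows the
effect on the number of requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one pending commit: a byte range in a file. */
struct sk_range {
    uint64_t off;   /* starting byte offset */
    uint64_t len;   /* length in bytes */
};

/*
 * Merge, in place, every run of ranges in which each range starts exactly
 * where the previous one ends.  The input must be ordered by offset.
 * Returns the number of ranges left, i.e. the number of commit requests
 * that would still be needed.
 */
static size_t sk_coalesce(struct sk_range *r, size_t n)
{
    size_t out = 0;
    uint64_t prev;

    if (n == 0)
        return 0;

    prev = r[0].off;
    for (size_t i = 1; i < n; i++) {
        assert(r[i].off >= prev);       /* caller must pass ordered input */
        prev = r[i].off;

        if (r[out].off + r[out].len == r[i].off)
            r[out].len += r[i].len;     /* contiguous: extend current run */
        else
            r[++out] = r[i];            /* gap: start a new run */
    }
    return out + 1;
}
```

For example, three adjacent 8192-byte buffers followed by one buffer at a
distant offset collapse from four commits to two.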

By a stroke of luck, the existing code for clustered writes in vfs_bio.c
and vfs_cluster.c can be made to work for NFSv3 as well.  I needed to
tweak cluster_wbuild a little to copy over the B_NEEDCOMMIT flag and to
set the b_dirtyoff and b_dirtyend fields of the cluster appropriately.
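
The shape of the tweak can be sketched outside the kernel as follows.
This is a simplified model: the sk_ names, the flag values and the
reduced buffer struct are assumptions made for illustration, while the
mask-and-compare test and the dirty-range assignment mirror what the
patch below does in cluster_wbuild.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; the real values in sys/buf.h differ. */
enum {
    SK_B_BUSY       = 1 << 0,
    SK_B_CLUSTEROK  = 1 << 1,
    SK_B_DELWRI     = 1 << 2,
    SK_B_INVAL      = 1 << 3,
    SK_B_NEEDCOMMIT = 1 << 4,
    SK_B_VMIO       = 1 << 5
};

/* Hypothetical stand-in for struct buf with only the fields used here. */
struct sk_buf {
    unsigned    flags;
    const void *wcred;      /* write credentials, compared by identity */
    long        bufsize;
    long        dirtyoff;
    long        dirtyend;
};

/*
 * A candidate may join the cluster only if it is a clusterable delayed
 * write, is neither busy nor invalid, agrees with the cluster on VMIO and
 * NEEDCOMMIT, and was written with the same credentials.
 */
static bool sk_can_join(const struct sk_buf *cluster, const struct sk_buf *cand)
{
    const unsigned inspect = SK_B_VMIO | SK_B_CLUSTEROK | SK_B_INVAL |
                             SK_B_BUSY | SK_B_DELWRI | SK_B_NEEDCOMMIT;
    const unsigned want = SK_B_DELWRI | SK_B_CLUSTEROK |
                          (cluster->flags & (SK_B_VMIO | SK_B_NEEDCOMMIT));

    return (cand->flags & inspect) == want && cand->wcred == cluster->wcred;
}

/* Before the cluster is written, mark its whole extent as dirty. */
static void sk_mark_all_dirty(struct sk_buf *cluster)
{
    assert(cluster->bufsize >= 0);
    cluster->dirtyoff = 0;
    cluster->dirtyend = cluster->bufsize;
}
```

Comparing the masked flags against a single expected value rejects a
buffer that has an unwanted bit set (busy, invalid) and one that lacks a
required bit (delayed write, NEEDCOMMIT mismatch) in the same test.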

Is there any chance I can get this change into -current?  The only
possible problem with the patch that I can see is the part which copies
over the write credentials (required for NFS) to the new buffer.  The
patch seems to improve write performance against an SGI fileserver by a
factor of two (with some other NFS patches not included here).

Index: vfs_cluster.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cluster.c,v
retrieving revision 1.36
diff -c -r1.36 vfs_cluster.c
*** vfs_cluster.c	1996/06/03 04:40:35	1.36
--- vfs_cluster.c	1996/07/23 11:34:27
***************
*** 616,626 ****
  		bp->b_bcount = 0;
  		bp->b_bufsize = 0;
  		bp->b_npages = 0;
  
  		bp->b_blkno = tbp->b_blkno;
  		bp->b_lblkno = tbp->b_lblkno;
  		(vm_offset_t) bp->b_data |= ((vm_offset_t) tbp->b_data) & PAGE_MASK;
! 		bp->b_flags |= B_CALL | B_BUSY | B_CLUSTER | (tbp->b_flags & B_VMIO);
  		bp->b_iodone = cluster_callback;
  		pbgetvp(vp, bp);
  
--- 616,630 ----
  		bp->b_bcount = 0;
  		bp->b_bufsize = 0;
  		bp->b_npages = 0;
+ 		if (tbp->b_wcred != NOCRED) {
+ 		    bp->b_wcred = tbp->b_wcred;
+ 		    crhold(bp->b_wcred);
+ 		}
  
  		bp->b_blkno = tbp->b_blkno;
  		bp->b_lblkno = tbp->b_lblkno;
  		(vm_offset_t) bp->b_data |= ((vm_offset_t) tbp->b_data) & PAGE_MASK;
! 		bp->b_flags |= B_CALL | B_BUSY | B_CLUSTER | (tbp->b_flags & (B_VMIO|B_NEEDCOMMIT));
  		bp->b_iodone = cluster_callback;
  		pbgetvp(vp, bp);
  
***************
*** 632,638 ****
  					break;
  				}
  
! 				if ((tbp->b_flags & (B_VMIO|B_CLUSTEROK|B_INVAL|B_BUSY|B_DELWRI)) != (B_DELWRI|B_CLUSTEROK|(bp->b_flags & B_VMIO))) {
  					splx(s);
  					break;
  				}
--- 636,647 ----
  					break;
  				}
  
! 				if ((tbp->b_flags & (B_VMIO|B_CLUSTEROK|B_INVAL|B_BUSY|B_DELWRI|B_NEEDCOMMIT)) != (B_DELWRI|B_CLUSTEROK|(bp->b_flags & (B_VMIO|B_NEEDCOMMIT)))) {
! 					splx(s);
! 					break;
! 				}
! 
! 				if (tbp->b_wcred != bp->b_wcred) {
  					splx(s);
  					break;
  				}
***************
*** 676,681 ****
--- 685,692 ----
  		pmap_qenter(trunc_page((vm_offset_t) bp->b_data),
  			(vm_page_t *) bp->b_pages, bp->b_npages);
  		totalwritten += bp->b_bufsize;
+ 		bp->b_dirtyoff = 0;
+ 		bp->b_dirtyend = bp->b_bufsize;
  		bawrite(bp);
  
  		len -= i;


--
Doug Rabson, Microsoft RenderMorphics Ltd.	Mail:  dfr@render.com
						Phone: +44 171 251 4411
						FAX:   +44 171 251 0939



