Date:      Tue, 4 Apr 2000 14:16:41 -0700
From:      Alfred Perlstein <bright@wintelcom.net>
To:        Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: reducing the number of NFSv3 commit ops
Message-ID:  <20000404141641.P20770@fw.wintelcom.net>
In-Reply-To: <14570.10864.359054.10598@grasshopper.cs.duke.edu>; from gallatin@cs.duke.edu on Tue, Apr 04, 2000 at 04:35:57PM -0400
References:  <14570.10864.359054.10598@grasshopper.cs.duke.edu>

* Andrew Gallatin <gallatin@cs.duke.edu> [000404 14:03] wrote:
> 
> Currently FreeBSD issues a very large number of NFSv3 commit rpcs when
> writing a sequential file.  They average out to about one every 64k or
> so.  Solaris, on the other hand, issues only a handful.
> 
> At least when running against a Solaris NFS server, these
> frequent commits really kill our write bandwidth.
> 
> The commits are initiated out of the bufdaemon:
> 
> nfs_commit(e06866c0,360000,0,10000,c8aa5e00) at nfs_commit+0x52a
> nfs_doio(d3088158,c8aa5e00,0,d3088158,40084040) at nfs_doio+0x371
> nfs_strategy(ddef1ec0) at nfs_strategy+0x68
> nfs_writebp(d3088158,1,ddee5920,ddef1ef8,c0180e42) at nfs_writebp+0xdc
> nfs_bwrite(ddef1eec,c02a15c0,e06866c0,d3088158,ddef1f28) at nfs_bwrite+0x16
> bawrite(d3088158,d30faff0,0,40084040,d30fbae8) at bawrite+0x32
> cluster_wbuild(e06866c0,2000,1b8,10,d30fc328) at cluster_wbuild+0x493
> vfs_bio_awrite(d30fc328,3f,c0181f8c,c016aef5,0) at vfs_bio_awrite+0x1a4
> flushbufqueues(0,8000,c024be00,0,b0206) at flushbufqueues+0x116
> buf_daemon(0) at buf_daemon+0x8f
> fork_trampoline() at fork_trampoline+0x8
> 
> The "problem" is that flushbufqueues calls vfs_bio_awrite on the bufs
> that need committing.  We then go through the overhead of clustering up 
> 64k worth of data & pass it down.  It eventually ends up in nfs_doio()
> which finally realizes that the bufs just need to be committed & calls 
> nfs_commit() on them.  This is repeated for every 64k of data. 
> 
> I have an idea on how to reduce these commits & a proof of concept
> implementation of it.  My idea is to have nfs_doio() call a function
> (which I've called nfs_megacommit()) to consolidate all the
> B_NEEDCOMMIT bufs from a particular file into one large commit.  This
> nfs_megacommit() function is basically a cut-n-paste of the top half
> of nfs_flush().
> 
> I just tried it this morning & it appears to work.  Over a 1Gb/s
> (Alteon, Jumbo frames) link, my write bandwidth increases from
> 5-8MB/sec to 17-18MB/sec when talking to a Solaris (2.7, i86) NFS
> server & writing a 375MB file.  The server's nfsstat looks like this.
> 
> Before:
> 
> Version 3: (54262 calls)
> null        getattr     setattr     lookup      access      readlink    
> 0 0%        0 0%        1 0%        1 0%        3 0%        0 0%        
> read        write       create      mkdir       symlink     mknod       
> 0 0%        48325 89%   0 0%        0 0%        0 0%        0 0%        
> remove      rmdir       rename      link        readdir     readdirplus 
> 0 0%        0 0%        0 0%        0 0%        0 0%        0 0%        
> fsstat      fsinfo      pathconf    commit      
> 0 0%        0 0%        0 0%        5932 10%    
> 
> 
> After:
> 
> Version 3: (48078 calls)
> null        getattr     setattr     lookup      access      readlink    
> 0 0%        0 0%        0 0%        1 0%        1 0%        0 0%        
> read        write       create      mkdir       symlink     mknod       
> 0 0%        48027 99%   1 0%        0 0%        0 0%        0 0%        
> remove      rmdir       rename      link        readdir     readdirplus 
> 0 0%        0 0%        0 0%        0 0%        0 0%        0 0%        
> fsstat      fsinfo      pathconf    commit      
> 0 0%        0 0%        0 0%        48 0%       
> 
> 
> Can anybody tell me if doing something like this is fundamentally
> broken?  Is it worth pursuing?

http://www.freebsd.org/~alfred/nfs_supercommit_broken.diff

Only grab as many adjacent blocks as possible.  You don't want to
scan the entire file's buffer list for each commit, and you also
don't want to interfere with other clients' caching by forcing
server commits on their behalf.
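
To make the "adjacent blocks only" idea concrete, here is a rough
userland sketch.  It is not taken from the diff above and it is not
kernel code: struct fake_buf, struct commit_range, BLKSIZE, the maxblks
cap and adjacent_commit_range() are all made-up names for illustration,
and the needcommit field only models the B_NEEDCOMMIT flag.  Starting
from the buf that triggered the commit, it grows the range backward and
forward while the neighbouring bufs are contiguous and also need a
commit, and stops at the first gap instead of walking the whole list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLKSIZE 8192u   /* assumed block size, for illustration only */

/* Hypothetical stand-in for a buffer; NOT the kernel's struct buf. */
struct fake_buf {
	uint64_t lblkno;     /* logical block number within the file */
	int      needcommit; /* models the B_NEEDCOMMIT flag */
};

/* Byte range that a single commit would cover. */
struct commit_range {
	uint64_t offset;
	uint64_t length;
};

/*
 * bufs[] is sorted by lblkno.  idx is the buf that triggered the commit.
 * Grow the range over contiguous neighbours that also need a commit,
 * never covering more than maxblks blocks.
 */
struct commit_range
adjacent_commit_range(const struct fake_buf *bufs, size_t n, size_t idx,
    size_t maxblks)
{
	size_t lo = idx, hi = idx;
	struct commit_range r;

	assert(idx < n && bufs[idx].needcommit && maxblks >= 1);

	/* Extend backward while the previous buf is adjacent and dirty. */
	while (hi - lo + 1 < maxblks && lo > 0 &&
	    bufs[lo - 1].needcommit &&
	    bufs[lo - 1].lblkno + 1 == bufs[lo].lblkno)
		lo--;

	/* Extend forward while the next buf is adjacent and dirty. */
	while (hi - lo + 1 < maxblks && hi + 1 < n &&
	    bufs[hi + 1].needcommit &&
	    bufs[hi + 1].lblkno == bufs[hi].lblkno + 1)
		hi++;

	r.offset = bufs[lo].lblkno * BLKSIZE;
	r.length = (uint64_t)(hi - lo + 1) * BLKSIZE;
	return r;
}
```

The work per commit is bounded by maxblks rather than by the number of
bufs on the file, and the range never reaches past what this client has
written contiguously around the triggering buf.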

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

