Date:      Tue,  4 Apr 2000 16:35:57 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        freebsd-hackers@freebsd.org
Subject:   reducing the number of NFSv3 commit ops
Message-ID:  <14570.10864.359054.10598@grasshopper.cs.duke.edu>


Currently FreeBSD issues a very large number of NFSv3 commit RPCs when
writing a file sequentially.  They average out to about one every 64k
or so.  Solaris, on the other hand, issues only a handful.

At least when running against a Solaris NFS server, these
frequent commits really kill our write bandwidth.

The commits are initiated out of the bufdaemon:

nfs_commit(e06866c0,360000,0,10000,c8aa5e00) at nfs_commit+0x52a
nfs_doio(d3088158,c8aa5e00,0,d3088158,40084040) at nfs_doio+0x371
nfs_strategy(ddef1ec0) at nfs_strategy+0x68
nfs_writebp(d3088158,1,ddee5920,ddef1ef8,c0180e42) at nfs_writebp+0xdc
nfs_bwrite(ddef1eec,c02a15c0,e06866c0,d3088158,ddef1f28) at nfs_bwrite+0x16
bawrite(d3088158,d30faff0,0,40084040,d30fbae8) at bawrite+0x32
cluster_wbuild(e06866c0,2000,1b8,10,d30fc328) at cluster_wbuild+0x493
vfs_bio_awrite(d30fc328,3f,c0181f8c,c016aef5,0) at vfs_bio_awrite+0x1a4
flushbufqueues(0,8000,c024be00,0,b0206) at flushbufqueues+0x116
buf_daemon(0) at buf_daemon+0x8f
fork_trampoline() at fork_trampoline+0x8

The "problem" is that flushbufqueues() calls vfs_bio_awrite() on the
bufs that need committing.  We then go through the overhead of
clustering up 64k worth of data & passing it down.  It eventually ends
up in nfs_doio(), which finally realizes that the bufs just need to be
committed & calls nfs_commit() on them.  This is repeated for every
64k of data.

I have an idea on how to reduce these commits & a proof of concept
implementation of it.  My idea is to have nfs_doio() call a function
(which I've called nfs_megacommit()) to consolidate all the
B_NEEDCOMMIT bufs from a particular file into one large commit.  This
nfs_megacommit() function is basically a cut-n-paste of the top half
of nfs_flush().
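
To make the consolidation step concrete, here is a minimal userland
sketch of the range calculation only.  It is not the proof-of-concept
code and not kernel code: struct sk_buf, struct sk_range, SK_NEEDCOMMIT
and sk_megacommit_range() are hypothetical stand-ins for struct buf,
B_NEEDCOMMIT and the scan that nfs_megacommit() would do, & it assumes
one commit covering the lowest to the highest dirty offset is what we
want.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for B_NEEDCOMMIT. */
#define SK_NEEDCOMMIT 0x1

/* Hypothetical, stripped-down stand-in for the kernel's struct buf. */
struct sk_buf {
	uint64_t offset;	/* byte offset of the buf within the file */
	uint32_t length;	/* number of valid bytes in the buf */
	int	 flags;		/* SK_NEEDCOMMIT if a commit is pending */
};

/* Result of the scan: one byte range covering every pending buf. */
struct sk_range {
	uint64_t offset;	/* start of the consolidated commit */
	uint64_t length;	/* byte count of the consolidated commit */
	size_t	 nbufs;		/* how many bufs were folded into it */
};

/*
 * Walk all bufs belonging to one file and compute the single range
 * that spans every buf still flagged as needing a commit.  Bufs
 * without the flag are skipped, but any that lie between two flagged
 * bufs end up inside the range anyway.
 */
static struct sk_range
sk_megacommit_range(const struct sk_buf *bufs, size_t n)
{
	struct sk_range r = { 0, 0, 0 };
	uint64_t end = 0;

	assert(bufs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((bufs[i].flags & SK_NEEDCOMMIT) == 0)
			continue;
		uint64_t b_end = bufs[i].offset + bufs[i].length;
		if (r.nbufs == 0 || bufs[i].offset < r.offset)
			r.offset = bufs[i].offset;
		if (r.nbufs == 0 || b_end > end)
			end = b_end;
		r.nbufs++;
	}
	r.length = end - r.offset;	/* 0 when nothing needs commit */
	return r;
}
```

With this, a caller would issue one commit for (offset, length) & then
clear the flag on the nbufs bufs involved, instead of issuing one
commit per 64k cluster.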

I just tried it this morning & it appears to work.  Over a 1Gb/s
(Alteon, Jumbo frames) link, my write bandwidth increases from
5-8MB/sec to 17-18MB/sec when talking to a Solaris (2.7, i86) NFS
server & writing a 375MB file.  The server's nfsstat looks like this.

Before:

Version 3: (54262 calls)
null        getattr     setattr     lookup      access      readlink    
0 0%        0 0%        1 0%        1 0%        3 0%        0 0%        
read        write       create      mkdir       symlink     mknod       
0 0%        48325 89%   0 0%        0 0%        0 0%        0 0%        
remove      rmdir       rename      link        readdir     readdirplus 
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%        
fsstat      fsinfo      pathconf    commit      
0 0%        0 0%        0 0%        5932 10%    


After:

Version 3: (48078 calls)
null        getattr     setattr     lookup      access      readlink    
0 0%        0 0%        0 0%        1 0%        1 0%        0 0%        
read        write       create      mkdir       symlink     mknod       
0 0%        48027 99%   1 0%        0 0%        0 0%        0 0%        
remove      rmdir       rename      link        readdir     readdirplus 
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%        
fsstat      fsinfo      pathconf    commit      
0 0%        0 0%        0 0%        48 0%       
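
As a sanity check on the two tables, the write and commit counts can
be turned into an average amount of data per commit.  The helpers
below are just illustrative arithmetic (their names are made up), & the
8k write size is an assumption on my part, not something nfsstat
reports.

```c
#include <assert.h>

/* Average number of WRITE RPCs the server saw per COMMIT RPC. */
static double
writes_per_commit(unsigned long writes, unsigned long commits)
{
	assert(commits > 0);
	return (double)writes / (double)commits;
}

/*
 * Approximate bytes covered by each commit, given an assumed
 * per-write transfer size (wsize is an assumption, see above).
 */
static double
bytes_per_commit(unsigned long writes, unsigned long commits,
    unsigned long wsize)
{
	return writes_per_commit(writes, commits) * (double)wsize;
}
```

Plugging in the numbers: before, 48325 writes / 5932 commits is a
little over 8 writes per commit, which at 8k per write is roughly the
one-commit-per-64k pattern described above.  After, 48027 writes / 48
commits is about 1000 writes per commit.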


Can anybody tell me if doing something like this is fundamentally
broken?  Is it worth pursuing?

Thanks,

Drew

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590

