From: Andrew Gallatin
Date: Tue, 4 Apr 2000 16:35:57 -0400 (EDT)
To: freebsd-hackers@freebsd.org
Subject: reducing the number of NFSv3 commit ops
Message-ID: <14570.10864.359054.10598@grasshopper.cs.duke.edu>

Currently FreeBSD issues a very large number of NFSv3 commit RPCs when
writing a file sequentially. They average out to roughly one per 64KB
written. Solaris, on the other hand, issues only a handful. At least
when running against a Solaris NFS server, these frequent commits really
kill our write bandwidth.

The commits are initiated out of the bufdaemon:

nfs_commit(e06866c0,360000,0,10000,c8aa5e00) at nfs_commit+0x52a
nfs_doio(d3088158,c8aa5e00,0,d3088158,40084040) at nfs_doio+0x371
nfs_strategy(ddef1ec0) at nfs_strategy+0x68
nfs_writebp(d3088158,1,ddee5920,ddef1ef8,c0180e42) at nfs_writebp+0xdc
nfs_bwrite(ddef1eec,c02a15c0,e06866c0,d3088158,ddef1f28) at nfs_bwrite+0x16
bawrite(d3088158,d30faff0,0,40084040,d30fbae8) at bawrite+0x32
cluster_wbuild(e06866c0,2000,1b8,10,d30fc328) at cluster_wbuild+0x493
vfs_bio_awrite(d30fc328,3f,c0181f8c,c016aef5,0) at vfs_bio_awrite+0x1a4
flushbufqueues(0,8000,c024be00,0,b0206) at flushbufqueues+0x116
buf_daemon(0) at buf_daemon+0x8f
fork_trampoline() at fork_trampoline+0x8

The "problem" is that flushbufqueues() calls vfs_bio_awrite() on the
bufs that need committing. We then go through the overhead of clustering
up 64KB worth of data and passing it down. It eventually ends up in
nfs_doio(), which finally realizes that the bufs just need to be
committed and calls nfs_commit() on them. This is repeated for every
64KB of data.

I have an idea for reducing these commits and a proof-of-concept
implementation of it. My idea is to have nfs_doio() call a function
(which I've called nfs_megacommit()) to consolidate all the B_NEEDCOMMIT
bufs from a particular file into one large commit. This nfs_megacommit()
function is basically a cut-n-paste of the top half of nfs_flush().

I just tried it this morning and it appears to work. Over a 1Gb/s
(Alteon, jumbo frames) link, my write bandwidth increases from
5-8MB/sec to 17-18MB/sec when talking to a Solaris (2.7, i86) NFS server
and writing a 375MB file. The server's nfsstat looks like this.

Before:

Version 3: (54262 calls)
null        getattr     setattr     lookup      access      readlink
0 0%        0 0%        1 0%        1 0%        3 0%        0 0%
read        write       create      mkdir       symlink     mknod
0 0%        48325 89%   0 0%        0 0%        0 0%        0 0%
remove      rmdir       rename      link        readdir     readdirplus
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
fsstat      fsinfo      pathconf    commit
0 0%        0 0%        0 0%        5932 10%

After:

Version 3: (48078 calls)
null        getattr     setattr     lookup      access      readlink
0 0%        0 0%        0 0%        1 0%        1 0%        0 0%
read        write       create      mkdir       symlink     mknod
0 0%        48027 99%   1 0%        0 0%        0 0%        0 0%
remove      rmdir       rename      link        readdir     readdirplus
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
fsstat      fsinfo      pathconf    commit
0 0%        0 0%        0 0%        48 0%

Can anybody tell me if doing something like this is fundamentally
broken? Is it worth pursuing?

Thanks,

Drew

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590