Date: Mon, 19 Nov 2007 12:40:43 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Kip Macy <kip.macy@gmail.com>
Cc: Bjorn Gronvall <bg@sics.se>, freebsd-current@freebsd.org
Subject: Re: Improving NFS write performance by a factor of 2.
Message-ID: <20071119123926.A59049@fledge.watson.org>
In-Reply-To: <b1fa29170711181528jb88326bl4747a6cefb436288@mail.gmail.com>
References: <20071118211131.7164edd8@ibook.sics.se> <b1fa29170711181528jb88326bl4747a6cefb436288@mail.gmail.com>

On Sun, 18 Nov 2007, Kip Macy wrote:

> Could you do me a favor and submit this in the form of a PR and assign it to
> me? I'm not the most appropriate person for this but the main NFS developer
> is no longer working on FreeBSD and I don't want to see this dropped.

If you're thinking of Mohan, he only mostly worked on the client, not the
server. Jeff Roberson would probably be the best person to assign this to, as
he's worked most recently in the NFS server (pushing Giant off the VFS paths
and cleaning up Giant-related locking, whereas I had pushed it down to VFS
before VFS locking was done).

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> -Kip
>
> On Nov 18, 2007 12:11 PM, Bjorn Gronvall <bg@sics.se> wrote:
>> Hi,
>>
>> I'm not sure if people care about NFS write performance any longer but
>> if you do, please read on.
>>
>> A problem with the current NFS server is that it does not cluster
>> writes, this in turn leads to really poor sequential-write
>> performance.
>>
>> By enabling write clustering NFS write performance goes from
>> 26.6Mbyte/s to 54.3Mbyte/s or increases by a factor of 2. This is on a
>> SATA disk with write caching enabled (hw.ata.wc=1).
>>
>> If write caching is disabled performance still goes up from 1.6Mbyte/s
>> to 5.8Mbyte/s (or by a factor of 3.6).
>>
>> The attached patch (relative to current) makes the following changes:
>>
>> 1/ Rearrange the code so that the same code can be used to detect both
>>    sequential read and write access.
>>
>> 2/ Merge in updates from vfs_vnops.c::sequential_heuristic.
>>
>> 3/ Use double hashing in order to avoid hash-clustering in the nfsheur
>>    table. This change also makes it possible to reduce "try" from 32
>>    to 8.
>>
>> 4/ Pack the nfsheur table more efficiently.
>>
>> 5/ Tolerate reordered RPCs to some small amount (initially suggested
>>    by Ellard and Seltzer).
>>
>> 6/ Back-off from sequential access rather than immediately switching to
>>    random access (Ellard and Seltzer).
>>
>> 7/ To avoid starvation of the buffer pool call bwillwrite. The call is
>>    issued after the VOP_WRITE in order to avoid additional reordering
>>    of write operations.
>>
>> 8/ sysctl variables vfs.nfsrv.cluster_writes and cluster_reads to
>>    enable or disable clustering. vfs.nfsrv.reordered_io counts the
>>    number of reordered RPCs.
>>
>> 9/ In nfsrv_commit check for write errors and report them back to the
>>    client. Also check if the RPC argument count is zero which means
>>    that we must flush to the end of file according to the RFC.
>>
>> 10/ Two earlier commits broke the write gathering support:
>>
>>     nfs_syscalls.c:1.71
>>
>>       This change removed NQNFS stuff but left the NQNFS variable
>>       notstarted. This resulted in NFS write gathering effectively
>>       being permanently disabled (regardless if NFSv2 or NFSv3).
>>
>>     nfs_syscalls.c:1.103
>>
>>       This change disabled write gathering (again) for NFSv3 although
>>       this should be controlled by vfs.nfs.nfsrvw_procrastinate_v3 !=
>>       0.
>>
>> Write gathering may still be useful with NFSv3 to put reordered write
>> RPCs into order, perhaps also for other reasons. This is now possible
>> again.
>>
>> The attached patch is for current but you will observe similar
>> improvements with earlier FreeBSD versions. If you would like to have
>> the same patch but for FreeBSD 5.x, 6.x or 7.0 please drop me a line.
>>
>> Cheers,
>> /b
>>
>>
>> --
>>     _     _                                           ,_______________.
>> Bjorn Gronvall (Björn Grönvall)                       /_______________/|
>> Swedish Institute of Computer Science                 |               ||
>> PO Box 1263, S-164 29 Kista, Sweden                   | Schroedingers ||
>> Email: bg@sics.se, Phone +46 -8 633 15 25             |      Cat      |/
>> Cellular +46 -70 768 06 35, Fax +46 -8 751 72 30      '---------------'
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>
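
Item 3 of the quoted patch description mentions double hashing in the nfsheur table with a probe limit ("try") of 8. The following is only a minimal sketch of that general technique, not the code from the attached patch: the table size, the hash constants, the entry layout, the names (heur_entry, heur_lookup, hash1, hash2) and the eviction rule are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: 64 slots (2^6) and the probe limit of 8 from the text. */
#define HEUR_SLOTS 64u
#define HEUR_TRIES 8u

/* Hypothetical entry layout; key 0 is reserved to mean "empty slot". */
struct heur_entry {
    uint64_t key;
    uint32_t seqcount;
};

static struct heur_entry heur_table[HEUR_SLOTS];

/* First hash: picks the starting slot (top 6 bits of a multiplicative mix). */
static uint32_t hash1(uint64_t key)
{
    return (uint32_t)((key * 0x9E3779B97F4A7C15ull) >> 58);
}

/*
 * Second hash: picks the probe step. Forcing it odd keeps it coprime with
 * the power-of-two table size, so successive probes hit distinct slots.
 */
static uint32_t hash2(uint64_t key)
{
    return (uint32_t)((key * 0xC2B2AE3D27D4EB4Full) >> 58) | 1u;
}

/*
 * Find the entry for a nonzero key, probing at most HEUR_TRIES slots.
 * If the key is absent, claim the first empty slot seen; if none was
 * seen, reuse the first probed slot (an assumed, simplistic eviction rule).
 */
static struct heur_entry *heur_lookup(uint64_t key)
{
    uint32_t slot = hash1(key);
    uint32_t step = hash2(key);
    struct heur_entry *victim = &heur_table[slot];
    struct heur_entry *empty = NULL;

    for (uint32_t i = 0; i < HEUR_TRIES; i++) {
        struct heur_entry *e = &heur_table[slot];
        if (e->key == key)
            return e;
        if (e->key == 0 && empty == NULL)
            empty = e;
        slot = (slot + step) % HEUR_SLOTS;
    }
    if (empty == NULL)
        empty = victim;
    empty->key = key;
    empty->seqcount = 0;
    return empty;
}
```

Because the step differs per key, two keys that collide on their first slot usually diverge on the second probe, which is why a shorter probe limit can be enough compared with linear probing.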