From: Robert Watson
Date: Mon, 19 Nov 2007 12:40:43 +0000 (GMT)
To: Kip Macy
Cc: Bjorn Gronvall, freebsd-current@freebsd.org
Subject: Re: Improving NFS write performance by a factor of 2.
Message-ID: <20071119123926.A59049@fledge.watson.org>
References: <20071118211131.7164edd8@ibook.sics.se>

On Sun, 18 Nov 2007, Kip Macy wrote:

> Could you do me a favor and submit this in the form of a PR and assign it to
> me? I'm not the most appropriate person for this but the main NFS developer
> is no longer working on FreeBSD and I don't want to see this dropped.

If you're thinking of Mohan, he worked mostly on the client, not the
server. Jeff Roberson would probably be the best person to assign this to, as
he has worked most recently on the NFS server (pushing Giant off the VFS paths
and cleaning up Giant-related locking, whereas I had pushed it down to the VFS
before VFS locking was done).

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> -Kip
>
> On Nov 18, 2007 12:11 PM, Bjorn Gronvall wrote:
>> Hi,
>>
>> I'm not sure if people care about NFS write performance any longer, but
>> if you do, please read on.
>>
>> A problem with the current NFS server is that it does not cluster
>> writes; this in turn leads to really poor sequential-write
>> performance.
>>
>> Enabling write clustering raises NFS write performance from
>> 26.6 Mbyte/s to 54.3 Mbyte/s, an increase by a factor of 2. This is on a
>> SATA disk with write caching enabled (hw.ata.wc=1).
>>
>> If write caching is disabled, performance still goes up from 1.6 Mbyte/s
>> to 5.8 Mbyte/s (a factor of 3.6).
>>
>> The attached patch (relative to current) makes the following changes:
>>
>> 1/ Rearrange the code so that the same code can be used to detect both
>>    sequential read and write access.
>>
>> 2/ Merge in updates from vfs_vnops.c::sequential_heuristic.
>>
>> 3/ Use double hashing in order to avoid hash clustering in the nfsheur
>>    table. This change also makes it possible to reduce "try" from 32
>>    to 8.
>>
>> 4/ Pack the nfsheur table more efficiently.
>>
>> 5/ Tolerate a small amount of RPC reordering (initially suggested
>>    by Ellard and Seltzer).
>>
>> 6/ Back off from sequential access gradually rather than immediately
>>    switching to random access (Ellard and Seltzer).
>>
>> 7/ To avoid starving the buffer pool, call bwillwrite. The call is
>>    issued after the VOP_WRITE in order to avoid additional reordering
>>    of write operations.
>>
>> 8/ Add the sysctl variables vfs.nfsrv.cluster_writes and cluster_reads to
>>    enable or disable clustering. vfs.nfsrv.reordered_io counts the
>>    number of reordered RPCs.
>>
>> 9/ In nfsrv_commit, check for write errors and report them back to the
>>    client. Also check whether the RPC argument count is zero, which means
>>    that we must flush to the end of the file according to the RFC
>>    (RFC 1813, COMMIT).
>>
>> 10/ Two earlier commits broke the write gathering support:
>>
>>     nfs_syscalls.c:1.71
>>
>>     This change removed the NQNFS code but left the NQNFS variable
>>     notstarted. As a result, NFS write gathering was effectively
>>     permanently disabled (for both NFSv2 and NFSv3).
>>
>>     nfs_syscalls.c:1.103
>>
>>     This change disabled write gathering (again) for NFSv3, although
>>     it should be controlled by vfs.nfs.nfsrvw_procrastinate_v3 != 0.
>>
>> Write gathering may still be useful with NFSv3 to put reordered write
>> RPCs back into order, and perhaps for other reasons as well. This is now
>> possible again.
>>
>> The attached patch is against current, but you will observe similar
>> improvements with earlier FreeBSD versions. If you would like the
>> same patch for FreeBSD 5.x, 6.x or 7.0, please drop me a line.
>>
>> Cheers,
>> /b
>>
>>
>> --
>>   _  _                                            ,_______________.
>> Bjorn Gronvall (Björn Grönvall)                  /_______________/|
>> Swedish Institute of Computer Science            |               ||
>> PO Box 1263, S-164 29 Kista, Sweden              | Schroedingers ||
>> Email: bg@sics.se, Phone +46 -8 633 15 25        |      Cat      |/
>> Cellular +46 -70 768 06 35, Fax +46 -8 751 72 30 '---------------'
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>
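Items 5 and 6 in the patch description refine the sequential-access heuristic (item 2 names vfs_vnops.c's sequential_heuristic as its model). The following is a minimal user-space sketch of the idea, not the patch's code. It assumes a per-file count that grows when a request starts exactly where the previous one ended, treats a request landing within a small window of the expected offset as RPC reordering (keeping the count), and halves the count on a genuine jump instead of resetting it to zero. The names (struct seq_state, seq_update) and the constants (16 KB progress unit, cap of 127, 256 KB reorder window) are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tuning constants, chosen for illustration only. */
#define SEQ_BLKSIZE  16384u          /* bytes that count as one unit of progress */
#define SEQ_MAX      127             /* cap on the sequential count */
#define REORDER_SLOP (4u * 65536u)   /* max distance an RPC may arrive out of order */

struct seq_state {
    uint64_t next_off;   /* where the next in-order request should start */
    int      seqcount;   /* 0 = random access; larger = more confidently sequential */
    unsigned reordered;  /* requests accepted as reordered rather than random */
};

/* Feed one request [off, off + len) into the heuristic; return the new count. */
static int seq_update(struct seq_state *st, uint64_t off, uint64_t len)
{
    assert(st != 0 && len > 0);

    uint64_t dist = off > st->next_off ? off - st->next_off : st->next_off - off;

    if (dist == 0) {
        /* In-order continuation: grow in proportion to the request size. */
        uint64_t inc = (len + SEQ_BLKSIZE - 1) / SEQ_BLKSIZE;
        if (inc >= (uint64_t)(SEQ_MAX - st->seqcount))
            st->seqcount = SEQ_MAX;
        else
            st->seqcount += (int)inc;
        st->next_off = off + len;
    } else if (st->seqcount > 0 && dist <= REORDER_SLOP) {
        /* Near the expected offset: treat as RPC reordering and keep the count. */
        st->reordered++;
        if (off + len > st->next_off)
            st->next_off = off + len;
    } else {
        /* A real jump: back off by halving instead of resetting to zero. */
        st->seqcount /= 2;
        st->next_off = off + len;
    }
    return st->seqcount;
}
```

Halving on a jump means one stray request against a mostly sequential stream costs only part of the accumulated confidence, while a truly random pattern still drives the count to zero after a few requests.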
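Item 3 in the patch description switches the nfsheur table lookup to double hashing, which lets the probe limit ("try") drop from 32 to 8. A minimal sketch of double-hashed lookup under stated assumptions, not the patch's code: entries are keyed by an opaque pointer-sized value standing in for the vnode pointer, the table has a prime number of slots (so every step size from 1 to size-1 eventually visits every slot), and when the key is not found within 8 probes the first empty slot or else the least-used slot on the probe path is recycled. The struct, the table size of 1021 and the use-count eviction rule are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEUR_SIZE  1021  /* hypothetical table size; must be prime */
#define HEUR_TRIES 8     /* bounded number of probes per lookup */

struct heur_entry {
    uintptr_t key;   /* 0 means the slot is empty (stands in for a vnode pointer) */
    unsigned  use;   /* crude popularity count used to pick a victim */
};

static struct heur_entry heur_table[HEUR_SIZE];

/* Return the entry for key, claiming an empty or least-used slot if absent. */
static struct heur_entry *heur_lookup(uintptr_t key)
{
    assert(key != 0);
    size_t h1 = (size_t)(key % HEUR_SIZE);            /* starting slot */
    size_t h2 = 1 + (size_t)(key % (HEUR_SIZE - 1));  /* per-key step, never 0 */
    struct heur_entry *victim = NULL;

    for (size_t i = 0; i < HEUR_TRIES; i++) {
        struct heur_entry *e = &heur_table[(h1 + i * h2) % HEUR_SIZE];
        if (e->key == key) {            /* hit: bump popularity */
            e->use++;
            return e;
        }
        if (e->key == 0) {              /* empty slot: claim it */
            victim = e;
            break;
        }
        if (victim == NULL || e->use < victim->use)
            victim = e;                 /* remember least-used slot seen so far */
    }
    /* Not found within HEUR_TRIES probes: recycle the chosen slot. */
    victim->key = key;
    victim->use = 1;
    return victim;
}
```

With linear probing, keys whose first slots are adjacent share the same probe path and form runs; with a per-key step size, two keys that collide on their first slot usually diverge on the next probe, which is why a shorter probe limit can suffice.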
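Item 9 in the patch description relies on the NFSv3 COMMIT semantics of RFC 1813, where a count of 0 means "flush from offset to the end of the file". A minimal sketch of that range computation together with error reporting, assuming a hypothetical per-file record that remembers whether an earlier unstable write failed; the struct and function names, and the choice to clamp the range to the current file size, are illustrative assumptions rather than the patch's code. The status values NFS3_OK = 0 and NFS3ERR_IO = 5 are as numbered in RFC 1813.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NFS3_OK     0   /* status codes as numbered in RFC 1813 */
#define NFS3ERR_IO  5

/* Hypothetical per-file state: current size and a sticky write error flag. */
struct commit_file {
    uint64_t size;         /* current file size in bytes */
    int      write_error;  /* nonzero if an earlier unstable write failed */
};

/*
 * Compute the byte range [*start, *end) that COMMIT(offset, count) must
 * flush. count == 0 means "from offset to end of file" (RFC 1813); here the
 * range is also clamped to the current size. Returns an NFSv3 status so a
 * deferred write error is reported back to the client.
 */
static int commit_range(const struct commit_file *f, uint64_t offset,
                        uint32_t count, uint64_t *start, uint64_t *end)
{
    assert(f != NULL && start != NULL && end != NULL);

    *start = offset;
    if (offset >= f->size)
        *end = offset;                     /* nothing past EOF to flush */
    else if (count == 0 || count > f->size - offset)
        *end = f->size;                    /* flush through end of file */
    else
        *end = offset + count;

    return f->write_error ? NFS3ERR_IO : NFS3_OK;
}
```

Reporting the error from COMMIT matters because a client that sent unstable writes only learns whether its data reached stable storage at commit time; returning success there would let it discard data that was never written.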
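Item 10 in the patch description says write gathering can put reordered NFSv3 write RPCs back into order; the email also says NFSv3 gathering should be controlled by vfs.nfs.nfsrvw_procrastinate_v3. A minimal sketch of only the ordering step, assuming the server has already held a batch of write requests for a short delay window (the batching and the data payloads are not shown). The batch is sorted by offset and contiguous or overlapping extents are merged, so the file system sees fewer, larger, in-order writes. The struct and function names are hypothetical and this is not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One pending write RPC held back by the hypothetical gatherer. */
struct pending_write {
    uint64_t off;   /* starting byte offset */
    uint64_t len;   /* length in bytes */
};

/* qsort comparator: ascending by offset, without overflow-prone subtraction. */
static int cmp_off(const void *a, const void *b)
{
    const struct pending_write *x = a, *y = b;
    return (x->off > y->off) - (x->off < y->off);
}

/*
 * Sort the writes gathered during one delay window by offset and merge
 * contiguous or overlapping extents in place. Returns the number of merged
 * extents, now stored in w[0..n-1] in ascending offset order.
 */
static size_t gather_writes(struct pending_write *w, size_t n)
{
    size_t out = 0;

    assert(w != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(w, n, sizeof(*w), cmp_off);
    for (size_t i = 1; i < n; i++) {
        uint64_t end = w[out].off + w[out].len;
        if (w[i].off <= end) {
            /* Touches or overlaps the current extent: extend it. */
            uint64_t iend = w[i].off + w[i].len;
            if (iend > end)
                w[out].len = iend - w[out].off;
        } else {
            w[++out] = w[i];            /* gap: start a new extent */
        }
    }
    return out + 1;
}
```

A real server would also have to decide which data wins when extents overlap; the sketch tracks extents only.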