Date:      Mon, 19 Nov 2007 12:40:43 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Kip Macy <kip.macy@gmail.com>
Cc:        Bjorn Gronvall <bg@sics.se>, freebsd-current@freebsd.org
Subject:   Re: Improving NFS write performance by a factor of 2.
Message-ID:  <20071119123926.A59049@fledge.watson.org>
In-Reply-To: <b1fa29170711181528jb88326bl4747a6cefb436288@mail.gmail.com>
References:  <20071118211131.7164edd8@ibook.sics.se> <b1fa29170711181528jb88326bl4747a6cefb436288@mail.gmail.com>

On Sun, 18 Nov 2007, Kip Macy wrote:

> Could you do me a favor and submit this in the form of a PR and assign it to
> me?  I'm not the most appropriate person for this but the main NFS developer
> is no longer working on FreeBSD and I don't want to see this dropped.

If you're thinking of Mohan, he mostly worked on the client, not the
server.  Jeff Roberson would probably be the best person to assign this to, as
he has worked most recently on the NFS server (pushing Giant off the VFS paths
and cleaning up Giant-related locking, whereas I had pushed it down to VFS
before VFS locking was done).

Robert N M Watson
Computer Laboratory
University of Cambridge

>
>      -Kip
>
> On Nov 18, 2007 12:11 PM, Bjorn Gronvall <bg@sics.se> wrote:
>> Hi,
>>
>> I'm not sure if people care about NFS write performance any longer, but
>> if you do, please read on.
>>
>> A problem with the current NFS server is that it does not cluster
>> writes; this in turn leads to really poor sequential-write
>> performance.
>>
>> By enabling write clustering, NFS write performance goes from
>> 26.6Mbyte/s to 54.3Mbyte/s, an increase by a factor of 2. This is on a
>> SATA disk with write caching enabled (hw.ata.wc=1).
>>
>> If write caching is disabled, performance still goes up, from 1.6Mbyte/s
>> to 5.8Mbyte/s (a factor of 3.6).
>>
>> The attached patch (relative to current) makes the following changes:
>>
>> 1/ Rearrange the code so that the same code can be used to detect both
>>    sequential read and write access.
>>
>> 2/ Merge in updates from vfs_vnops.c::sequential_heuristic.
>>
>> 3/ Use double hashing in order to avoid hash-clustering in the nfsheur
>>    table. This change also makes it possible to reduce "try" from 32
>>    to 8.
>>
>> 4/ Pack the nfsheur table more efficiently.
>>
>> 5/ Tolerate a small amount of RPC reordering (initially suggested
>>    by Ellard and Seltzer).
>>
>> 6/ Back off from sequential access rather than immediately switching to
>>    random access (Ellard and Seltzer).
>>
>> 7/ To avoid starvation of the buffer pool, call bwillwrite. The call is
>>    issued after the VOP_WRITE in order to avoid additional reordering
>>    of write operations.
>>
>> 8/ New sysctl variables vfs.nfsrv.cluster_writes and cluster_reads
>>    enable or disable clustering; vfs.nfsrv.reordered_io counts the
>>    number of reordered RPCs.
>>
>> 9/ In nfsrv_commit, check for write errors and report them back to the
>>    client. Also check whether the RPC argument count is zero, which means
>>    that we must flush to the end of the file according to the RFC.
>>
>> 10/ Two earlier commits broke the write gathering support:
>>
>>     nfs_syscalls.c:1.71
>>
>>       This change removed NQNFS stuff but left the NQNFS variable
>>       notstarted. This resulted in NFS write gathering effectively
>>       being permanently disabled (for both NFSv2 and NFSv3).
>>
>>     nfs_syscalls.c:1.103
>>
>>       This change disabled write gathering (again) for NFSv3, although
>>       this should be controlled by whether
>>       vfs.nfs.nfsrvw_procrastinate_v3 != 0.
>>
>> Write gathering may still be useful with NFSv3 to put reordered write
>> RPCs into order, perhaps also for other reasons. This is now possible
>> again.
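
How gathering can put reordered writes back into order is sketched here under simple assumptions: WRITE RPCs for one file are held for a short delay, sorted by offset, and touching or overlapping byte ranges are merged so the file system sees one sequential stream. The sketch tracks byte ranges only, not the data, and struct pending_write and gather_writes() are hypothetical names, not the FreeBSD write-gathering code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of one WRITE RPC held back during the gathering delay. */
struct pending_write {
    int64_t off;   /* starting byte offset */
    int64_t len;   /* number of bytes */
};

/* qsort comparator: ascending by offset. */
static int cmp_off(const void *a, const void *b)
{
    const struct pending_write *x = a, *y = b;
    return (x->off > y->off) - (x->off < y->off);
}

/*
 * Sort the gathered writes by offset and merge ranges that touch or
 * overlap. Returns the number of merged ranges left in w[0..n).
 */
static size_t gather_writes(struct pending_write *w, size_t n)
{
    if (n == 0)
        return 0;
    assert(w != NULL);
    qsort(w, n, sizeof(*w), cmp_off);

    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        int64_t end = w[out].off + w[out].len;
        if (w[i].off <= end) {
            /* Contiguous or overlapping: extend the current range. */
            int64_t iend = w[i].off + w[i].len;
            if (iend > end)
                w[out].len = iend - w[out].off;
        } else {
            w[++out] = w[i];            /* gap: start a new range */
        }
    }
    return out + 1;
}
```

With 8 KB writes arriving at offsets 16384, 0 and 8192, plus a 4 KB write at 65536, the result is two ranges: [0, 24576) and [65536, 69632).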
>>
>> The attached patch is for current, but you will observe similar
>> improvements with earlier FreeBSD versions. If you would like to have
>> the same patch for FreeBSD 5.x, 6.x or 7.0, please drop me a line.
>>
>> Cheers,
>> /b
>>
>>
>> --
>>   _     _                                           ,_______________.
>> Bjorn Gronvall (Björn Grönvall)                    /_______________/|
>> Swedish Institute of Computer Science              |               ||
>> PO Box 1263, S-164 29 Kista, Sweden                | Schroedingers ||
>> Email: bg@sics.se, Phone +46 -8 633 15 25          |      Cat      |/
>> Cellular +46 -70 768 06 35, Fax +46 -8 751 72 30   '---------------'
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>