Date: Tue, 9 Jul 2013 19:57:02 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Garrett Wollman <wollman@bimajority.org>
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Terrible NFS4 performance: FreeBSD 9.1 + ZFS + AWS EC2
Message-ID: <74469452.3886197.1373414222081.JavaMail.root@uoguelph.ca>
In-Reply-To: <20955.29796.228750.131498@hergotha.csail.mit.edu>
Garrett Wollman wrote:
> <<On Mon, 8 Jul 2013 21:43:52 -0400 (EDT), Rick Macklem
> <rmacklem@uoguelph.ca> said:
>
> > Berend de Boer wrote:
> >> >>>>> "Rick" == Rick Macklem <rmacklem@uoguelph.ca> writes:
> >>
> Rick> After you apply the patch and boot the rebuilt kernel, the
> Rick> cpu overheads should be reduced after you increase the value
> Rick> of vfs.nfsd.tcphighwater.
> >>
> >> What number would I be looking at? 100? 100,000?
> >>
> > Garrett Wollman might have more insight into this, but I would say on
> > the order of 100s to maybe 1000s.
>
> On my production servers, I'm running with the following tuning
> (after Rick's drc4.patch):
>
> ----loader.conf----
> kern.ipc.nmbclusters="1048576"
> vfs.zfs.scrub_limit="16"
> vfs.zfs.vdev.max_pending="24"
> vfs.zfs.arc_max="48G"
> #
> # Tunable per mps(4). We had significant numbers of allocation failures
> # with the default value of 2048, so bump it up and see whether there's
> # still an issue.
> #
> hw.mps.max_chains="4096"
> #
> # Simulate the 10-CURRENT autotuning of maxusers based on available memory
> #
> kern.maxusers="8509"
> #
> # Attempt to make the message buffer big enough to retain all the crap
> # that gets spewed on the console when we boot. 64K (the default) isn't
> # enough to even list all of the disks.
> #
> kern.msgbufsize="262144"
> #
> # Tell the TCP implementation to use the specialized, faster but possibly
> # fragile implementation of soreceive. NFS calls soreceive() a lot and
> # using this implementation, if it works, should improve performance
> # significantly.
> #
> net.inet.tcp.soreceive_stream="1"
> #
> # Six queues per interface means twelve queues total
> # on this hardware, which is a good match for the number
> # of processor cores we have.
> #
> hw.ixgbe.num_queues="6"
>
> ----sysctl.conf----
> # Make sure that device interrupts are not throttled (10GbE can make
> # lots and lots of interrupts).
> hw.intr_storm_threshold=12000
>
> # If the NFS replay cache isn't larger than the number of operations nfsd
> # can perform in a second, the nfsd service threads will spend all of their
> # time contending for the mutex that protects the cache data structure so
> # that they can trim them. If the cache is big enough, it will only do this
> # once a second.
> vfs.nfsd.tcpcachetimeo=300
> vfs.nfsd.tcphighwater=150000
>
> ----modules/nfs/server/freebsd.pp----
>   exec {'sysctl vfs.nfsd.minthreads':
>     command => "sysctl vfs.nfsd.minthreads=${min_threads}",
>     onlyif  => "test $(sysctl -n vfs.nfsd.minthreads) -ne ${min_threads}",
>     require => Service['nfsd'],
>   }
>
>   exec {'sysctl vfs.nfsd.maxthreads':
>     command => "sysctl vfs.nfsd.maxthreads=${max_threads}",
>     onlyif  => "test $(sysctl -n vfs.nfsd.maxthreads) -ne ${max_threads}",
>     require => Service['nfsd'],
>   }
>
> ($min_threads and $max_threads are manually configured based on
> hardware, currently 16/64 on 8-core machines and 16/96 on 12-core
> machines.)
>
> As this is the summer, we are currently very lightly loaded. There's
> apparently still a bug in drc4.patch, because both of my non-scratch
> production servers show a negative CacheSize in nfsstat.
>
> (I hope that all of these patches will make it into 9.2 so we don't
> have to maintain our own mutant NFS implementation.)
>
Afraid not. I was planning on getting it in, but the release schedule
was announced with only a short time before code slush. Hopefully a
cleaned up version of this will be in 10.0 and 9.3.

rick

> -GAWollman
>
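For reference, a rough sketch (not part of the message above; it assumes
a kernel with drc4.patch applied, so that the vfs.nfsd.tcphighwater and
vfs.nfsd.tcpcachetimeo sysctls exist) of applying the DRC tuning at
runtime and then inspecting the cache counters with nfsstat:

# Set the DRC trim threshold and timeout at runtime (values taken from
# the sysctl.conf quoted above; tune them to your own workload).
sysctl vfs.nfsd.tcphighwater=150000
sysctl vfs.nfsd.tcpcachetimeo=300

# Extended NFS server statistics; the CacheSize figure in the server
# cache stats is the counter reported above as going negative.
nfsstat -e -s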