Date: Mon, 8 Jul 2013 22:24:36 -0400
From: Garrett Wollman <wollman@bimajority.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Terrible NFS4 performance: FreeBSD 9.1 + ZFS + AWS EC2
Message-ID: <20955.29796.228750.131498@hergotha.csail.mit.edu>
In-Reply-To: <27783474.3353362.1373334232356.JavaMail.root@uoguelph.ca>
References: <87ppuszgth.wl%berend@pobox.com> <27783474.3353362.1373334232356.JavaMail.root@uoguelph.ca>
<<On Mon, 8 Jul 2013 21:43:52 -0400 (EDT), Rick Macklem <rmacklem@uoguelph.ca> said:

> Berend de Boer wrote:
>> >>>>> "Rick" == Rick Macklem <rmacklem@uoguelph.ca> writes:
>>
>>     Rick> After you apply the patch and boot the rebuilt kernel, the
>>     Rick> cpu overheads should be reduced after you increase the value
>>     Rick> of vfs.nfsd.tcphighwater.
>>
>> What number would I be looking at? 100? 100,000?
>>
> Garrett Wollman might have more insight into this, but I would say on
> the order of 100s to maybe 1000s.

On my production servers, I'm running with the following tuning (after
Rick's drc4.patch):

----loader.conf----
kern.ipc.nmbclusters="1048576"
vfs.zfs.scrub_limit="16"
vfs.zfs.vdev.max_pending="24"
vfs.zfs.arc_max="48G"
#
# Tunable per mps(4).  We had significant numbers of allocation failures
# with the default value of 2048, so bump it up and see whether there's
# still an issue.
#
hw.mps.max_chains="4096"
#
# Simulate the 10-CURRENT autotuning of maxusers based on available memory.
#
kern.maxusers="8509"
#
# Attempt to make the message buffer big enough to retain all the crap
# that gets spewed on the console when we boot.  64K (the default) isn't
# enough to even list all of the disks.
#
kern.msgbufsize="262144"
#
# Tell the TCP implementation to use the specialized, faster but possibly
# fragile implementation of soreceive.  NFS calls soreceive() a lot, and
# using this implementation, if it works, should improve performance
# significantly.
#
net.inet.tcp.soreceive_stream="1"
#
# Six queues per interface means twelve queues total on this hardware,
# which is a good match for the number of processor cores we have.
#
hw.ixgbe.num_queues="6"

----sysctl.conf----
# Make sure that device interrupts are not throttled (10GbE can make
# lots and lots of interrupts).
hw.intr_storm_threshold=12000

# If the NFS replay cache isn't larger than the number of operations nfsd
# can perform in a second, the nfsd service threads will spend all of their
# time contending for the mutex that protects the cache data structure so
# that they can trim it.  If the cache is big enough, that trimming only
# happens once a second.
vfs.nfsd.tcpcachetimeo=300
vfs.nfsd.tcphighwater=150000

----modules/nfs/server/freebsd.pp----
  exec {'sysctl vfs.nfsd.minthreads':
    command => "sysctl vfs.nfsd.minthreads=${min_threads}",
    onlyif  => "test $(sysctl -n vfs.nfsd.minthreads) -ne ${min_threads}",
    require => Service['nfsd'],
  }

  exec {'sysctl vfs.nfsd.maxthreads':
    command => "sysctl vfs.nfsd.maxthreads=${max_threads}",
    onlyif  => "test $(sysctl -n vfs.nfsd.maxthreads) -ne ${max_threads}",
    require => Service['nfsd'],
  }

($min_threads and $max_threads are manually configured based on hardware,
currently 16/64 on 8-core machines and 16/96 on 12-core machines.)

As this is the summer, we are currently very lightly loaded.

There's apparently still a bug in drc4.patch, because both of my
non-scratch production servers show a negative CacheSize in nfsstat.

(I hope that all of these patches will make it into 9.2 so we don't have
to maintain our own mutant NFS implementation.)

-GAWollman
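P.S. For anyone who wants to try the DRC knobs by hand before committing
them to sysctl.conf, here is a rough, untested sketch -- it assumes a
kernel with drc4.patch applied and that "nfsstat -e -s" reports the server
cache counters on your build:

  # Bump the DRC trimming limits at runtime (same values as above).
  sysctl vfs.nfsd.tcphighwater=150000
  sysctl vfs.nfsd.tcpcachetimeo=300

  # Then keep an eye on the server's duplicate request cache; a negative
  # CacheSize here is the drc4.patch symptom I mentioned above.
  nfsstat -e -s | grep -A 2 'Server Cache Stats'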
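P.P.S. If you're not a Puppet shop, the thread tuning above is just a pair
of sysctls run after nfsd has started; on one of our 8-core machines that
works out to something like:

  sysctl vfs.nfsd.minthreads=16
  sysctl vfs.nfsd.maxthreads=64

The onlyif test in the manifest is only there to keep the exec idempotent,
so Puppet doesn't rerun the command on every pass.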