Date: Mon, 8 Jul 2013 22:24:36 -0400
From: Garrett Wollman
To: Rick Macklem
Cc: freebsd-fs
Subject: Re: Terrible NFS4 performance: FreeBSD 9.1 + ZFS + AWS EC2
Message-ID: <20955.29796.228750.131498@hergotha.csail.mit.edu>
In-Reply-To: <27783474.3353362.1373334232356.JavaMail.root@uoguelph.ca>
References: <87ppuszgth.wl%berend@pobox.com>
        <27783474.3353362.1373334232356.JavaMail.root@uoguelph.ca>

<27783474.3353362.1373334232356.JavaMail.root@uoguelph.ca> said:

> Berend de Boer wrote:
>> >>>>> "Rick" == Rick Macklem writes:
>>
>> Rick> After you apply the patch and boot the rebuilt kernel, the
>> Rick> cpu overheads should be reduced after you increase the value
>> Rick> of vfs.nfsd.tcphighwater.
>>
>> What number would I be looking at? 100? 100,000?
>>
> Garrett Wollman might have more insight into this, but I would say on
> the order of 100s to maybe 1000s.

On my production servers, I'm running with the following tuning (after
Rick's drc4.patch):

----loader.conf----
kern.ipc.nmbclusters="1048576"
vfs.zfs.scrub_limit="16"
vfs.zfs.vdev.max_pending="24"
vfs.zfs.arc_max="48G"
#
# Tunable per mps(4). We had significant numbers of allocation failures
# with the default value of 2048, so bump it up and see whether there's
# still an issue.
#
hw.mps.max_chains="4096"
#
# Simulate the 10-CURRENT autotuning of maxusers based on available memory
#
kern.maxusers="8509"
#
# Attempt to make the message buffer big enough to retain all the crap
# that gets spewed on the console when we boot. 64K (the default) isn't
# enough to even list all of the disks.
#
kern.msgbufsize="262144"
#
# Tell the TCP implementation to use the specialized, faster but possibly
# fragile implementation of soreceive. NFS calls soreceive() a lot and
# using this implementation, if it works, should improve performance
# significantly.
#
net.inet.tcp.soreceive_stream="1"
#
# Six queues per interface means twelve queues total
# on this hardware, which is a good match for the number
# of processor cores we have.
#
hw.ixgbe.num_queues="6"

----sysctl.conf----
# Make sure that device interrupts are not throttled (10GbE can make
# lots and lots of interrupts).
hw.intr_storm_threshold=12000
# If the NFS replay cache isn't larger than the number of operations nfsd
# can perform in a second, the nfsd service threads will spend all of their
# time contending for the mutex that protects the cache data structure so
# that they can trim them. If the cache is big enough, it will only do this
# once a second.
vfs.nfsd.tcpcachetimeo=300
vfs.nfsd.tcphighwater=150000

----modules/nfs/server/freebsd.pp----
exec {'sysctl vfs.nfsd.minthreads':
  command => "sysctl vfs.nfsd.minthreads=${min_threads}",
  onlyif  => "test $(sysctl -n vfs.nfsd.minthreads) -ne ${min_threads}",
  require => Service['nfsd'],
}

exec {'sysctl vfs.nfsd.maxthreads':
  command => "sysctl vfs.nfsd.maxthreads=${max_threads}",
  onlyif  => "test $(sysctl -n vfs.nfsd.maxthreads) -ne ${max_threads}",
  require => Service['nfsd'],
}

($min_threads and $max_threads are manually configured based on
hardware, currently 16/64 on 8-core machines and 16/96 on 12-core
machines.)

As this is the summer, we are currently very lightly loaded. There's
apparently still a bug in drc4.patch, because both of my non-scratch
production servers show a negative CacheSize in nfsstat. (I hope that
all of these patches will make it into 9.2 so we don't have to
maintain our own mutant NFS implementation.)

-GAWollman
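
A quick sketch (not part of the original tuning files) of reading back
the boot-time tunables from the loader.conf listing above after a
reboot; it assumes net.inet.tcp.soreceive_stream is exported as a
read-only sysctl, as it is on stock FreeBSD:

# Read back the values set in loader.conf to confirm they took effect.
sysctl kern.ipc.nmbclusters kern.maxusers kern.msgbufsize
sysctl net.inet.tcp.soreceive_stream
# Show mbuf/cluster usage and denied allocations, to judge whether
# kern.ipc.nmbclusters is sized sensibly for the 10GbE load.
netstat -m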
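
For the DRC settings discussed above, a minimal sketch of adjusting and
observing them at runtime; the sysctl values are copied from the
sysctl.conf listing, and the column names are those printed by a stock
9.x nfsstat, which may differ once drc4.patch is applied:

# Raise the replay-cache highwater mark and cache timeout on a running
# server (same values as in the sysctl.conf above).
sysctl vfs.nfsd.tcphighwater=150000
sysctl vfs.nfsd.tcpcachetimeo=300
# The "Server Cache Stats:" section of the new-NFS server statistics
# includes the CacheSize and TCPPeak counters; the negative CacheSize
# mentioned above would show up here.
nfsstat -e -s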