From owner-freebsd-fs@FreeBSD.ORG Tue Jul  9 23:57:03 2013
Date: Tue, 9 Jul 2013 19:57:02 -0400 (EDT)
From: Rick Macklem
To: Garrett Wollman
Cc: freebsd-fs
Subject: Re: Terrible NFS4 performance: FreeBSD 9.1 + ZFS + AWS EC2
Message-ID: <74469452.3886197.1373414222081.JavaMail.root@uoguelph.ca>
In-Reply-To: <20955.29796.228750.131498@hergotha.csail.mit.edu>

Garrett Wollman wrote:
> < said:
>
> > Berend de Boer wrote:
> >> >>>>> "Rick" == Rick Macklem writes:
> >>
> Rick> After you apply the patch and boot the rebuilt kernel, the
> Rick> cpu overheads should be reduced after you increase the value
> Rick> of vfs.nfsd.tcphighwater.
> >>
> >> What number would I be looking at? 100? 100,000?
> >>
> > Garrett Wollman might have more insight into this, but I would say
> > on the order of 100s to maybe 1000s.
>
> On my production servers, I'm running with the following tuning
> (after Rick's drc4.patch):
>
> ----loader.conf----
> kern.ipc.nmbclusters="1048576"
> vfs.zfs.scrub_limit="16"
> vfs.zfs.vdev.max_pending="24"
> vfs.zfs.arc_max="48G"
> #
> # Tunable per mps(4).  We had significant numbers of allocation failures
> # with the default value of 2048, so bump it up and see whether there's
> # still an issue.
> #
> hw.mps.max_chains="4096"
> #
> # Simulate the 10-CURRENT autotuning of maxusers based on available memory
> #
> kern.maxusers="8509"
> #
> # Attempt to make the message buffer big enough to retain all the crap
> # that gets spewed on the console when we boot.  64K (the default) isn't
> # enough to even list all of the disks.
> #
> kern.msgbufsize="262144"
> #
> # Tell the TCP implementation to use the specialized, faster but possibly
> # fragile implementation of soreceive.  NFS calls soreceive() a lot and
> # using this implementation, if it works, should improve performance
> # significantly.
> #
> net.inet.tcp.soreceive_stream="1"
> #
> # Six queues per interface means twelve queues total
> # on this hardware, which is a good match for the number
> # of processor cores we have.
> #
> hw.ixgbe.num_queues="6"
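A minimal sketch (not from the original message) of a post-boot sanity
check: most of the tunables above are also exported as read-only sysctls,
so their effective values can be confirmed after the reboot. Which OIDs
are actually visible via sysctl is an assumption here; loader-only knobs
such as hw.ixgbe.num_queues may not appear.

    #!/bin/sh
    # Print the effective value of a few of the loader.conf tunables above.
    # Assumes each OID is exported as a sysctl; prints "not present" otherwise.
    for oid in kern.ipc.nmbclusters vfs.zfs.arc_max kern.maxusers \
        kern.msgbufsize net.inet.tcp.soreceive_stream; do
            printf '%-35s %s\n' "$oid" \
                "$(sysctl -n "$oid" 2>/dev/null || echo 'not present')"
    done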
> ----sysctl.conf----
> # Make sure that device interrupts are not throttled (10GbE can make
> # lots and lots of interrupts).
> hw.intr_storm_threshold=12000
>
> # If the NFS replay cache isn't larger than the number of operations nfsd
> # can perform in a second, the nfsd service threads will spend all of their
> # time contending for the mutex that protects the cache data structure so
> # that they can trim them.  If the cache is big enough, it will only do this
> # once a second.
> vfs.nfsd.tcpcachetimeo=300
> vfs.nfsd.tcphighwater=150000
>
> ----modules/nfs/server/freebsd.pp----
> exec {'sysctl vfs.nfsd.minthreads':
>   command => "sysctl vfs.nfsd.minthreads=${min_threads}",
>   onlyif  => "test $(sysctl -n vfs.nfsd.minthreads) -ne ${min_threads}",
>   require => Service['nfsd'],
> }
>
> exec {'sysctl vfs.nfsd.maxthreads':
>   command => "sysctl vfs.nfsd.maxthreads=${max_threads}",
>   onlyif  => "test $(sysctl -n vfs.nfsd.maxthreads) -ne ${max_threads}",
>   require => Service['nfsd'],
> }
>
> ($min_threads and $max_threads are manually configured based on
> hardware, currently 16/64 on 8-core machines and 16/96 on 12-core
> machines.)
>
> As this is the summer, we are currently very lightly loaded.  There's
> apparently still a bug in drc4.patch, because both of my non-scratch
> production servers show a negative CacheSize in nfsstat.
>
> (I hope that all of these patches will make it into 9.2 so we don't
> have to maintain our own mutant NFS implementation.)
>
Afraid not. I was planning on getting it in, but the release schedule
came out with very little time before the code slush. Hopefully a
cleaned-up version of this will be in 10.0 and 9.3.

rick

> -GAWollman
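For hosts that are not managed with Puppet, the same idempotent thread
tuning can be expressed as a plain /bin/sh script. This is only a sketch:
the sysctl names and the -ne test are taken from the manifest above, while
the 16/64 values merely stand in for the per-host ${min_threads} and
${max_threads}.

    #!/bin/sh
    # Idempotent equivalent of the two Puppet exec resources above.
    # MIN/MAX are placeholders for the per-host thread settings
    # (the 8-core example from the message).
    MIN=16
    MAX=64
    if [ "$(sysctl -n vfs.nfsd.minthreads)" -ne "$MIN" ]; then
            sysctl vfs.nfsd.minthreads="$MIN"
    fi
    if [ "$(sysctl -n vfs.nfsd.maxthreads)" -ne "$MAX" ]; then
            sysctl vfs.nfsd.maxthreads="$MAX"
    fi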