From owner-freebsd-fs@FreeBSD.ORG Tue Jul 9 23:53:55 2013
Date: Tue, 9 Jul 2013 19:53:54 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Berend de Boer
Cc: freebsd-fs
Message-ID: <1212484294.3883594.1373414034622.JavaMail.root@uoguelph.ca>
In-Reply-To: <87a9lwyy16.wl%berend@pobox.com>
Subject: Re: Terrible NFS4 performance: FreeBSD 9.1 + ZFS + AWS EC2
List-Id: Filesystems
Berend de Boer wrote:
> >>>>> "Rick" == Rick Macklem writes:
>
>     Rick> After you apply the patch and boot the rebuilt kernel, the
>     Rick> cpu overheads should be reduced after you increase the
>     Rick> value of vfs.nfsd.tcphighwater.
>
> OK, completely disregard my previous email. I was actually testing
> against a server in a different data centre. I didn't think it would
> matter too much, but clearly it does (ping times 2-3 times higher).
>
> So I moved the server + disks into the same data centre as the NFS
> client.
>
> 1. Does not affect nfs3.
>
> 2. When I do not set vfs.nfsd.tcphighwater, I get a "Remote I/O
>    error" on the client. On the server I see:
>
>      nfsd server cache flooded, try to increase nfsrc_floodlevel
>
>    (this is just FYI).
>
> 3. With vfs.nfsd.tcphighwater set to 150,000 I get very high cpu,
>    50%.
>
The patch I sent you does not tune nfsrc_floodlevel based on what you
set vfs.nfsd.tcphighwater to. That still needs to be added to the
patch. (I had some code that did this, but others recommended it be
done as part of the sysctl, and I haven't gotten around to coding
that.)
--> For things to work ok, vfs.nfsd.tcphighwater needs to be less
    than nfsrc_floodlevel (which is 16384).

*** Again, I'd recommend setting vfs.nfsd.tcphighwater to 5000-10000,
which is well under 16384 and for which a hash table size of 500
should be ok.

Believe it or not, this server was developed about 10 years ago on a
PIII with 32 (no, not Gbytes, but Mbytes) of RAM. The sizing worked
well for that hardware, but is obviously a bit small for newer
hardware;-)

> Performance is now about 8m15s. That is better, but still twice as
> slow as a lower-spec Linux NFS4 server, and four times slower than
> nfs3 on the same box.
>
> 4. With Garrett's settings, I looked at when the cpu starts to
>    increase. It starts slow, but rises quickly to 50% in about 1
>    minute.
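(For reference, a sketch of how the recommended setting could be
applied on the server. The vfs.nfsd.tcphighwater sysctl and the
5000-10000 range are from the discussion above; persisting it via
/etc/sysctl.conf is just the standard FreeBSD mechanism, not something
specific to this patch:)

```shell
# Set at runtime on the patched NFS server (value per Rick's suggestion):
sysctl vfs.nfsd.tcphighwater=5000

# To persist across reboots, add the same line to /etc/sysctl.conf:
#   vfs.nfsd.tcphighwater=5000

# Keep the value well below nfsrc_floodlevel (16384); otherwise the
# DRC can flood and clients see "Remote I/O error", as reported above.
```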
I think his code uses an nfsrc_floodlevel tuned based on
vfs.nfsd.tcphighwater, and I suspect a much larger hash table size,
too.

> Time was similar: 7m54s.
>
One other thing you can try is enabling delegations. On the server:
  vfs.nfsd.issue_delegations=1

> 5. I lowered vfs.nfsd.tcphighwater to 10,000, but then it actually
>    became worse: cpu quickly went to 70%, i.e. not much difference
>    from FreeBSD without the patch. I didn't keep this test running
>    to see if it became slower over time.
>
>    Making it 300,000 seems to slow the cpu increase (but it keeps
>    rising).
>
>    So what I observe from the patch is that it makes the rise in cpu
>    slower, but doesn't stop it. I.e. after a few minutes, even with
>    a setting of 300,000, the cpu gets to 50%, then drops back a bit
>    to hover around 40%, then creeps back to over 50%.
>
> 6. So the conclusion is: this patch helps somewhat, but nfs4
>    behaviour is still majorly impaired compared to nfs3.
>
Well, reading and writing is the same for NFSv4 as for NFSv3, except
that there isn't any file handle affinity support for NFSv4 (file
handle affinity ties a set of nfsd thread(s) to the reading/writing
of a file). File handle affinity results in a more sequential series
of VOP_READ()/VOP_WRITE() calls to the server file system.

The other big difference between NFSv3 and NFSv4 is the Open
operations. The only way to reduce the # of these may be enabling
delegations. How much effect this has depends on the client.

rick

> --
> All the best,
>
> Berend de Boer
>
> ------------------------------------------------------
> Awesome Drupal hosting: https://www.xplainhosting.com/
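(A sketch of trying the delegation suggestion. The
vfs.nfsd.issue_delegations sysctl name is the one given above;
whether it helps depends on the client's NFSv4 delegation support, so
treat this as an experiment rather than a guaranteed fix:)

```shell
# On the NFSv4 server, start issuing delegations at runtime:
sysctl vfs.nfsd.issue_delegations=1

# Persist across reboots by adding the same line to /etc/sysctl.conf:
#   vfs.nfsd.issue_delegations=1

# With delegations held by the client, repeated Open operations on
# the same files can be handled locally on the client, which is the
# NFSv4 overhead discussed above that NFSv3 doesn't have.
```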