From owner-freebsd-fs@FreeBSD.ORG Mon Jul 25 21:58:36 2011
Date: Mon, 25 Jul 2011 17:58:35 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Zack Kirsch
Cc: freebsd-fs@freebsd.org
Subject: Re: nfsd server cache flooded, try to increase nfsrc_floodlevel
Message-ID: <957583241.989932.1311631115955.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <476FC2247D6C7843A4814ED64344560C04443EAA@seaxch10.desktop.isilon.com>

Zack Kirsch wrote:
> Just wanted to add a bit of Isilon color. We've hit this limit before,
> but I believe it was mostly due to strange client behavior of 1) using
> a new lockowner for each lock and 2) using a new TCP connection for
> each 'test run'.

When I saw this before, I remarked that this shouldn't be relevant. I
realize now that you were referring to a test environment (not a real
NFS client) that keeps creating new TCP connections, even when the
previous connection wasn't broken by a network partition or similar.
Sorry about that.

> As far as I know, we haven't hit this in the field.

It appears that this case was the result of an old Linux NFSv4 client
and was resolved via a kernel upgrade. (I.e., I suspect there are
others out there that will run into the same thing sooner or later.)

> We've done a few things to combat this problem:
> 1) We increased the floodlevel to 65536.
> 2) We made the floodlevel configurable via sysctl.
> 3) We made significant changes to the replay cache itself. Specific
>    gains were drastic performance improvements and freeing of cache
>    entries from stale TCP connections.
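[As an aside, for anyone wanting to try something like 2) on a stock
kernel: a tunable like that would normally be exposed through the
kernel's SYSCTL macros. The fragment below is only a minimal sketch of
the idea, not Isilon's actual change; the variable name, default value,
and the vfs.nfsd node it hangs off are my assumptions.]

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

/* Hypothetical tunable replacing a compile-time flood level. */
static u_int nfsrc_floodlevel = 16384;

/*
 * Hang the knob off an assumed vfs.nfsd node, so it could be bumped at
 * runtime with something like "sysctl vfs.nfsd.nfsrc_floodlevel=65536".
 */
SYSCTL_DECL(_vfs_nfsd);
SYSCTL_UINT(_vfs_nfsd, OID_AUTO, nfsrc_floodlevel, CTLFLAG_RW,
    &nfsrc_floodlevel, 0, "Upper limit on NFSv4 server request cache size");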
It is important to note that the request cache holds onto replies for
inactive TCP connections, because it assumes that the client might be
network partitioned for long enough that it is forced to reconnect
using a fresh TCP connection and will then retry all outstanding RPCs.
This could take a looonnngggg time to happen, so these replies can't
be freed quickly, or the whole purpose of the cache (avoiding the
redoing of non-idempotent operations when an RPC is retried) is
defeated. The fact that some artificial test program (pynfs maybe?)
chooses to open fresh TCP connections isn't relevant, imho, since it
isn't a real client and, as far as I know, real clients only reconnect
when the old TCP connection no longer works.

I thought I'd try to clarify this for anyone interested, rick
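ps: To make the above concrete, here is a tiny userspace sketch of the
idea behind the request (replay) cache: the server keeps the reply for
each non-idempotent RPC, keyed by transaction id (xid), so a retry
arriving later (possibly over a fresh TCP connection) gets the stored
reply back instead of re-executing the operation. All names here are
made up for illustration; this is not the kernel implementation.

#include <stdio.h>

#define CACHE_SIZE 64

/* One cache slot: the saved reply for a given RPC transaction id. */
struct drc_entry {
	unsigned int xid;
	int valid;
	char reply[128];
};

static struct drc_entry cache[CACHE_SIZE];

/* Return the cached reply for xid, or NULL if it hasn't been seen. */
static const char *
drc_lookup(unsigned int xid)
{
	struct drc_entry *e = &cache[xid % CACHE_SIZE];

	return ((e->valid && e->xid == xid) ? e->reply : NULL);
}

/* Record the reply after executing a non-idempotent operation. */
static void
drc_store(unsigned int xid, const char *reply)
{
	struct drc_entry *e = &cache[xid % CACHE_SIZE];

	e->xid = xid;
	e->valid = 1;
	snprintf(e->reply, sizeof(e->reply), "%s", reply);
}

int
main(void)
{
	const char *r;

	/* First arrival of xid 7: execute the operation, cache the reply. */
	if (drc_lookup(7) == NULL)
		drc_store(7, "REMOVE ok");

	/*
	 * Retry of xid 7 after a reconnect: replay the cached reply
	 * rather than redoing the (non-idempotent) REMOVE.
	 */
	r = drc_lookup(7);
	printf("retry of xid 7 -> %s\n", r != NULL ? r : "(miss)");
	return (0);
}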