From owner-freebsd-stable@FreeBSD.ORG Fri Jul 25 22:54:04 2014
Date: Fri, 25 Jul 2014 18:54:03 -0400 (EDT)
From: Rick Macklem
To: Harald Schmalzbauer
Cc: freebsd-stable
Subject: Re: nfsd server cache flooded, try to increase nfsrc_floodlevel
Message-ID: <666117227.3741955.1406328843281.JavaMail.root@uoguelph.ca>
In-Reply-To: <53D23B57.8020208@omnilan.de>

Harald Schmalzbauer wrote:
> Regarding Rick Macklem's message from 25.07.2014 12:38 (localtime):
> > Harald Schmalzbauer wrote:
> >> Regarding Rick Macklem's message from 25.07.2014 02:14 (localtime):
> >>> Harald Schmalzbauer wrote:
> >>>> Regarding Rick Macklem's message from 08.08.2013 14:20 (localtime):
> >>>>> Lars Eggert wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> every few days or so, my -STABLE NFS server (v3 and v4) gets wedged
> >>>>>> with a ton of messages about "nfsd server cache flooded, try to
> >>>>>> increase nfsrc_floodlevel" in the log, and nfsstat shows TCPPeak at
> >>>>>> 16385. It requires a reboot to unwedge; restarting the server does
> >>>>>> not help.
> >>>>>>
> >>>>>> The clients are (mostly) six -CURRENT nfsv4 boxes that netboot from
> >>>>>> the server and mount all drives from there.
> >>>>>>
> >>> Have you tried increasing vfs.nfsd.tcphighwater?
> >>> This needs to be increased to raise the flood level above 16384.
> >>>
> >>> Garrett Wollman sets:
> >>> vfs.nfsd.tcphighwater=100000
> >>> vfs.nfsd.tcpcachetimeo=300
> >>>
> >>> or something like that, if I recall correctly.
> >> Thank you for your help!
> >>
> >> I read about tuning these sysctls, but I object to altering them
> >> individually, because I don't have hundreds of clients torturing a poor
> >> server or any other badly balanced setup.
> >> I run into this problem with one client, connected via a 1GbE (not 10
> >> or 40GbE) link, talking to a modern server with 10G RAM - and this
> >> environment forces me to reboot the storage server every second day.
> >> IMHO such a setup shouldn't require manual tuning, and I consider this
> >> a really urgent problem!
> > Btw, what you can do to help with this is experiment with the tunable
> > and, if you find a setting that works well for your server, report that
> > back as a data point that can be used for this.
> >
> > If you make it too large, the server runs out of address space that
> > can be used by malloc() and that results in the whole machine being
> > wedged and not just the NFS server.
>
> I'd happily provide experience results, but I see my environment (the
> only one where I have reintroduced nfs atm.) as uncommon, because few
> LANs out there have NFS services with just two clients, where only one
> really uses nfs.
> So before tuning sysctls in production environments other than my own
> (small and uncommon) setup, I need proof that nfs is usable these days
> (v4). If the noopen.patch proves to be one possibility to stabilize
> things, I'll be able to find out optimized settings for vfs.nfsd.tcp*.
> Then I could have the patched kernel in addition, which I need to be
> able to ensure reliable service.

Note that, for this situation, it isn't the number of clients, but the
number of different processes on the client(s) that is the issue.
For example:
Client type A - Has a long-running process that opens/closes many files.
  This one will only result in one OpenOwner and probably wouldn't exceed
  16K OpenOwners, even for 1000s of clients.
Client type B - Parent process forks a lot of children who open files.
  This one results in an OpenOwner for each of the forked child processes,
  and even a single client of this type could exceed 16K OpenOwners.

Each process on a client that opens a file results in an OpenOwner.
Unfortunately, for NFSv4.0, there is no well-defined time when an OpenOwner
can be safely deleted. It must be retained until the related Opens are
closed, but retaining it longer helps w.r.t. correct behaviour if a network
partition occurs. As such, I made the default timeout for OpenOwners very
conservative at 12hrs. You definitely want to decrease this
(vfs.nfsd.tcpcachetimeo), possibly as low as 300sec.

After this change, I suspect your single-client case will be resolved and
you can see what the peak value for OpenOwners is for one client. Then
multiply that by the number of clients you wish to support and set
vfs.nfsd.tcphighwater to that value.

The key piece of information to help figure out how to tune this
dynamically is the server machine's memory and arch (32 vs 64bit) plus a
value for vfs.nfsd.tcphighwater that it runs stably at. (And the peak
value of OpenOwner that you see while in operation.)

The problem with setting vfs.nfsd.tcphighwater too large is that an
overloaded server can run out of kernel address space/memory it can use
for malloc() and then the whole machine wedges. (I can only test to see
what is safe for a 256Mbyte i386.)

If you can't get tunable values of the above that work, you can set
vfs.nfsd.cachetcp=0 and then this problem goes away, although the result
is a level of correctness for non-idempotent operations that is the same
as the old NFS server (which didn't try to use the DRC for TCP).

All of this is fixed by a component of NFSv4.1 called sessions.
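To make that concrete, a rough sketch of the tuning above might look like
the following on the server. The numbers are only illustrative starting
points pulled from the discussion (not tested recommendations), and the
nfsstat invocation assumes the extended-stats output of the new NFS code
is available for watching the OpenOwner count:

    # Shorten the OpenOwner/cache timeout and raise the flood ceiling
    # (illustrative values; refine them from the observed peak below).
    sysctl vfs.nfsd.tcpcachetimeo=300
    sysctl vfs.nfsd.tcphighwater=100000

    # Watch the extended server-side stats to find the per-client
    # OpenOwner peak while the workload runs.
    nfsstat -e -s

    # Last resort if no stable values can be found: disable the DRC for
    # TCP (same correctness level as the old NFS server).
    sysctl vfs.nfsd.cachetcp=0

    # Settings that work can be made persistent via /etc/sysctl.conf:
    #   vfs.nfsd.tcpcachetimeo=300
    #   vfs.nfsd.tcphighwater=100000

Multiplying the observed per-client OpenOwner peak by the number of
clients you want to support, as described above, then gives the
vfs.nfsd.tcphighwater value worth persisting.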
rick

> Additionally, I should first read somewhere what they are doing, to get
> the right understanding…
>
> Thanks,
>
> -Harry
>
> P.S.: I'd happily donate some used GbE switch+server if that helps!