From owner-freebsd-stable@FreeBSD.ORG Sat Jul 26 21:54:13 2014
Date: Sat, 26 Jul 2014 17:54:02 -0400 (EDT)
From: Rick Macklem
To: Harald Schmalzbauer
Cc: freebsd-stable
Message-ID: <1626176481.3920822.1406411642415.JavaMail.root@uoguelph.ca>
In-Reply-To: <53D37D08.2000104@omnilan.de>
Subject: Re: nfsd server cache flooded, try to increase nfsrc_floodlevel

Harald Schmalzbauer wrote:
> Regarding Rick Macklem's message of 25.07.2014
> 13:14 (localtime):
> > Harald Schmalzbauer wrote:
> >> Regarding Rick Macklem's message of 25.07.2014 02:14 (localtime):
> >>> Harald Schmalzbauer wrote:
> >>>> Regarding Rick Macklem's message of 08.08.2013 14:20 (localtime):
> >>>>> Lars Eggert wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> every few days or so, my -STABLE NFS server (v3 and v4) gets
> >>>>>> wedged with a ton of messages about "nfsd server cache flooded,
> >>>>>> try to increase nfsrc_floodlevel" in the log, and nfsstat shows
> >>>>>> TCPPeak at 16385. It requires a reboot to unwedge; restarting
> >>>>>> the server does not help.
> >>>>>>
> >>>>>> …
> >> IMHO such a setup shouldn't require manual tuning, and I consider
> >> this a really urgent problem!
> >> Whatever causes the server to lock up must be fixed for the next
> >> release; otherwise the shipped NFS implementation is not really
> >> suitable for production environments and needs a warning message
> >> when enabled. The impact of this failure forces admins to change
> >> the operating system in order to get a core service back into
> >> operation.
> >> The point is that I don't merely suffer from weaker performance or
> >> lags/delays; my server stops serving NFS completely, and only a
> >> reboot resolves the situation.
> >>
> > Btw, you can increase vfs.nfsd.tcphighwater on the fly when it
> > wedges and avoid having to reboot.
> One suggestion: if raising vfs.nfsd.tcphighwater at runtime fixes the
> locked NFS server (which I thought I had tried, but I'm not sure any
> more), maybe the log message should reflect that. My first guess was
> to look for a sysctl named 'nfsrc_floodlevel'. If it makes more sense
> for the log message to mention nfsrc_floodlevel instead of
> tcphighwater, the latter should be mentioned in the man page anyway.

Yes, I'll admit I had intended to do that, but it slipped through the
cracks.
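[For anyone finding this in the archives: the on-the-fly bump Rick
describes would look roughly like the commands below. This is a sketch
based on the thread, not a tested recipe; the value 100000 is purely
illustrative (pick something comfortably above the TCPPeak that
nfsstat reports), and the nfsstat -e flag for the extended server
stats is my assumption from the FreeBSD tool, not from this thread.]

```shell
# Check the current limit and the cache's recorded high-water mark.
sysctl vfs.nfsd.tcphighwater
nfsstat -e -s            # look for the "TCPPeak" field

# Raise the limit at runtime, without a reboot (100000 is only an
# example value, not a recommendation).
sysctl vfs.nfsd.tcphighwater=100000

# Persist the choice across reboots.
echo 'vfs.nfsd.tcphighwater=100000' >> /etc/sysctl.conf
```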
I will also try to make sure that bumping it up allows the server to
get working again without a reboot.
> If the problem is solvable without rebooting, it's a cosmetic problem
> IMHO, not the serious show stopper I considered it at first.
> Everything but a reboot is fine ;-)

I would like to come up with a way to tune this based on server size
(RAM + 32 vs 64bit arch or similar), but it may take a while to gather
enough knowledge to do so.

rick

> Thanks,
>
> -Harry
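[As an illustration of Rick's size-based tuning idea above, a
heuristic could be sketched like this. The suggest_tcphighwater helper
name, the one-entry-per-64-KiB ratio, and the clamp bounds are all
invented for the example; this is NOT what FreeBSD ships or what Rick
later implemented.]

```shell
#!/bin/sh
# Hypothetical heuristic (not FreeBSD's actual algorithm): derive a
# candidate vfs.nfsd.tcphighwater value from physical RAM in bytes,
# e.g. from: sysctl -n hw.physmem
suggest_tcphighwater() {
    phys=$1
    hw=$((phys / 65536))                 # one cache entry per 64 KiB of RAM
    [ "$hw" -lt 16384 ] && hw=16384      # never below the peak seen in this thread
    [ "$hw" -gt 1048576 ] && hw=1048576  # arbitrary upper clamp
    echo "$hw"
}

# Example: an 8 GiB server.
suggest_tcphighwater 8589934592   # prints 131072
```

The clamp keeps small machines from ending up below today's default
flood level while bounding memory use on very large ones.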