From owner-freebsd-stable@FreeBSD.ORG  Wed Jan 14 08:08:47 2004
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 47DE816A4CE
	for <freebsd-stable@freebsd.org>;
	Wed, 14 Jan 2004 08:08:47 -0800 (PST)
Received: from mutare.noc.clara.net (mutare.noc.clara.net [195.8.70.95])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1663843DA4
	for <freebsd-stable@freebsd.org>;
	Wed, 14 Jan 2004 08:07:50 -0800 (PST)
	(envelope-from ollie@mutare.noc.clara.net)
Received: from ollie by mutare.noc.clara.net with local (Exim 4.24)
	id 1AgnYa-0001ZD-PA; Wed, 14 Jan 2004 16:07:48 +0000
Date: Wed, 14 Jan 2004 16:07:48 +0000
From: Ollie Cook <ollie@uk.clara.net>
To: Doug White <dwhite@gumbysoft.com>
Message-ID: <20040114160748.GJ27744@mutare.noc.clara.net>
References: <20040113154932.GE354@mutare.noc.clara.net>
	<20040113114525.L63732@carver.gumbysoft.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040113114525.L63732@carver.gumbysoft.com>
User-Agent: Mutt/1.4.1i
X-Operating-System: FreeBSD 4.9-STABLE i386
X-NCC-RegID: uk.claranet
Sender: Ollie Cook <ollie@mutare.noc.clara.net>
cc: freebsd-stable@freebsd.org
Subject: Re: nfs send errors 32 and 35 on RELENG_4
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
X-List-Received-Date: Wed, 14 Jan 2004 16:08:47 -0000

On Tue, Jan 13, 2004 at 11:50:41AM -0800, Doug White wrote:
> > Jan 13 14:02:02 mese /kernel: nfs server 192.168.1.1:/vol/vol1/claramail: not responding
> > Jan 13 14:02:03 mese /kernel: nfs server 192.168.1.1:/vol/vol1/claramail: is alive again
> 
> There's some tuning options for this, which I don't immediately recall.
> Under heavy load these are somewhat normal.

Hi Doug,

I have set the kernel to auto-scale nmbclusters based on the amount of memory
in the host. I don't think it's worth hard-coding a value, since peak usage
doesn't seem to get anywhere near the maximum:

  root@mese:[conf] (10) # netstat -m
  397/1200/34816 mbufs in use (current/peak/max):
          294 mbufs allocated to data
          103 mbufs allocated to packet headers
  215/676/8704 mbuf clusters in use (current/peak/max)
  1652 Kbytes allocated to network (6% of mb_map in use)
  0 requests for memory denied
  0 requests for memory delayed
  0 calls to protocol drain routines

If the peak does get close to the maximum, I will increase nmbclusters, but
that doesn't look necessary at present.
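The "watch peak against max" check can be automated. This is a minimal sketch that assumes the FreeBSD 4.x `netstat -m` output format shown above. The function name `check_clusters` and the 80% default threshold are hypothetical choices, not tuned values:

```sh
#!/bin/sh
# Hypothetical helper: compare peak mbuf cluster usage with the limit.
# Expects FreeBSD 4.x `netstat -m` output on stdin; the threshold
# (percent of max, default 80) is an assumption, not a tuned value.
check_clusters() {
    awk -v pct="${1:-80}" '
        / mbuf clusters in use / {
            split($1, n, "/")              # n[1]=current n[2]=peak n[3]=max
            used = n[2] * 100 / n[3]
            printf "clusters: peak %d of max %d (%.0f%%)\n", n[2], n[3], used
            if (used >= pct) {
                print "WARNING: peak is near kern.ipc.nmbclusters"
                exit 1
            }
        }'
}

# Usage on the NFS client:  netstat -m | check_clusters 80
```

With the figures above (peak 676 of 8704), the function reports about 8% and exits 0. It exits non-zero once the peak crosses the threshold, so it can be run from cron or a monitoring script.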

All sysctls are at default values.

Is there anything else I should look at tuning? Each of these hosts does
approximately 500 NFS operations per second (approx. 5 Mbit/s).
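Here is a rough sanity check on those figures, assuming "5 Mbit/s" means 5,000,000 bits per second. The average payload per NFS operation works out to about 1.25 KB. That would be consistent with a load made up mostly of small requests rather than bulk transfers:

```sh
#!/bin/sh
# Average bytes per NFS operation = (bits per second / 8) / operations per second.
# Inputs are the approximate figures quoted above.
bits_per_sec=5000000
ops_per_sec=500
echo $(( bits_per_sec / 8 / ops_per_sec ))   # bytes per operation
```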

> > Jan 13 14:09:37 mese /kernel: nfs send error 35 for server 192.168.1.1:/vol/vol1/claramail
> > Jan 13 14:09:53 mese /kernel: nfs send error 32 for server 192.168.1.1:/vol/vol1/claramail
> 
> These errors tend to imply resource shortages. Monitor netstat -m output
> and make sure you aren't running out of mbuf or mbuf clusters. Also check
> for network errors and dropped packets (netstat -s, switch statistics).

Before power-cycling the box I was able to look at netstat -m and netstat -i,
neither of which showed anything to worry about. Next time it happens I'll be
sure to save a copy of the output, in case there's something I'm missing.

Is there any way of finding out what error 35 actually means?

> Are you running rpc.lockd?

The server is a Network Appliance F-Series filer, which runs its own lock manager:

root@metis:[conf] (13) # rpcinfo -p 192.168.1.1 | grep lock
    100021    4   tcp    607  nlockmgr
    100021    3   tcp    607  nlockmgr
    100021    1   tcp    607  nlockmgr
    100021    4   udp    606  nlockmgr
    100021    3   udp    606  nlockmgr
    100021    1   udp    606  nlockmgr

Thank you for your help so far.

Cheers,

Ollie

-- 
Oliver Cook    Systems Administrator, Claranet UK
ollie@uk.clara.net               +44 20 7903 3065