From owner-freebsd-questions@FreeBSD.ORG Tue Oct 5 12:51:07 2004
Date: Tue, 5 Oct 2004 08:51:02 -0400
From: Bill Moran <wmoran@potentialtech.com>
To: Alex de Kruijff
Cc: freebsd-questions@freebsd.org
Subject: Re: nfs server not responding / is alive again

Alex de Kruijff wrote:

> On Mon, Oct 04, 2004 at 12:22:30AM -0300, Marc G. Fournier wrote:
> >
> > I'm using an nfs mount to get at the underlying file system on a system
> > that uses unionfs mounts ... instead of using nullfs, which, last time I
> > used it over a year ago, caused the server to crash to no end ...
> >
> > But, as soon as there is any 'load', I'm getting a whack of:
> >
> > Oct  3 22:46:16 neptune /kernel: nfs server neptune.hub.org:/vm: not
> > responding
> > Oct  3 22:46:16 neptune /kernel: nfs server neptune.hub.org:/vm: is alive
> > again
> > Oct  3 22:48:30 neptune /kernel: nfs server neptune.hub.org:/vm: not
> > responding
> > Oct  3 22:48:30 neptune /kernel: nfs server neptune.hub.org:/vm: is alive
> > again

In my experience, this is caused by the server responding unpredictably.
Someone smarter than me may correct me, but I believe the NFS client keeps
track of how quickly the server has been responding, and uses that history
to judge whether the server is still working.  Any time the server's
response time varies too much from what the client has come to expect, the
client assumes the server is down; if the server is not actually down,
you'll see the "is alive again" message immediately afterward.

Basically, during normal usage the server responds very quickly, so the
client assumes it will always respond that fast.  Then, under heavy load,
the slower responses make the client a little paranoid.  I've seen this
when running NFS over WiFi, where ping times are usually not consistent.

One option is to just ignore the messages and accept them as a natural
side effect of high load.  Another is to use TCP mounts instead of UDP
mounts, which don't have this trouble (a sample fstab line is sketched
below).

What kind of network topology is between the two machines?  Do you notice
a high load on the hub/switch/routers during these activities?  You may be
able to improve the intervening network path to mitigate the problem as
well.
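For example (just a sketch -- I'm assuming the client mounts
neptune.hub.org:/vm on /vm, adjust paths to taste), either of these on the
client should get you a TCP mount:

    # one-off mount over TCP (-T selects TCP instead of UDP)
    mount_nfs -T neptune.hub.org:/vm /vm

    # or the equivalent /etc/fstab entry; "tcp" is the same as -T
    neptune.hub.org:/vm  /vm  nfs  rw,tcp  0  0

See mount_nfs(8) for the details.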
> >
> > in /var/log/messages ...
> >
> > I'm running nfsd with the standard flags:
> >
> > nfs_server_flags="-u -t -n 4"
> >
> > Is there something that I can do to reduce this problem? increase number
> > of nfsd processes? force a tcp connection?
>
> You could try giving the nfsd processes more priority as root with
> rtprio. If the file /var/run/nfsd.pid exists then you could try something
> like: rtprio 10 -`cat /var/run/nfsd.pid`.
>
> You could also try giving the other processes less priority, like
> nice -n 2 rsync. But I'm not sure how this works out at the other end.
>
> > The issue is more prevalent when I have >4 processes trying to read from
> > the nfs mounts ... should there be one mount per process? the process(es)
> > in question are rsync, if that helps ... they tend to be a bit more 'disk
> > intensive' than most processes, which is why I thought of increasing -n
> > ...

Might help.  I would look at networking before I looked at disk usage ...
are there dropped packets and the like (see the P.S. below for a quick
check)?  But it could be either.

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com
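P.S. A quick sketch for checking dropped packets, using nothing beyond the
stock FreeBSD tools (untested on your boxes, of course):

    # per-interface error counters -- watch Ierrs, Oerrs and Colls grow
    netstat -i

    # UDP-level counters, e.g. datagrams dropped due to full socket buffers
    netstat -s -p udp

    # NFS RPC statistics -- client-side timeouts and retries show up here
    nfsstat

If the timeout/retry counters climb in step with the "not responding"
messages, the network is the more likely suspect.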