From owner-freebsd-questions@FreeBSD.ORG Tue Oct 5 12:51:07 2004
Date: Tue, 5 Oct 2004 08:51:02 -0400
From: Bill Moran <wmoran@potentialtech.com>
To: Alex de Kruijff
Cc: freebsd-questions@freebsd.org
Subject: Re: nfs server not responding / is alive again

Alex de Kruijff wrote:

> On Mon, Oct 04, 2004 at 12:22:30AM -0300, Marc G. Fournier wrote:
> >
> > I'm using an nfs mount to get at the underlying file system on a system
> > that uses unionfs mounts ... instead of using nullfs, which, last time I
> > used it over a year ago, caused the server to crash to no end ...
> >
> > But, as soon as there is any 'load', I'm getting a whack of:
> >
> > Oct  3 22:46:16 neptune /kernel: nfs server neptune.hub.org:/vm: not
> > responding
> > Oct  3 22:46:16 neptune /kernel: nfs server neptune.hub.org:/vm: is alive
> > again
> > Oct  3 22:48:30 neptune /kernel: nfs server neptune.hub.org:/vm: not
> > responding
> > Oct  3 22:48:30 neptune /kernel: nfs server neptune.hub.org:/vm: is alive
> > again

In my experience, this is caused by the server responding unpredictably.
Someone smarter than me may correct me, but I believe the NFS client keeps
track of how quickly the server has been responding, and uses that history
to judge whether the server is still working.  Any time the server's
response time varies too much from what the client has come to expect, the
client assumes the server is down; if the server is not actually down,
you'll see the "is alive again" message immediately afterward.

Basically, during normal usage the server responds very quickly, so the
client assumes it will always respond that fast.  Then, under heavy load,
the slower responses make the client a little paranoid.  I've seen this
when running NFS over WiFi, where ping times are usually not consistent.

One option is to just ignore the messages and accept them as a natural
side effect of high load.  Another is to use TCP mounts instead of UDP
mounts, which don't have this trouble (a sample fstab line is sketched
below).

What kind of network topology is between the two machines?  Do you notice
a high load on the hub/switch/routers during these activities?  You may be
able to improve the intervening network path to mitigate the problem as
well.
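For example (just a sketch -- I'm assuming the client mounts
neptune.hub.org:/vm on /vm, adjust paths to taste), either of these on the
client should get you a TCP mount:

    # one-off mount over TCP (-T selects TCP instead of UDP)
    mount_nfs -T neptune.hub.org:/vm /vm

    # or the equivalent /etc/fstab entry; "tcp" is the same as -T
    neptune.hub.org:/vm  /vm  nfs  rw,tcp  0  0

See mount_nfs(8) for the details.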
> >
> > in /var/log/messages ...
> >
> > I'm running nfsd with the standard flags:
> >
> > nfs_server_flags="-u -t -n 4"
> >
> > Is there something that I can do to reduce this problem? increase number
> > of nfsd processes? force a tcp connection?
>
> You could try giving the nfsd processes more priority as root with
> rtprio. If the file /var/run/nfsd.pid exists then you could try something
> like: rtprio 10 -`cat /var/run/nfsd.pid`.
>
> You could also try giving the other processes less priority, like
> nice -n 2 rsync. But I'm not sure how this works out at the other end.
>
> > The issue is more prevalent when I have >4 processes trying to read from
> > the nfs mounts ... should there be one mount per process? the process(es)
> > in question are rsync, if that helps ... they tend to be a bit more 'disk
> > intensive' than most processes, which is why I thought of increasing -n
> > ...

Might help.  I would look at networking before I looked at disk usage ...
are there dropped packets and the like (see the P.S. below for a quick
check)?  But it could be either.

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com
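P.S. A quick sketch for checking dropped packets, using nothing beyond the
stock FreeBSD tools (untested on your boxes, of course):

    # per-interface error counters -- watch Ierrs, Oerrs and Colls grow
    netstat -i

    # UDP-level counters, e.g. datagrams dropped due to full socket buffers
    netstat -s -p udp

    # NFS RPC statistics -- client-side timeouts and retries show up here
    nfsstat

If the timeout/retry counters climb in step with the "not responding"
messages, the network is the more likely suspect.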