From: Bill Moran
To: Lucas Wang
Cc: Freebsd
Date: Tue, 24 Aug 2010 15:11:57 -0400
Subject: Re: nfs server /home not responding
Message-Id: <20100824151157.85e8d95e.wmoran@potentialtech.com>

In response to Lucas Wang:

>
> We use NFS to store /home directory for users in our lab.
> However, we occasionally get blocked from logging in because
> the automount daemon on a NFS client machine hangs. When
> that happens, we get this error message on the NFS client machine
> called "bucks" in its system logs:
> Aug 24 10:53:40 bucks kernel: nfs server pid670@bucks:/home: not responding
>
> pid670 is the amd process.
>
> Our NFS server(raptors) has the following configuration:
> FreeBSD raptors.cs.ucla.edu 7.3-PRERELEASE FreeBSD 7.3-PRERELEASE #0: Tue Feb 9 12:59:50 PST 2010 root@raptors.cs.ucla.edu:/usr/obj/usr/src/sys/RAPTORS amd64
>
> And the client machine is configured as:
> FreeBSD bucks.cs.ucla.edu 7.3-PRERELEASE FreeBSD 7.3-PRERELEASE #0: Tue Feb 9 20:47:50 UTC 2010 root@bucks.cs.ucla.edu:/usr/obj/usr/src/sys/BUCKS amd64
>
> Another thing I want to add is that several other NFS client machines
> also hang from time to time. But they don't usually hang at the same time.
> Even though rebooting can fix the problem once, we don't want it keep hurting us.
>
> So any insights or suggestions will be greatly appreciated. Thanks a lot.

Do you have dumbtimer in the options for the nfs mount? My research
into this indicated that the NFS client keeps track of average response
times from the server. If the server starts to respond significantly
slower than expected, the code assumes that the server is down, the
mount freezes, and that message appears in the logs. Usually, after a
short wait (a few minutes), the connection resumes and you see a
"server is alive again" message. See man mount_nfs for more info.

Also, try switching to TCP mounts.

This will happen if you have a network that occasionally gets hit with
traffic spikes that cause data packets to take abnormally long to
travel, or an NFS server that occasionally gets usage spikes that cause
it to respond slowly. In addition to dumbtimer, you can also look at
better segmenting your network, or at increasing the capacity of the
NFS server, to prevent the problem.

If the NFS hangs occur and the mount never recovers (even after several
minutes), then you probably have a different problem. Possibly a
firewall is losing its state table and thus the connection is going bad?

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
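
For reference, the dumbtimer and TCP suggestions above might look like this in a client's /etc/fstab. This is only a sketch: the server name and export come from the thread, while the /home mount point, the rw flag, and the use of fstab at all are assumptions. The clients in the thread mount /home through amd, so the equivalent options would have to go into the amd map, which is not shown here. It may make sense to try one option at a time to see which one helps.

```
# /etc/fstab on an NFS client (sketch; mount point and rw are assumptions)
# Device         Mountpoint  FStype  Options        Dump  Pass#

# Variant 1: keep the default transport, turn off the dynamic
# retransmit timeout estimator
raptors:/home    /home       nfs     rw,dumbtimer   0     0

# Variant 2: switch the mount to TCP instead (use in place of the line above)
#raptors:/home   /home       nfs     rw,tcp         0     0
```

A one-off test of the same options from the command line would take the form `mount_nfs -o dumbtimer raptors:/home /home` or `mount_nfs -o tcp raptors:/home /home`, again assuming /home is the mount point.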