Date:      Tue, 24 Aug 2010 15:11:57 -0400
From:      Bill Moran <wmoran@potentialtech.com>
To:        Lucas Wang <lwang@us.toyota-itc.com>
Cc:        Freebsd <freebsd-questions@freebsd.org>
Subject:   Re: nfs server /home not responding
Message-ID:  <20100824151157.85e8d95e.wmoran@potentialtech.com>
In-Reply-To: <DD640CC6-05F1-41DA-B64A-F8C1962C1F63@us.toyota-itc.com>
References:  <DD640CC6-05F1-41DA-B64A-F8C1962C1F63@us.toyota-itc.com>

In response to Lucas Wang <lwang@us.toyota-itc.com>:
> 
> We use NFS to store /home directory for users in our lab.
> However, we occasionally get blocked from logging in because 
> the automount daemon on a NFS client machine hangs. When
> that happens, we get this error message on the NFS client machine
> called "bucks" in its system logs:
> Aug 24 10:53:40 bucks kernel: nfs server pid670@bucks:/home: not responding
> 
> pid670 is the amd process.
> 
> Our NFS server(raptors) has the following configuration:
> FreeBSD raptors.cs.ucla.edu 7.3-PRERELEASE FreeBSD 7.3-PRERELEASE #0: Tue Feb  9 12:59:50 PST 2010     root@raptors.cs.ucla.edu:/usr/obj/usr/src/sys/RAPTORS  amd64
> 
> And the client machine is configured as:
> FreeBSD bucks.cs.ucla.edu 7.3-PRERELEASE FreeBSD 7.3-PRERELEASE #0: Tue Feb  9 20:47:50 UTC 2010     root@bucks.cs.ucla.edu:/usr/obj/usr/src/sys/BUCKS  amd64
> 
> Another thing I want to add is that several other NFS client machines
> also hang from time to time. But they don't usually hang at the same time.
> Even though rebooting can fix the problem once, we don't want it keep hurting us.
> 
> So any insights or suggestions will be greatly appreciated. Thanks a lot.

Do you have dumbtimer in the options for the NFS mount?

My research into this indicated that the NFS client keeps track of average
response times from the server.  If the server starts to respond significantly
slower than expected, the client assumes that the server is down, the mount
freezes, and that message appears in the logs.  Usually, after a short wait
(a few minutes), the connection resumes and you see a "server is alive
again" message.  The dumbtimer option turns off that dynamic timeout
estimator.  See man mount_nfs for more info.  Also, try switching to TCP
mounts.
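
To make the idea concrete, here is a minimal Python sketch of a client that
tracks a smoothed average response time and flags a reply as late when it
exceeds the estimate by some margin.  This is only an illustration of the
behavior described above, not the FreeBSD kernel code; the class name, the
gain and slack constants, and the sample times are all made up.

```python
class RttEstimator:
    """Toy response-time estimator (illustrative, not the kernel's code)."""

    def __init__(self, initial_rtt=0.1, gain=0.125, slack=4.0):
        self.srtt = initial_rtt        # smoothed response time, in seconds
        self.rttvar = initial_rtt / 2  # smoothed deviation from srtt
        self.gain = gain               # weight given to each new sample
        self.slack = slack             # deviations tolerated before "late"

    def timeout(self):
        """Current dynamic timeout: the average plus a safety margin."""
        return self.srtt + self.slack * self.rttvar

    def observe(self, rtt):
        """Record one reply time; return True if it beat the timeout."""
        on_time = rtt <= self.timeout()
        err = rtt - self.srtt
        self.srtt += self.gain * err
        self.rttvar += self.gain * (abs(err) - self.rttvar)
        return on_time


def late_replies(samples):
    """Return the reply times a fresh estimator would treat as too slow."""
    est = RttEstimator()
    return [rtt for rtt in samples if not est.observe(rtt)]
```

After a long run of fast replies the timeout shrinks toward the usual
response time, so a single slow reply (a usage spike on the server, say)
lands well outside it.  A fixed timeout does not shrink this way, which is
roughly the trade-off dumbtimer makes.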

If you have a network that occasionally gets hit with traffic spikes that
cause data packets to take abnormally long to travel, or an NFS server that
occasionally gets usage spikes that cause it to respond slowly, this will
happen.

In addition to dumbtimer you can also look at better segmenting your
network, or increasing the capacity of the NFS server to prevent the
problem.

If the NFS hangs occur and the mount never recovers (even after several
minutes), then you probably have a different problem.  Possibly a firewall
is losing its state table entry for the connection, so the connection goes
bad?
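
One rough way to tell a stuck mount from an unreachable server is to try
opening a fresh TCP connection to the NFS port (2049) from the client while
the mount is hung.  The sketch below assumes the server accepts NFS over TCP;
the function name and the default timeout are hypothetical, and a successful
connect only shows that new connections get through, not that the existing
one is healthy.

```python
import socket


def tcp_port_reachable(host, port=2049, timeout=3.0):
    """Return True if a new TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, tcp_port_reachable("raptors") run on bucks during a hang would
show whether a brand-new connection to the server still works.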

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/