Date:      Thu, 28 Aug 2003 19:33:42 +0200
From:      Alexander Leidinger <Alexander@Leidinger.net>
To:        freebsd-current@freebsd.org
Cc:        rwatson@freebsd.org
Subject:   Re: nfs tranfers hang in state getblck or nfsread
Message-ID:  <20030828193342.3cb6a927.Alexander@Leidinger.net>
In-Reply-To: <Pine.NEB.3.96L.1030828084515.34202C-100000@fledge.watson.org>
References:  <3F4CD409.5080703@telia.com> <Pine.NEB.3.96L.1030828084515.34202C-100000@fledge.watson.org>

On Thu, 28 Aug 2003 08:54:07 -0400 (EDT)
Robert Watson <rwatson@freebsd.org> wrote:

> Ok, so let me see if I have the sequence of events straight:
> 
> (1) Boot a 4.8-RELEASE/STABLE NFS server
> (2) Boot a 5.1-RELEASE/CURRENT NFS client
> (3) Mount a file system using TCP NFSv3
> (4) Reboot the client system, reboot, and remount
> (5) Thrash the file system a bit with large reads/writes, and it hangs
> 
> Is this correct?  I'd like to work out the minimum sequence of events
> necessary to cause the problem.  Is (4) necessary to reproduce the hang,
> or can you cause it without (4) if you wait long enough?  You mention a

As my server "never" shuts down and the 5-current client is switched off
at night, I don't know about (4), but I don't think it's necessary
(on shutdown the filesystems get unmounted, and /var/db/mountdtab only
shows one mount for the client).

> server reboot here, also, so I want to make sure I'm not confused about
> the steps to hit the problem.

In my case there's no server reboot.

> Once the hang is occurring on the client, can you drop into DDB and do a
> ps, and in particular, paste into an e-mail any lines about nfsiod
> threads, and any threads that are blocked in nfs?

Normally I don't notice that it is blocked. As you can see in the
following, the server may even be reported alive again within the same
second:
---snip---
/var/log/messages.0.bz2:Aug 24 11:52:05 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:27 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:52:28 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:36 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:53:13 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:53:58 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
---snip---
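
To put numbers on how short these stalls are, the "not responding" and
"is alive again" messages can be paired up with a small script. This is
only a sketch: the function name stall_durations, the regular expression
and the assumed year (syslog lines carry none) are illustrative, and it
simply matches each "is alive again" with the oldest unanswered
"not responding" for the same mount.

```python
import re
from datetime import datetime

# Matches the two kernel messages shown in the log excerpt. The pattern is
# an assumption based on those ten lines only, not on the kernel source.
LINE_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"nfs server (?P<mount>\S+): (?P<state>not responding|is alive again)"
)


def stall_durations(lines, year=2003):
    """Return (mount, seconds) for each not-responding/alive-again pair.

    `year` is an assumption: syslog timestamps do not include one.
    Month names are parsed with %b, so this expects an English/C locale.
    """
    pending = {}  # mount -> timestamps of unanswered "not responding"
    stalls = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an NFS client status message
        when = datetime.strptime(
            f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S"
        )
        queue = pending.setdefault(m.group("mount"), [])
        if m.group("state") == "not responding":
            queue.append(when)
        elif queue:
            # Pair with the oldest outstanding "not responding" (FIFO).
            started = queue.pop(0)
            seconds = int((when - started).total_seconds())
            stalls.append((m.group("mount"), seconds))
    return stalls
```

Fed the ten lines above, this reports stalls of 22, 8, 0, 0 and 45
seconds, i.e. two of them begin and end within the same second.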

> For kicks, try disabling rpc.lockd on all sides, as well as rpc.statd.  I
> don't think they're involved here, but it's worth disabling them to be
> sure.

There's no lockd running, only statd on the server, so we can already
rule out lockd.

BTW: Robert, mwlucas CCed you on a mail regarding the use of the
FreeBSD Foundation address for the commercial icc license. Can you
please confirm that you got the mail?

Bye,
Alexander.

-- 
                       There's no place like ~

http://www.Leidinger.net                       Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7


