Date: Thu, 28 Aug 2003 19:33:42 +0200
From: Alexander Leidinger
To: freebsd-current@freebsd.org
cc: rwatson@freebsd.org
Subject: Re: nfs tranfers hang in state getblck or nfsread
Message-Id: <20030828193342.3cb6a927.Alexander@Leidinger.net>
References: <3F4CD409.5080703@telia.com>

On Thu, 28 Aug 2003 08:54:07 -0400 (EDT) Robert Watson wrote:

> Ok, so let me see if I have the sequence of events straight:
>
> (1) Boot a 4.8-RELEASE/STABLE NFS server
> (2) Boot a 5.1-RELEASE/CURRENT NFS client
> (3) Mount a file system using TCP NFSv3
> (4) Reboot the client system, reboot, and remount
> (5) Thrash the file system a bit with large reads/writes, and it hangs
>
> Is this correct? I'd like to work out the minimum sequence of events
> necessary to cause the problem. Is (4) necessary to reproduce the hang,
> or can you cause it without (4) if you wait long enough? You mention a

As my server "never" shuts down and the 5-current client is switched off
at night, I don't know about (4), but I don't think it's necessary (on
shutdown the file systems get unmounted, and /var/db/mountdtab shows only
one mount for the client).

> server reboot here, also, so I want to make sure I'm not confused about
> the steps to hit the problem.

In my case there's no server reboot.

> Once the hang is occurring on the client, can you drop into DDB and do a
> ps, and in particular, paste into an e-mail any lines about nfsiod
> threads, and any threads that are blocked in nfs?

Normally I don't notice that it is blocked. As the following log shows,
the server may even be reported as alive again within the same second:

---snip---
/var/log/messages.0.bz2:Aug 24 11:52:05 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:27 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:52:28 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:36 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:53:13 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:53:58 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
---snip---

> For kicks, try disabling rpc.lockd on all sides, as well as rpc.statd. I
> don't think they're involved here, but it's worth disabling them to be
> sure.

There's no lockd running, only statd on the server, so we can already
rule out lockd.

BTW: Robert, mwlucas CCed you on a mail regarding the use of the FreeBSD
Foundation address for the commercial icc license. Can you please confirm
that you got the mail?

Bye,
Alexander.

-- 
There's no place like ~
http://www.Leidinger.net    Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7
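
To see how long each stall in the log excerpt above lasted, one could pair
every "not responding" line with the next "is alive again" line for the same
mount. The following is only a rough sketch, not part of the original report:
the function name `outages`, the regular expression, and the first-in-first-out
pairing are illustrative assumptions, and the year 2003 is taken from the
mail's Date header because syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Log lines copied from the excerpt in this mail.
LOG = """\
/var/log/messages.0.bz2:Aug 24 11:52:05 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:27 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:52:28 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:36 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:52:46 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
/var/log/messages.0.bz2:Aug 24 11:53:13 Magelan kernel: nfs server Andro-Beta:/big/Windows: not responding
/var/log/messages.0.bz2:Aug 24 11:53:58 Magelan kernel: nfs server Andro-Beta:/big/Windows: is alive again
"""

# Hypothetical pattern for the kernel's NFS client messages as they appear above.
LINE_RE = re.compile(
    r"([A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"nfs server (\S+): (not responding|is alive again)"
)


def outages(text, year=2003):
    """Return (mount, start, seconds) for each not-responding/alive pair.

    The year is an assumption supplied by the caller; syslog omits it.
    """
    pending = {}  # mount -> start times still waiting for an "alive" line
    result = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        mount, state = match.group(2), match.group(3)
        if state == "not responding":
            pending.setdefault(mount, []).append(stamp)
        elif pending.get(mount):
            # Pair with the oldest unanswered "not responding" line.
            start = pending[mount].pop(0)
            result.append((mount, start, (stamp - start).total_seconds()))
    return result


if __name__ == "__main__":
    for mount, start, seconds in outages(LOG):
        print(f"{start:%b %d %H:%M:%S}  {mount}  {seconds:.0f}s")
```

With syslog's one-second resolution this gives stalls of 22, 8, 0, 0 and 45
seconds for the excerpt, which matches the observation that the server can be
reported alive again within the same second.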