From owner-freebsd-stable@FreeBSD.ORG Wed May 26 00:09:27 2010
Date: Tue, 25 May 2010 20:24:58 -0400 (EDT)
From: Rick Macklem
To: Mark Morley
In-Reply-To: <20100525215230.CCC9C2101D5@amazon.cs.uoguelph.ca>
References: <20100525215230.CCC9C2101D5@amazon.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc:
freebsd-stable@freebsd.org
Subject: hung on ufs vnode lock, was Re: NFS trouble on 7.3-STABLE i386
List-Id: Production branch of FreeBSD source code

On Tue, 25 May 2010, Mark Morley wrote:

> On Fri, 21 May 2010 11:32:33 -0400 (EDT) Rick Macklem wrote:
> On Fri, 21 May 2010, Mark Morley wrote:
>
>> Having an issue with a file server here (7.3-STABLE i386)
>>
>> The nfsd processes are hanging. Client access to the nfs shares
>> stops working and the nfsd processes on the server cannot be killed
>> by any means. There are no errors showing up anywhere on the server.
>> The network connection to the server seems fine (i.e., anything
>> other than nfs traffic seems ok). Rebooting the server fixes the
>> problem for a while, but it doesn't reboot easily. It times out on
>> terminating the nfsd processes. When it finally does reboot, the
>> file system isn't marked clean, resulting in a long wait for fsck
>> (although it doesn't find any problems, it's a multi-terabyte share
>> and it takes a while).
>>
>> This morning it did it again. This time I tried manually killing
>> nfsd but nothing I did would make them die. No errors.
>>
> Next time it happens, do a "ps axlH" to see what the nfsd threads are
> waiting for. It might give you a hint as to what is happening.
>
> Ok, it did it again. ps axlH shows all the nfsd processes stuck in
> the _ufs_ state. The server isn't doing anything else, no other
> processes seem to be monopolizing resources or disks in any way.

If the nfsd threads are sleeping on WCHAN "ufs", I think that means that
they are waiting for a ufs vnode lock. I don't know what has changed
between FreeBSD7.1 and FreeBSD7.3 that might have caused this. I changed
the Subject: line in the hopes that someone who might know the answer to
this will take a look.

>
> rpcinfo doesn't show anything amiss as far as I can tell (i.e., rpc
> is running)
>
> After a reboot, one of the 32 nfsd's almost immediately goes into the
> "ufs" state and never leaves it (and never racks up any CPU time
> either). The others are fine. Slowly over time more and more enter
> this state. When I rebooted it today, all but one were in that state.
> The clients were bogging down, presumably because the one and only
> functioning nfsd was overworked.
>
> One client is running 8.1-prerelease as a test, and only that
> particular client will start getting lots of timeouts accessing the
> nfs share (even with less load than the other clients). Just in case
> it's tickling something on the server I've shut it down this time and
> I'm leaving it off for the time being.
>
I don't think that the 8.1-prerelease client is an issue. It's just that
the FreeBSD8 krpc likes to generate the "not responding" messages more
aggressively. They are pretty well meaningless, imho.

rick
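
The "ps axlH" check discussed above can be tallied with a small script, which may help when watching how many of the nfsd threads drift into the "ufs" state over time. What follows is only a rough sketch, not part of the original exchange. The function name count_wait_channels and the SAMPLE text are made up for illustration (SAMPLE is not real output from the affected server), and the parsing assumes the wait channel column is headed MWCHAN or WCHAN and is never blank, which may not hold on every system.

```python
from collections import Counter


def count_wait_channels(ps_output, command="nfsd"):
    """Tally wait channels for every ps row whose COMMAND mentions `command`.

    Assumes `ps axlH`-style text: a header row naming the columns, one row
    per thread, and a non-blank wait channel column ("-" when not waiting).
    """
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return Counter()
    header = lines[0].split()
    # Assumption: the column is labeled MWCHAN or WCHAN, depending on system.
    wchan_idx = header.index("MWCHAN" if "MWCHAN" in header else "WCHAN")
    cmd_idx = header.index("COMMAND")
    counts = Counter()
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the split.
        fields = line.split(None, cmd_idx)
        if len(fields) <= cmd_idx:
            continue  # short or malformed row
        if command in fields[cmd_idx]:
            counts[fields[wchan_idx]] += 1
    return counts


# Hypothetical sample text, invented for this sketch.
SAMPLE = """\
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
    0   812   811   0   4  0  3060  1040 ufs    D     ??    0:00.00 nfsd: server (nfsd)
    0   813   811   0   4  0  3060  1040 ufs    D     ??    0:00.00 nfsd: server (nfsd)
    0   814   811   0   4  0  3060  1040 -      S     ??    0:12.34 nfsd: server (nfsd)
    0   900     1   0  44  0  3200  1100 select Ss    ??    0:01.00 sshd
"""

if __name__ == "__main__":
    for wchan, n in count_wait_channels(SAMPLE).most_common():
        print(wchan, n)
```

On a live server the text could instead come from running ps itself, for example via subprocess.run(["ps", "axlH"], capture_output=True, text=True).stdout, and the tally could be repeated periodically to see whether the count under "ufs" keeps growing.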