From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Rick Macklem, Mark Morley
Date: Wed, 26 May 2010 09:28:20 -0400
Subject: Re: hung on ufs vnode lock, was Re: NFS trouble on 7.3-STABLE i386
Message-Id: <201005260928.20397.jhb@freebsd.org>
References: <20100525215230.CCC9C2101D5@amazon.cs.uoguelph.ca>

On Tuesday 25 May 2010
8:24:58 pm Rick Macklem wrote:
>
> On Tue, 25 May 2010, Mark Morley wrote:
>
> > On Fri, 21 May 2010 11:32:33 -0400 (EDT) Rick Macklem wrote:
> >> On Fri, 21 May 2010, Mark Morley wrote:
> >>
> >>> Having an issue with a file server here (7.3-STABLE i386)
> >>>
> >>> The nfsd processes are hanging. Client access to the nfs shares stops working and the nfsd processes on the server cannot be killed by any means. There are no errors showing up anywhere on the server. The network connection to the server seems fine (i.e. anything other than nfs traffic seems ok). Rebooting the server fixes the problem for a while, but it doesn't reboot easily. It times out on terminating the nfsd processes. When it finally does reboot the file system isn't marked clean, resulting in a long wait for fsck (although it doesn't find any problems, it's a multi terabyte share and it takes a while).
> >>>
> >>> This morning it did it again. This time I tried manually killing nfsd but nothing I did would make them die. No errors.
> >>>
> >> Next time it happens, do a "ps axlH" to see what the nfsd threads are
> >> waiting for. It might give you a hint as to what is happening.
> >>
> >
> > Ok, it did it again. ps axlH shows all the nfsd processes stuck in the _ufs_ state. The server isn't doing anything else, no other processes seem to be monopolizing resources or disks in any way.
>
> If the nfsd threads are sleeping on WCHAN "ufs", I think that means that
> they are waiting for a ufs vnode lock. I don't know what has changed
> between FreeBSD 7.1 and FreeBSD 7.3 that might have caused this. I changed
> the Subject: line in the hopes that someone who might know the answer to
> this will take a look.

If you can break into ddb, you can use 'show sleepchain' to investigate. If
you built the kernel with debug symbols you can use kgdb to investigate as
well. I have something similar to 'show sleepchain' in the gdb scripts at
www.freebsd.org/~jhb/gdb/. You can source the gdb6 file and do
'sleepchain ' to see what locks a given thread is blocked on.

--
John Baldwin
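
As a rough illustration of the "ps axlH" check suggested earlier in the thread, the Python sketch below picks out the threads sleeping on a given wait channel. The function name, the column titles it looks for (MWCHAN or WCHAN), and the assumption that COMMAND is the last column are illustrative guesses, not something stated in the thread, and may need adjusting to match your ps output.

```python
def threads_waiting_on(ps_output, wchan="ufs", command="nfsd"):
    """Return (pid, wchan, command) tuples for threads sleeping on `wchan`.

    `ps_output` is the text printed by a command such as "ps axlH".
    Assumptions (hypothetical, adjust as needed): the first non-blank line
    is the header, the wait-channel column is titled MWCHAN or WCHAN,
    COMMAND is the last column, and no other column is ever blank.
    """
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return []

    header = lines[0].split()
    wchan_col = next(
        (i for i, name in enumerate(header) if name in ("MWCHAN", "WCHAN")),
        None,
    )
    if wchan_col is None or "PID" not in header:
        raise ValueError("no PID or wait-channel column in ps header")
    pid_col = header.index("PID")
    cmd_col = len(header) - 1  # COMMAND assumed to be last

    hits = []
    for line in lines[1:]:
        # Limit the split so a command containing spaces stays in one field.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue  # short or malformed row, skip it
        if fields[wchan_col] == wchan and command in fields[cmd_col]:
            hits.append((int(fields[pid_col]), fields[wchan_col], fields[cmd_col]))
    return hits
```

Pass it the text of "ps axlH", for example captured with subprocess.run(["ps", "axlH"], capture_output=True, text=True).stdout, and it returns the nfsd threads whose wait channel is "ufs". The thread IDs of interest still have to be examined with ddb or kgdb to see which lock they are actually blocked on.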