Date: Thu, 26 May 2005 18:09:28 +1000
From: Peter Jeremy
To: Ted Faber
Cc: freebsd-current@freebsd.org
Subject: Re: hard deadlock(?) on -current; some debugging info, need help
Message-ID: <20050526080928.GE12640@cirb503493.alcatel.com.au>
In-Reply-To: <20050526001806.GA1008@pun.isi.edu>

On Wed, 2005-May-25 17:18:06 -0700, Ted Faber wrote:
>The system slowly grinds to a halt, and the lockup seems to involve the
>disk system.

Nothing is waiting on physical I/O, but there are lots of locked vnodes.
I notice there's a sh(? - pid 10715) blocked on nfsreq.  Can you reproduce
the problem without the NFS-mounted filesystems?

> I have not found a sequence that triggers them (other than
>trying to write mail to the list to report them), and I know how
>difficult that makes things.  It is common to have 2-5 a day.  Even when
>I can get to the debugger during a lockup, I cannot generate a crash
>dump - the kernel reports starting the dump and moves no bytes.

Not nice.  That suggests something below the filesystem is sick, because
a filesystem deadlock won't affect the crashdump.

>I've attached a dmesg from a -v boot and the kernel config (the dmesg is
>not from the lockup run).  Last Friday when the system locked I had a
>digital camera with me and took pictures of the ps output in the hopes
>that someone could look at them.  These images are at
>
>http://www.isi.edu/~faber/tmp/deadlock/DSCN04{75,76,77,78,79,80,81,82}.JPG

The other information we need is the output of "show lockedvnods".  This
will hopefully point to the process that started the problem.

-- 
Peter Jeremy