From: John Baldwin
To: dg17@penx.com
Cc: freebsd-fs@freebsd.org
Subject: Re: I have a DDB session open to a crashed ZFS server
Date: Tue, 16 Oct 2012 12:15:33 -0400

On Tuesday, October 16, 2012 11:16:37 am Dennis Glatting wrote:
> On Tue, 2012-10-16 at 08:44 -0400, John Baldwin wrote:
> > On Monday, October 15, 2012 12:03:39 pm Dennis Glatting wrote:
> > > FreeBSD/amd64 (mc) (ttyu0)
> > >
> > > login: NMI ... going to debugger
> > > [ thread pid 11 tid 100003 ]
> >
> > You got an NMI, not a crash.  What happens if you just continue ('c' command)
> > from DDB?
> >
>
> I hit the NMI button because of the "crash," which is a misword, to get
> into DDB.

Ah, I would suggest "hung" or "deadlocked" next time.  It certainly seems like
a deadlock since all CPUs are idle.  Some helpful commands here might be
'show sleepchain' and 'show lockchain'.

Pick a "stuck" process (like find) and run:

'show sleepchain '

In your case though it seems both 'find' and the various 'pbzip2' threads
are stuck on a condition variable, so there isn't an easy way to identify
an "owner" that is supposed to awaken these threads.  It could be a case
of a missed wakeup perhaps, but you'll need to get someone more familiar
with ZFS to identify where these threads should be awakened normally.

-- 
John Baldwin