From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 16 16:17:32 2012
From: John Baldwin <jhb@freebsd.org>
To: dg17@penx.com
Subject: Re: I have a DDB session open to a crashed ZFS server
Date: Tue, 16 Oct 2012 12:15:33 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; )
References: <1350317019.71982.50.camel@btw.pki2.com>
 <201210160844.41042.jhb@freebsd.org> <1350400597.72003.32.camel@btw.pki2.com>
In-Reply-To: <1350400597.72003.32.camel@btw.pki2.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201210161215.33369.jhb@freebsd.org>
Cc: freebsd-fs@freebsd.org

On Tuesday, October 16, 2012 11:16:37 am Dennis Glatting wrote:
> On Tue, 2012-10-16 at 08:44 -0400, John Baldwin wrote:
> > On Monday, October 15, 2012 12:03:39 pm Dennis Glatting wrote:
> > > FreeBSD/amd64 (mc) (ttyu0)
> > > 
> > > login: NMI ... going to debugger
> > > [ thread pid 11 tid 100003 ]
> > 
> > You got an NMI, not a crash.  What happens if you just continue ('c' command) 
> > from DDB?
> > 
> 
> I hit the NMI button because of the "crash," which is a misword, to get
> into DDB. 

Ah, I would suggest "hung" or "deadlocked" next time.  It certainly seems like
a deadlock since all CPUs are idle.  Some helpful commands here might be
'show sleepchain' and 'show lockchain'.

Pick a "stuck" process (like find) and run:

'show sleepchain <pid>'

In your case, though, it seems both 'find' and the various 'pbzip2' threads
are stuck on a condition variable, so there isn't an easy way to identify
an "owner" that is supposed to awaken these threads.  It could be a case
of a missed wakeup, but you'll need to get someone more familiar
with ZFS to identify where these threads should normally be awakened.
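
To illustrate what "missed wakeup" means in general, here is a minimal
userland sketch using POSIX threads.  It is not ZFS or kernel code, and
every name in it (lock, cv, ready, waiter, signaller, run_demo) is
hypothetical.  The waiter tests its predicate while holding the mutex, so
it still makes progress even though the signal was sent before it started
waiting.  A waiter that blocked without re-checking the predicate under
the lock could sleep forever in the same situation.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state for illustration; not taken from ZFS. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool ready = false;

/* Waiter: only sleeps while the predicate is false, checked under the lock. */
static void *waiter(void *arg)
{
    int *woken = arg;

    pthread_mutex_lock(&lock);
    while (!ready)                      /* also guards against spurious wakeups */
        pthread_cond_wait(&cv, &lock);  /* atomically unlocks and sleeps */
    *woken = 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Signaller: changes the predicate and signals while holding the same lock. */
static void signaller(void)
{
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&lock);
}

/* Returns 1 if the waiter completed, -1 if the thread could not be created. */
int run_demo(void)
{
    int woken = 0;
    pthread_t t;

    signaller();  /* the wakeup is sent before the waiter even exists */
    if (pthread_create(&t, NULL, waiter, &woken) != 0)
        return -1;
    pthread_join(t, NULL);
    return woken;
}
```

If the threads in your dump are asleep on a condition variable whose
predicate has already become true, that would point in this direction,
but someone who knows the ZFS code would have to confirm it.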

-- 
John Baldwin