From owner-freebsd-fs@freebsd.org  Wed Jul  8 09:35:12 2015
To: freebsd-fs@freebsd.org
From: Gergely Czuczy <gergely.czuczy@harmless.hu>
Subject: Crashed ZFS pool
Message-ID: <559CEDC3.2040107@harmless.hu>
Date: Wed, 8 Jul 2015 11:30:43 +0200
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
 Thunderbird/38.0.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Hello,

We have a crashed ZFS pool. The system originally ran FreeBSD 8, which
we upgraded to 9 and then, yesterday, to 10-STABLE. Importing the pool
crashes the system with a panic.

The pool used to have a file-backed ZIL device under /usr/zfslog;
however, the file's size was 0 when this happened, and it used to be
larger. We've set vfs.zfs.recover=1 in /boot/loader.conf and tried to
import the pool with:
#  zpool import -fm tank
but that crashes the system.
We also tried moving /boot/zfs/zpool.cache out of the way (by renaming
it), but that resulted in the same panic.

# uname -a
FreeBSD $x 10.2-PRERELEASE FreeBSD 10.2-PRERELEASE #0: Tue Jul  7 
20:30:27 CEST 2015     toor@$x:/usr/obj/usr/src/sys/REFLECTION amd64

Running zdb -AAAFXve tank dumps some information and then hangs. The
zdb output can be found here:
http://czg.harmless.hu/zfscrash/tank.zdb-AAAFXve.script

The suspicious part is:
Assertion failed: zap_lookup(ddt->ddt_os, ddt->ddt_spa->spa_ddt_stat_object, name, sizeof (uint64_t), sizeof (ddt_histogram_t) / sizeof (uint64_t), &ddt->ddt_histogram[type][class]) == 0 (0x6 == 0x0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 127.
Assertion failed: (ddt_object_info(ddt, type, class, &doi) == 0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 132.

zdb seems to be stuck in the following state:
  21697 zdb      RET   read 8
  21697 zdb      CALL  _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
  21697 zdb      RET   _umtx_op -1 errno 60 Operation timed out
  21697 zdb      CALL  _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)
  21697 zdb      RET   _umtx_op -1 errno 60 Operation timed out
  21697 zdb      CALL  read(0x4,0x7fffd7fbdf50,0x8)
  21697 zdb      GIO   fd 4 read 8 bytes
        0x0000 02be fe70 08a4 2335 |...p..#5|

  21697 zdb      RET   read 8
  21697 zdb      CALL  _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
  21697 zdb      RET   _umtx_op -1 errno 60 Operation timed out
  21697 zdb      CALL  _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)
  21697 zdb      RET   _umtx_op -1 errno 60 Operation timed out
  21697 zdb      CALL  read(0x4,0x7fffd7fbdf50,0x8)
  21697 zdb      GIO   fd 4 read 8 bytes
        0x0000 459a ca93 c54b 9922 |E....K."|

  21697 zdb      RET   read 8
  21697 zdb      CALL  _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
  21697 zdb      RET   _umtx_op -1 errno 60 Operation timed out
  21697 zdb      CALL  _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)


However, I wasn't able to find out what FD 4 is.

There were no disk read errors in dmesg/messages, so I'm not sure what
would be timing out.
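
To make the repeating pattern in the trace easier to see, the RET lines
can be tallied per syscall and errno. This is only a rough sketch: it
assumes the trace is kdump-style text saved to a file, and both the
function name summarise_kdump and the file name zdb.kdump are made up
for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count RET lines in kdump-style text,
# grouped by syscall name and, for failed calls, by errno.
summarise_kdump() {
    awk '
        # Fields of a RET line: pid, command, "RET", syscall, return value, ...
        $3 == "RET" {
            key = $4
            # Failed calls look like: RET <syscall> -1 errno <n> <message>
            if ($5 == "-1" && $6 == "errno")
                key = key " errno " $7
            count[key]++
        }
        END {
            for (k in count)
                printf "%s: %d\n", k, count[k]
        }
    ' "$@" | LC_ALL=C sort
}
```

Usage would be something like "summarise_kdump zdb.kdump"; with no file
argument the function reads the trace from standard input.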

And here's a screenshot of the crash:
http://czg.harmless.hu/zfscrash/zfspanic.jpg

So, does anyone have any idea what to do with it? It would be nice to
get it back to a functional state, or at least to a state where the data
can be accessed.

Thanks in advance,
-czg