From: Gergely Czuczy
To: freebsd-fs@freebsd.org
Subject: Crashed ZFS pool
Date: Wed, 8 Jul 2015 11:30:43 +0200
Message-ID: <559CEDC3.2040107@harmless.hu>

Hello,

We have a crashed ZFS pool. The system initially ran FreeBSD 8, which we upgraded to 9 and then, yesterday, to 10-STABLE. Importing the pool now crashes the system with a panic.

The pool used to have a file-backed ZIL device under /usr/zfslog. When the crash happened the file's size was 0, although it used to be bigger.

We've set vfs.zfs.recover=1 in /boot/loader.conf and are trying to import the pool with:

# zpool import -fm tank

but this crashes the system. We also tried removing /boot/zfs/zpool.cache (renamed it, actually), but that resulted in the same panic.

# uname -a
FreeBSD $x 10.2-PRERELEASE FreeBSD 10.2-PRERELEASE #0: Tue Jul 7 20:30:27 CEST 2015 toor@$x:/usr/obj/usr/src/sys/REFLECTION amd64

Running zdb -AAAFXve tank dumps some info, then gets stuck. The zdb output can be found here:
http://czg.harmless.hu/zfscrash/tank.zdb-AAAFXve.script

The suspicious part is:

Assertion failed: zap_lookup(ddt->ddt_os, ddt->ddt_spa->spa_ddt_stat_object, name, sizeof (uint64_t), sizeof (ddt_histogram_t) / sizeof (uint64_t), &ddt->ddt_histogram[type][class]) == 0 (0x6 == 0x0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 127.
Assertion failed: (ddt_object_info(ddt, type, class, &doi) == 0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 132.

zdb seems to be stuck in the following state:

21697 zdb RET read 8
21697 zdb CALL _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL read(0x4,0x7fffd7fbdf50,0x8)
21697 zdb GIO fd 4 read 8 bytes
      0x0000 02be fe70 08a4 2335 |...p..#5|
21697 zdb RET read 8
21697 zdb CALL _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL read(0x4,0x7fffd7fbdf50,0x8)
21697 zdb GIO fd 4 read 8 bytes
      0x0000 459a ca93 c54b 9922 |E....K."|
21697 zdb RET read 8
21697 zdb CALL _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)

However, I wasn't able to find out what FD 4 is. There were no disk read errors in dmesg/messages, so I'm not sure what would be timing out.

And here's a screenshot of the crash:
http://czg.harmless.hu/zfscrash/zfspanic.jpg

So, does anyone have any idea what to do with it? It would be nice to get the pool back to a functional state, or at least to a state where the data can be accessed.

Thanks in advance,
-czg