From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 20:27:47 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6D18D31E; Tue, 13 Nov 2012 20:27:47 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id E325D8FC14; Tue, 13 Nov 2012 20:27:46 +0000 (UTC) Received: from server.rulingia.com (c220-239-241-202.belrs5.nsw.optusnet.com.au [220.239.241.202]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id qADKRcqn072314 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 14 Nov 2012 07:27:39 +1100 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id qADKRUJb044107 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 14 Nov 2012 07:27:30 +1100 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id qADKRUwc044106; Wed, 14 Nov 2012 07:27:30 +1100 (EST) (envelope-from peter) Date: Wed, 14 Nov 2012 07:27:30 +1100 From: Peter Jeremy To: Andriy Gapon Subject: Re: zfs diff deadlock Message-ID: <20121113202730.GA42238@server.rulingia.com> References: <20121110223249.GB506@server.rulingia.com> <20121111072739.GA4814@server.rulingia.com> <509F5E0A.1020501@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="y0ulUmNC+osPPQO6" Content-Disposition: inline In-Reply-To: <509F5E0A.1020501@FreeBSD.org> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 20:27:47 -0000 --y0ulUmNC+osPPQO6 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2012-Nov-11 10:12:58 +0200, Andriy Gapon wrote: >on 11/11/2012 09:27 Peter Jeremy said the following: >> On 2012-Nov-11 09:32:49 +1100, Peter Jeremy >> wrote: >>> I recently decided to do a "zfs diff" between two snapshots to try and >>> identify why there was so much "USED" space in the snapshot. The diff r= an >>> for a while (though with very little IO) but has now wedged unkillably. >>> There's nothing on the console or in any logs, the pool reports no >>> problems and there are no other visible FS issues. Any ideas on tracki= ng >>> this down? >> ... >>> The systems is running a 4-month old 8-stable (r237444) >>=20 >> I've tried a second system running the same world with the same result, = so=20 >> this looks like a real bug in ZFS rather than a system glitch. >>=20 > >Are you able to catch the state of all threads in the system? >E.g. via procstat -k -a. >Or a crash dump. Unfortunately, neither of those systems are really suitable for debugging. I have setup a VBox and sent most of the offending FS to it. That gives somewhat different results: On a recent 8-stable (r242865M), I get a panic whilst on a recent head, I get a "Unable to determine path or stats" error. On 8-stable, I have a crashdump and the panic is: suspending ithread with the following locks held: shared spin mutex ({6") r =3D 0 (0xffffff005c395a80) locked @ /usr/src/sys= /modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:522 panic: witness_warn cpuid =3D 0 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 panic() at panic+0x1ce witness_warn() at witness_warn+0x2b2 ithread_loop() at ithread_loop+0x112 fork_exit() at fork_exit+0x11d fork_trampoline() at fork_trampoline+0xe --- trap 0, rip =3D 0, rsp =3D 0xffffff800008ccf0, rbp =3D 0 --- Note that zap.c:522 is the rw_enter() in zap_get_leaf_byblk() - which is the offending function in the backtrace on r237444. On head, I get some normal differences terminated by: Unable to determine path or stats for object 2128453 in tank/beckett/home@2= 0120518: Invalid argument A scrub reports no issues but the problem remains: root@FB10-64:~ # zpool status=20 pool: tank state: ONLINE status: The pool is formatted using a legacy on-disk format. The pool can still be used, but some features are unavailable. action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on software that does not support= feature flags. scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 ada2 ONLINE 0 0 0 errors: No known data errors I've done some searching and found 2 hits on the message - one in an OI IRC log and the other in a ZFS-on-Linux list. Neither offered any insights. I've tried ktracing the zfs diff and that ends: 1856 zfs CALL read(0x7,0x7fffffbfc160,0x18) 1856 zfs GIO fd 7 read 24 bytes 0x0000 0400 0000 0000 0000 e079 2000 0000 0000 397a 2000 0000 0000 = = |.........y .....9z .....| 1856 zfs RET read 24/0x18 1856 zfs CALL ioctl(0x3,0xd5985a36,0x7fffffbfc178) 1856 zfs RET ioctl 0 1856 zfs CALL read(0x7,0x7fffffbfc160,0x18) 1856 zfs GIO fd 7 read 24 bytes 0x0000 0200 0000 0000 0000 3a7a 2000 0000 0000 4d7a 2000 0000 0000 = = |........:z .....Mz .....| 1856 zfs RET read 24/0x18 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 22 Invalid argument 1856 zfs CALL close(0x1) 1856 zfs RET close 0 1856 zfs CALL close(0x7) 1856 zfs RET ioctl -1 errno 32 Broken pipe 1856 zfs CALL close(0x8) 1856 zfs RET close 0 1856 zfs CALL thr_kill(0x18adf,SIG 32) 1856 zfs RET thr_kill 0 1856 zfs CALL _umtx_op(0x802c06c00,0x2,0x18adf,0,0) 1856 zfs RET close 0 1856 zfs PSIG SIG 32 caught handler=3D0x8020537f0 mask=3D0x0 code= =3DSI_LWP 1856 zfs CALL sigreturn(0x7fffffbfbca0) 1856 zfs RET sigreturn JUSTRETURN 1856 zfs CALL thr_exit(0x802c06c00) 1856 zfs RET _umtx_op 0 1856 zfs CALL close(0x6) 1856 zfs RET close 0 1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888) 1856 zfs NAMI "/usr/share/nls/C/libc.cat" 1856 zfs RET stat -1 errno 2 No such file or directory 1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888) 1856 zfs NAMI "/usr/share/nls/libc/C" 1856 zfs RET stat -1 errno 2 No such file or directory 1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888) 1856 zfs NAMI "/usr/local/share/nls/C/libc.cat" 1856 zfs RET stat -1 errno 2 No such file or directory 1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888) 1856 zfs NAMI "/usr/local/share/nls/libc/C" 1856 zfs RET stat -1 errno 2 No such file or directory 1856 zfs CALL write(0x2,0x7fffffffa740,0x65) 1856 zfs GIO fd 2 wrote 101 bytes "Unable to determine path or stats for object 2128453 in tank/becket= t/home@20120518: Invalid argument " 1856 zfs RET write 101/0x65 1856 zfs CALL close(0x5) 1856 zfs RET close 0 1856 zfs CALL close(0x3) 1856 zfs RET close 0 1856 zfs CALL close(0x4) 1856 zfs RET close 0 1856 zfs CALL exit(0x1) --=20 Peter Jeremy --y0ulUmNC+osPPQO6 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlCirTIACgkQ/opHv/APuIccAACdGfzyqFTjb5UUcu7pqRgz3DiH pB4An0fjLFS7wQwDVAJCxEiALo4kcJZB =E0LV -----END PGP SIGNATURE----- --y0ulUmNC+osPPQO6--