Date: Wed, 14 Nov 2012 07:27:30 +1100
From: Peter Jeremy <peter@rulingia.com>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: zfs diff deadlock
Message-ID: <20121113202730.GA42238@server.rulingia.com>
In-Reply-To: <509F5E0A.1020501@FreeBSD.org>
References: <20121110223249.GB506@server.rulingia.com> <20121111072739.GA4814@server.rulingia.com> <509F5E0A.1020501@FreeBSD.org>
On 2012-Nov-11 10:12:58 +0200, Andriy Gapon <avg@FreeBSD.org> wrote:
>on 11/11/2012 09:27 Peter Jeremy said the following:
>> On 2012-Nov-11 09:32:49 +1100, Peter Jeremy <peter@server.rulingia.com>
>> wrote:
>>> I recently decided to do a "zfs diff" between two snapshots to try and
>>> identify why there was so much "USED" space in the snapshot. The diff ran
>>> for a while (though with very little IO) but has now wedged unkillably.
>>> There's nothing on the console or in any logs, the pool reports no
>>> problems and there are no other visible FS issues. Any ideas on tracking
>>> this down?
>> ...
>>> The system is running a 4-month old 8-stable (r237444)
>>
>> I've tried a second system running the same world with the same result, so
>> this looks like a real bug in ZFS rather than a system glitch.
>>
>
>Are you able to catch the state of all threads in the system?
>E.g. via procstat -k -a.
>Or a crash dump.
Unfortunately, neither of those systems is really suitable for
debugging. I have set up a VirtualBox VM and sent most of the offending
FS to it. That gives somewhat different results: on a recent 8-stable
(r242865M) I get a panic, whilst on a recent head I get an "Unable to
determine path or stats" error.
On 8-stable, I have a crashdump and the panic is:
suspending ithread with the following locks held:
shared spin mutex ({6") r = 0 (0xffffff005c395a80) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:522
panic: witness_warn
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1ce
witness_warn() at witness_warn+0x2b2
ithread_loop() at ithread_loop+0x112
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800008ccf0, rbp = 0 ---
Note that zap.c:522 is the rw_enter() in zap_get_leaf_byblk() - which
is the offending function in the backtrace on r237444.
On head, I get some normal differences terminated by:
Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument
A scrub reports no issues but the problem remains:
root@FB10-64:~ # zpool status
  pool: tank
 state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support feature
        flags.
  scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ada2      ONLINE       0     0     0

errors: No known data errors
I've done some searching and found 2 hits on the message - one in an
OI IRC log and the other in a ZFS-on-Linux list. Neither offered any
insights.
I've tried ktracing the zfs diff and that ends:
1856 zfs CALL read(0x7,0x7fffffbfc160,0x18)
1856 zfs GIO fd 7 read 24 bytes
0x0000 0400 0000 0000 0000 e079 2000 0000 0000 397a 2000 0000 0000  |.........y .....9z .....|
1856 zfs RET read 24/0x18
1856 zfs CALL ioctl(0x3,0xd5985a36,0x7fffffbfc178)
1856 zfs RET ioctl 0
1856 zfs CALL read(0x7,0x7fffffbfc160,0x18)
1856 zfs GIO fd 7 read 24 bytes
0x0000 0200 0000 0000 0000 3a7a 2000 0000 0000 4d7a 2000 0000 0000  |........:z .....Mz .....|
1856 zfs RET read 24/0x18
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl 0
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl 0
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl 0
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl 0
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl 0
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl 0
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl 0
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl 0
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 2 No such file or directory
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl 0
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl 0
1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18)
1856 zfs RET ioctl -1 errno 22 Invalid argument
1856 zfs CALL close(0x1)
1856 zfs RET close 0
1856 zfs CALL close(0x7)
1856 zfs RET ioctl -1 errno 32 Broken pipe
1856 zfs CALL close(0x8)
1856 zfs RET close 0
1856 zfs CALL thr_kill(0x18adf,SIG 32)
1856 zfs RET thr_kill 0
1856 zfs CALL _umtx_op(0x802c06c00,0x2,0x18adf,0,0)
1856 zfs RET close 0
1856 zfs PSIG SIG 32 caught handler=0x8020537f0 mask=0x0 code=SI_LWP
1856 zfs CALL sigreturn(0x7fffffbfbca0)
1856 zfs RET sigreturn JUSTRETURN
1856 zfs CALL thr_exit(0x802c06c00)
1856 zfs RET _umtx_op 0
1856 zfs CALL close(0x6)
1856 zfs RET close 0
1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888)
1856 zfs NAMI "/usr/share/nls/C/libc.cat"
1856 zfs RET stat -1 errno 2 No such file or directory
1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888)
1856 zfs NAMI "/usr/share/nls/libc/C"
1856 zfs RET stat -1 errno 2 No such file or directory
1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888)
1856 zfs NAMI "/usr/local/share/nls/C/libc.cat"
1856 zfs RET stat -1 errno 2 No such file or directory
1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888)
1856 zfs NAMI "/usr/local/share/nls/libc/C"
1856 zfs RET stat -1 errno 2 No such file or directory
1856 zfs CALL write(0x2,0x7fffffffa740,0x65)
1856 zfs GIO fd 2 wrote 101 bytes
"Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument
"
1856 zfs RET write 101/0x65
1856 zfs CALL close(0x5)
1856 zfs RET close 0
1856 zfs CALL close(0x3)
1856 zfs RET close 0
1856 zfs CALL close(0x4)
1856 zfs RET close 0
1856 zfs CALL exit(0x1)
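
For reference, the raw values in the trace above can be decoded with a few lines of Python. This is only a sketch under stated assumptions: it treats each 24-byte read from fd 7 as three little-endian unsigned 64-bit integers, and the reading of those fields as (record type, first object, last object) is a guess from how the diff stream appears to be consumed, not something the trace itself proves. The ioctl decoding uses the generic bit layout from FreeBSD's sys/ioccom.h (3 direction bits, 13-bit length, 8-bit group, 8-bit number); it deliberately does not try to name the ZFS commands. The helper names decode_record and decode_ioctl are made up for this example.

```python
import struct

# 24-byte payloads copied from the two GIO hex dumps in the ktrace above.
RECORDS = [
    "0400 0000 0000 0000 e079 2000 0000 0000 397a 2000 0000 0000",
    "0200 0000 0000 0000 3a7a 2000 0000 0000 4d7a 2000 0000 0000",
]

# Object number quoted in the "Unable to determine path or stats" message.
FAILING_OBJECT = 2128453


def decode_record(hex_dump):
    """Unpack one record as three little-endian unsigned 64-bit integers.

    Assumption: the fields are (type, first object, last object).
    """
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    return struct.unpack("<3Q", raw)


def decode_ioctl(cmd):
    """Split a FreeBSD ioctl command word into its generic fields."""
    direction = cmd & 0xE0000000      # IOC_VOID / IOC_OUT / IOC_IN bits
    length = (cmd >> 16) & 0x1FFF     # IOCPARM_LEN: size of the argument
    group = chr((cmd >> 8) & 0xFF)    # IOCGROUP: driver-chosen letter
    number = cmd & 0xFF               # command number within the group
    return direction, length, group, number


if __name__ == "__main__":
    for dump in RECORDS:
        rtype, first, last = decode_record(dump)
        inside = first <= FAILING_OBJECT <= last
        print(f"type={rtype} first={first} last={last} "
              f"contains {FAILING_OBJECT}: {inside}")

    for cmd in (0xD5985A36, 0xD5985A39):
        direction, length, group, number = decode_ioctl(cmd)
        print(f"{cmd:#x}: dir={direction:#x} len={length} "
              f"group={group!r} num={number}")
```

Under those assumptions the first read decodes to (4, 2128352, 2128441) and the second to (2, 2128442, 2128461), so object 2128453 from the error message lies inside the second range. Both ioctl commands decode to group 'Z' with a 5528-byte in/out argument, with command numbers 54 and 57 respectively.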
-- 
Peter Jeremy
