Date: Sat, 14 Feb 2015 15:19:07 +0100
From: Fabian Keil <freebsd-listen@fabiankeil.de>
To: freebsd-fs@freebsd.org
Subject: Re: panic: solaris assert: rt->rt_space == 0 (0xe000 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/range_tree.c, line: 153
Message-ID: <580853d0.0ab6eb7d@fabiankeil.de>
In-Reply-To: <04f3092d.6fdfad8a@fabiankeil.de>
References: <04f3092d.6fdfad8a@fabiankeil.de>
Fabian Keil <freebsd-listen@fabiankeil.de> wrote:

> Using an 11.0-CURRENT based on r276255 I just got a panic
> after trying to export a certain ZFS pool:
[...]
> #10 0xffffffff81bdd22f in assfail3 (a=<value optimized out>, lv=<value optimized out>, op=<value optimized out>, rv=<value optimized out>, f=<value optimized out>, l=<value optimized out>)
>     at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
> #11 0xffffffff8194afc4 in range_tree_destroy (rt=0xfffff80011586000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/range_tree.c:153
> #12 0xffffffff819488bc in metaslab_fini (msp=0xfffff800611a9800) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:1398
> #13 0xffffffff81965841 in vdev_free (vd=0xfffff8000696d800) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:994
> #14 0xffffffff819657e1 in vdev_free (vd=0xfffff80040532000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:683
> #15 0xffffffff81953948 in spa_unload (spa=0xfffff800106af000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1314
> #16 0xffffffff81957a58 in spa_export_common (pool=<value optimized out>, new_state=1, oldconfig=0x0, force=<value optimized out>, hardforce=0)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:4540
> #17 0xffffffff81957b08 in spa_export (pool=0x0, oldconfig=0xfffffe0094a624f0, force=128, hardforce=50) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:4574
> #18 0xffffffff8199ed50 in zfs_ioc_pool_export (zc=0xfffffe0006fbf000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:1618
[...]
> (kgdb) f 11
> #11 0xffffffff8194afc4 in range_tree_destroy (rt=0xfffff80011586000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/range_tree.c:153
> 153             VERIFY0(rt->rt_space);
[...]
>
> After rebooting and reimporting the pool it looked like this:
>
> fk@r500 ~ $sudo zpool status -v wde4
>   pool: wde4
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: scrub canceled on Tue Jan 20 00:22:26 2015
> config:
>
>         NAME              STATE     READ WRITE CKSUM
>         wde4              ONLINE       0     0    19
>           label/wde4.eli  ONLINE       0     0    76
>
> errors: Permanent errors have been detected in the following files:
>
>         <0xaf11f>:<0x0>
>         wde4/backup/r500/tank/home/fk:<0x0>
>         <0xffffffffffffffff>:<0x0>
>
> The export triggered the same panic again, but with a different
> rt->rt_space value:
>
> panic: solaris assert: rt->rt_space == 0 (0x22800 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/range_tree.c, line: 153
>
> I probably won't have time to scrub the pool and investigate this further
> until next week.

With this patch and vfs.zfs.recover=1 the pool can be exported without panic:

https://www.fabiankeil.de/sourcecode/electrobsd/range_tree_destroy-Optionally-tolerate-non-zero-rt-r.diff
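In essence, the diff turns the fatal assertion in range_tree_destroy()
into a zfs_panic_recover() call. A sketch of the idea (see the diff above
for the actual change; details may differ):

 void
 range_tree_destroy(range_tree_t *rt)
 {
-	VERIFY0(rt->rt_space);
+	if (rt->rt_space != 0)
+		zfs_panic_recover("zfs: range_tree_destroy(): "
+		    "rt->rt_space != 0: %llx",
+		    (u_longlong_t)rt->rt_space);

 	if (rt->rt_ops != NULL)
 		rt->rt_ops->rtop_destroy(rt, rt->rt_arg);

 	avl_destroy(&rt->rt_root);
 	kmem_free(rt, sizeof (*rt));
 }

zfs_panic_recover() still panics by default and only downgrades the
problem to a warning when vfs.zfs.recover is set, which is where the
WARNING lines below come from.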
Warnings from three pool exports:

Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12000
Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2e200
Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12000
Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2e200
Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12000
Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2e200
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 11200
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 11200
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 11200
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12600
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12600
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12600
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00

My impression is that the messages are the result of metaslab_fini()
triggering the problem three times per export for each tree in
msp->ms_defertree.
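For reference, metaslab_fini() tears down the defer trees in a loop
along these lines (paraphrased from the metaslab.c of that era, near
the line in the backtrace above):

	for (int t = 0; t < TXG_DEFER_SIZE; t++)
		range_tree_destroy(msp->ms_defertree[t]);

As TXG_DEFER_SIZE is 2, the two defer trees would explain the two
distinct leak sizes that alternate within each group of six warnings.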
If the pool is imported readonly, the problem isn't triggered.

Due to interruptions the scrubbing will probably take a couple of days.
ZFS continues to complain about checksum errors, but apparently no
affected files have been found yet:

fk@r500 ~ $sudo zpool status -v wde4
  pool: wde4
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Sat Feb 14 14:19:15 2015
        32.0G scanned out of 1.68T at 10.8M/s, 44h25m to go
        0 repaired, 1.86% done
config:

        NAME              STATE     READ WRITE CKSUM
        wde4              ONLINE       0     0   867
          label/wde4.eli  ONLINE       0     0 3.39K

errors: Permanent errors have been detected in the following files:

        <0xaf11f>:<0x0>
        wde4/backup/r500/tank/home/fk:<0x0>
        <0xffffffffffffffff>:<0x0>

BTW, any opinions on allowing vfs.zfs.recover to be changed without a reboot?

https://www.fabiankeil.de/sourcecode/electrobsd/Make-vfs.zfs.recover-writable-after-boot.diff
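Presumably that just means relaxing the sysctl flags from CTLFLAG_RDTUN
to CTLFLAG_RWTUN; schematically (a sketch, not the diff itself, and the
description string is quoted from memory):

-SYSCTL_INT(_vfs_zfs, OID_AUTO, recover, CTLFLAG_RDTUN, &zfs_recover, 0,
+SYSCTL_INT(_vfs_zfs, OID_AUTO, recover, CTLFLAG_RWTUN, &zfs_recover, 0,
     "Try to recover from otherwise-fatal errors.");

With CTLFLAG_RWTUN the knob remains settable as a loader tunable but
can additionally be toggled on a running system:

# sysctl vfs.zfs.recover=1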
Fabian