Date:      Sat, 14 Feb 2015 15:19:07 +0100
From:      Fabian Keil <freebsd-listen@fabiankeil.de>
To:        freebsd-fs@freebsd.org
Subject:   Re: panic: solaris assert: rt->rt_space == 0 (0xe000 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/range_tree.c, line: 153
Message-ID:  <580853d0.0ab6eb7d@fabiankeil.de>
In-Reply-To: <04f3092d.6fdfad8a@fabiankeil.de>
References:  <04f3092d.6fdfad8a@fabiankeil.de>

Fabian Keil <freebsd-listen@fabiankeil.de> wrote:

> Using an 11.0-CURRENT based on r276255 I just got a panic
> after trying to export a certain ZFS pool:
[...]
> #10 0xffffffff81bdd22f in assfail3 (a=<value optimized out>, lv=<value optimized out>, op=<value optimized out>, rv=<value optimized out>, f=<value optimized out>, l=<value optimized out>)
>     at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
> #11 0xffffffff8194afc4 in range_tree_destroy (rt=0xfffff80011586000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/range_tree.c:153
> #12 0xffffffff819488bc in metaslab_fini (msp=0xfffff800611a9800) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:1398
> #13 0xffffffff81965841 in vdev_free (vd=0xfffff8000696d800) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:994
> #14 0xffffffff819657e1 in vdev_free (vd=0xfffff80040532000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:683
> #15 0xffffffff81953948 in spa_unload (spa=0xfffff800106af000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1314
> #16 0xffffffff81957a58 in spa_export_common (pool=<value optimized out>, new_state=1, oldconfig=0x0, force=<value optimized out>, hardforce=0)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:4540
> #17 0xffffffff81957b08 in spa_export (pool=0x0, oldconfig=0xfffffe0094a624f0, force=128, hardforce=50) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:4574
> #18 0xffffffff8199ed50 in zfs_ioc_pool_export (zc=0xfffffe0006fbf000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:1618
[...]
> (kgdb) f 11
> #11 0xffffffff8194afc4 in range_tree_destroy (rt=0xfffff80011586000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/range_tree.c:153
> 153		VERIFY0(rt->rt_space);
[...]
> 
> After rebooting and reimporting the pool it looked like this:
> 
> fk@r500 ~ $sudo zpool status -v wde4
>   pool: wde4
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
> 	corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
> 	entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: scrub canceled on Tue Jan 20 00:22:26 2015
> config:
> 
> 	NAME              STATE     READ WRITE CKSUM
> 	wde4              ONLINE       0     0    19
> 	  label/wde4.eli  ONLINE       0     0    76
> 
> errors: Permanent errors have been detected in the following files:
>=20
>         <0xaf11f>:<0x0>
>         wde4/backup/r500/tank/home/fk:<0x0>
>         <0xffffffffffffffff>:<0x0>
> 
> The export triggered the same panic again, but with a different rt->rt_space value:
> 
> panic: solaris assert: rt->rt_space == 0 (0x22800 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/range_tree.c, line: 153
> 
> I probably won't have time to scrub the pool and investigate this further
> until next week.

With this patch and vfs.zfs.recover=1 the pool can be exported without
a panic:
https://www.fabiankeil.de/sourcecode/electrobsd/range_tree_destroy-Optionally-tolerate-non-zero-rt-r.diff
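
The gist of the patch, for anyone who doesn't want to open the diff
(a sketch reconstructed from the warning format below, not a verbatim
copy), is to downgrade the hard assertion in range_tree_destroy() to
zfs_panic_recover(), which panics unless vfs.zfs.recover is set and
otherwise only logs a warning:

    /*
     * Sketch, not the actual diff: replaces VERIFY0(rt->rt_space)
     * at range_tree.c:153. zfs_panic_recover() panics unless the
     * zfs_recover tunable (vfs.zfs.recover) is set; with it set,
     * it only prints the CE_WARN messages quoted below.
     */
    void
    range_tree_destroy(range_tree_t *rt)
    {
            if (rt->rt_space != 0) {
                    zfs_panic_recover("zfs: range_tree_destroy(): "
                        "rt->rt_space != 0: %llx",
                        (u_longlong_t)rt->rt_space);
            }
            /* ... rest of range_tree_destroy() unchanged ... */
    }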

Warnings from three pool exports:

Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12000
Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2e200
Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12000
Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2e200
Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12000
Feb 14 13:49:22 r500 kernel: [268] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2e200

Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 11200
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 11200
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 11200
Feb 14 13:50:25 r500 kernel: [331] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00

Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12600
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12600
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 12600
Feb 14 13:52:27 r500 kernel: [453] Solaris: WARNING: zfs: range_tree_destroy(): rt->rt_space != 0: 2ea00

My impression is that the messages are the result of metaslab_fini()
triggering the problem three times per export for each tree in
msp->ms_defertree.
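
The loop I mean (quoted from memory of the metaslab.c code around
that revision, so treat it as an approximation rather than the exact
source):

    /*
     * metaslab_fini() destroys all of the metaslab's range trees,
     * including the TXG_DEFER_SIZE (currently 2) defer trees, so
     * space still accounted in a defer tree trips the rt_space
     * check in range_tree_destroy() once per tree.
     */
    for (int t = 0; t < TXG_DEFER_SIZE; t++)
            range_tree_destroy(msp->ms_defertree[t]);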

If the pool is imported readonly, the problem isn't triggered.
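(By "imported readonly" I mean an import with the readonly property
set, e.g.: zpool import -o readonly=on wde4)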

Due to interruptions the scrubbing will probably take a couple of days.
ZFS continues to complain about checksum errors but apparently no
affected files have been found yet:

fk@r500 ~ $sudo zpool status -v wde4
  pool: wde4
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Sat Feb 14 14:19:15 2015
        32.0G scanned out of 1.68T at 10.8M/s, 44h25m to go
        0 repaired, 1.86% done
config:

	NAME              STATE     READ WRITE CKSUM
	wde4              ONLINE       0     0   867
	  label/wde4.eli  ONLINE       0     0 3.39K

errors: Permanent errors have been detected in the following files:

        <0xaf11f>:<0x0>
        wde4/backup/r500/tank/home/fk:<0x0>
        <0xffffffffffffffff>:<0x0>

BTW, any opinions on making vfs.zfs.recover changeable without a reboot?
https://www.fabiankeil.de/sourcecode/electrobsd/Make-vfs.zfs.recover-writable-after-boot.diff
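
The idea is simply to switch the sysctl from a read-only tunable to a
writable one; a sketch of the relevant declaration (not the actual
diff, and the description string is illustrative):

    /*
     * Sketch: switching CTLFLAG_RDTUN to CTLFLAG_RWTUN keeps the
     * loader.conf tunable behaviour but additionally makes the
     * sysctl writable at runtime via sysctl(8).
     */
    SYSCTL_INT(_vfs_zfs, OID_AUTO, recover, CTLFLAG_RWTUN,
        &zfs_recover, 0, "Try to recover from otherwise-fatal errors");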

Fabian
