Date:      Thu, 29 Aug 2019 13:37:06 +0700
From:      Victor Sudakov <vas@mpeks.tomsk.su>
To:        freebsd-questions@freebsd.org
Subject:   Re: Kernel panic and ZFS corruption on 11.3-RELEASE
Message-ID:  <20190829063706.GB34810@admin.sibptus.ru>
In-Reply-To: <2964dd94-ad99-d0b8-c5d8-5d276cf02d06@gmail.com>
References:  <20190828025728.GA1441@admin.sibptus.ru> <2964dd94-ad99-d0b8-c5d8-5d276cf02d06@gmail.com>

MJ wrote:
>
>
> On 28/08/2019 12:57 pm, Victor Sudakov wrote:
> > Dear Colleagues,
> >
> > Shortly after upgrading to 11.3-RELEASE I had a kernel panic:
> >
> > Aug 28 00:01:40 vas kernel: panic: solaris assert: dmu_buf_hold_array(os, object, offset, size, 0, ((char *)(uintptr_t)__func__), &numbufs, &dbp) == 0 (0x5 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c, line: 1022
> > Aug 28 00:01:40 vas kernel: cpuid = 0
> > Aug 28 00:01:40 vas kernel: KDB: stack backtrace:
> > Aug 28 00:01:40 vas kernel: #0 0xffffffff80b4c4d7 at kdb_backtrace+0x67
> > Aug 28 00:01:40 vas kernel: #1 0xffffffff80b054ee at vpanic+0x17e
> > Aug 28 00:01:40 vas kernel: #2 0xffffffff80b05363 at panic+0x43
> > Aug 28 00:01:40 vas kernel: #3 0xffffffff8260322c at assfail3+0x2c
> > Aug 28 00:01:40 vas kernel: #4 0xffffffff822a9585 at dmu_write+0xa5
> > Aug 28 00:01:40 vas kernel: #5 0xffffffff82302b38 at space_map_write+0x188
> > Aug 28 00:01:40 vas kernel: #6 0xffffffff822e31fd at metaslab_sync+0x41d
> > Aug 28 00:01:40 vas kernel: #7 0xffffffff8230b63b at vdev_sync+0xab
> > Aug 28 00:01:40 vas kernel: #8 0xffffffff822f776b at spa_sync+0xb5b
> > Aug 28 00:01:40 vas kernel: #9 0xffffffff82304420 at txg_sync_thread+0x280
> > Aug 28 00:01:40 vas kernel: #10 0xffffffff80ac8ac3 at fork_exit+0x83
> > Aug 28 00:01:40 vas kernel: #11 0xffffffff80f69d6e at fork_trampoline+0xe
> > Aug 28 00:01:40 vas kernel: Uptime: 14d3h42m57s
> >
> > after which the ZFS pool became corrupt:
> >
> >    pool: d02
> >   state: FAULTED
> > status: The pool metadata is corrupted and the pool cannot be opened.
> > action: Recovery is possible, but will result in some data loss.
> > 	Returning the pool to its state as of Tuesday, August 27, 2019, 23:51:20
> > 	should correct the problem.  Approximately 9 minutes of data
> > 	must be discarded, irreversibly.  Recovery can be attempted
> > 	by executing 'zpool clear -F d02'.  A scrub of the pool
> > 	is strongly recommended after recovery.
> >     see: http://illumos.org/msg/ZFS-8000-72
> >    scan: resilvered 423K in 0 days 00:00:05 with 0 errors on Sat Sep 30 04:12:20 2017
> > config:
> >
> > 	NAME	    STATE     READ WRITE CKSUM
> > 	d02	    FAULTED	 0     0     2
> > 	  ada2.eli  ONLINE	 0     0    12
> >=20
> > However, "zpool clear -F d02" results in error:
> > cannot clear errors for d02: I/O error
> >=20
> > Do you know if there is a way to recover the data, or should I say farewell to several hundred GB of anime?
> >
> > PS: I think I do have the vmcore file if someone is interested in debugging the panic.
>
> Do you have a backup? Then restore it.

No, it's much more interesting to try and recover the pool.

>
> If you don't, have you tried
> zpool import -F d02

I've tried "zpool clear -F d02" with no success (see above).

Later I tried "zpool import -Ff d02", but on an 11.2 system, as David
Christensen advised, and this was a success.
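
In case it helps the archives, the recovery on the 11.2 box boiled down to
something like the following (a sketch rather than an exact transcript; the
geli step is my assumption from the ada2.eli device name, and its options
depend on how the provider was originally created):

  geli attach ada2        # attach the encrypted provider backing the pool
  zpool import            # d02 shows up as importable, though faulted
  zpool import -Ff d02    # force the import, letting ZFS rewind to the last good txg
  zpool scrub d02         # verify all checksums after the rewind
  zpool status -v d02     # confirm the scrub finished without new errors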

> Some references you might like to read:
> https://docs.oracle.com/cd/E19253-01/819-5461/gbctt/index.html
> Take note of this section:
> "If the damaged pool is in the zpool.cache file, the problem is discovere=
d when the system is booted, and the damaged pool is reported in the zpool =
status command. If the pool isn't in the zpool.cache file, it won't success=
fully import or open and you'll see the damaged pool messages when you atte=
mpt to import the pool."
>=20
> I've not had your exact error, but in the case of disk corruption/failure=
, I've used import as the sledgehammer approach.
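
For what it's worth, the zpool.cache point above is easy to check; a quick
sketch, assuming the stock cache file location on 11.x (/boot/zfs/zpool.cache):

  zdb -C d02         # prints the cached configuration if d02 is still in the cache file
  zpool import       # scans the attached providers and lists pools that can be imported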

What do you think made all the difference: 11.2 vs 11.3, or "import -F"
vs "clear -F"?

What is the difference between "zpool import -F" and "zpool clear -F"
when it comes to fixing pool errors?

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
2:5005/49@fidonet http://vas.tomsk.ru/
