Date: Thu, 29 Aug 2019 13:37:06 +0700
From: Victor Sudakov <vas@mpeks.tomsk.su>
To: freebsd-questions@freebsd.org
Subject: Re: Kernel panic and ZFS corruption on 11.3-RELEASE
Message-ID: <20190829063706.GB34810@admin.sibptus.ru>
In-Reply-To: <2964dd94-ad99-d0b8-c5d8-5d276cf02d06@gmail.com>
References: <20190828025728.GA1441@admin.sibptus.ru> <2964dd94-ad99-d0b8-c5d8-5d276cf02d06@gmail.com>
MJ wrote:
>
> On 28/08/2019 12:57 pm, Victor Sudakov wrote:
> > Dear Colleagues,
> >
> > Shortly after upgrading to 11.3-RELEASE I had a kernel panic:
> >
> > Aug 28 00:01:40 vas kernel: panic: solaris assert: dmu_buf_hold_array(os, object, offset, size, 0, ((char *)(uintptr_t)__func__), &numbufs, &dbp) == 0 (0x5 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c, line: 1022
> > Aug 28 00:01:40 vas kernel: cpuid = 0
> > Aug 28 00:01:40 vas kernel: KDB: stack backtrace:
> > Aug 28 00:01:40 vas kernel: #0 0xffffffff80b4c4d7 at kdb_backtrace+0x67
> > Aug 28 00:01:40 vas kernel: #1 0xffffffff80b054ee at vpanic+0x17e
> > Aug 28 00:01:40 vas kernel: #2 0xffffffff80b05363 at panic+0x43
> > Aug 28 00:01:40 vas kernel: #3 0xffffffff8260322c at assfail3+0x2c
> > Aug 28 00:01:40 vas kernel: #4 0xffffffff822a9585 at dmu_write+0xa5
> > Aug 28 00:01:40 vas kernel: #5 0xffffffff82302b38 at space_map_write+0x188
> > Aug 28 00:01:40 vas kernel: #6 0xffffffff822e31fd at metaslab_sync+0x41d
> > Aug 28 00:01:40 vas kernel: #7 0xffffffff8230b63b at vdev_sync+0xab
> > Aug 28 00:01:40 vas kernel: #8 0xffffffff822f776b at spa_sync+0xb5b
> > Aug 28 00:01:40 vas kernel: #9 0xffffffff82304420 at txg_sync_thread+0x280
> > Aug 28 00:01:40 vas kernel: #10 0xffffffff80ac8ac3 at fork_exit+0x83
> > Aug 28 00:01:40 vas kernel: #11 0xffffffff80f69d6e at fork_trampoline+0xe
> > Aug 28 00:01:40 vas kernel: Uptime: 14d3h42m57s
> >
> > after which the ZFS pool became corrupt:
> >
> >   pool: d02
> >  state: FAULTED
> > status: The pool metadata is corrupted and the pool cannot be opened.
> > action: Recovery is possible, but will result in some data loss.
> >         Returning the pool to its state as of Tuesday, 27 August 2019, 23:51:20
> >         should correct the problem. Approximately 9 minutes of data
> >         must be discarded, irreversibly. Recovery can be attempted
> >         by executing 'zpool clear -F d02'. A scrub of the pool
> >         is strongly recommended after recovery.
> >    see: http://illumos.org/msg/ZFS-8000-72
> >   scan: resilvered 423K in 0 days 00:00:05 with 0 errors on Sat Sep 30 04:12:20 2017
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         d02         FAULTED      0     0     2
> >           ada2.eli  ONLINE       0     0    12
> >
> > However, "zpool clear -F d02" results in an error:
> > cannot clear errors for d02: I/O error
> >
> > Do you know if there is a way to recover the data, or should I say farewell to several hundred Gb of anime?
> >
> > PS I think I do have the vmcore file if someone is interested in debugging the panic.
>
> Do you have a backup? Then restore it.

No, it's much more interesting to try and recover the pool.

>
> If you don't, have you tried
> zpool import -F d02

I've tried "zpool clear -F d02" with no success (see above). Later I tried
"zpool import -Ff d02", but on an 11.2 system, as David Christensen advised,
and this was a success.

> Some references you might like to read:
> https://docs.oracle.com/cd/E19253-01/819-5461/gbctt/index.html
> Take note of this section:
> "If the damaged pool is in the zpool.cache file, the problem is discovered when the system is booted, and the damaged pool is reported in the zpool status command. If the pool isn't in the zpool.cache file, it won't successfully import or open and you'll see the damaged pool messages when you attempt to import the pool."
>
> I've not had your exact error, but in the case of disk corruption/failure, I've used import as the sledgehammer approach.

What do you think made all the difference: 11.2 vs 11.3, or "import -F" vs "clear -F"?

What is the difference between "import -F" and "clear -F" in the fixing of zpool errors?
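For reference, the recovery attempt that eventually worked here can be sketched as the following command sequence. This is only an illustration built from the steps mentioned in this thread, not a guaranteed procedure; the pool name "d02" comes from the thread, and the -n dry run and final scrub are extra precautions I would suggest, since a txg rewind irreversibly discards the most recent transactions.

```shell
# Sketch of a rewind-based recovery for a faulted pool, assuming the
# pool "d02" from this thread. Consider imaging the underlying disk
# (e.g. with dd) before attempting anything destructive.

# Export the pool if possible, so it is no longer held via zpool.cache:
zpool export d02

# Dry run: with -F, the -n flag reports whether discarding the last few
# transactions would make the pool importable, without actually doing it.
zpool import -F -n d02

# The actual rewind import; -f forces import of a pool that may appear
# potentially active (this is the "zpool import -Ff d02" that succeeded
# on the 11.2 system):
zpool import -Ff d02

# Verify the result and scrub to surface any remaining damage:
zpool status -v d02
zpool scrub d02
```

Note that "zpool clear" operates on a pool the system already has open, while "zpool import -F" performs its recovery at import time, which is why the export/import path can succeed where "clear -F" fails with an I/O error.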
--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
2:5005/49@fidonet http://vas.tomsk.ru/