Date:      Mon, 26 Aug 2019 20:29:45 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 240134] [ZFS] Kernel panic while importing zpool (blkptr at <addr> has invalid COMPRESS 127)
Message-ID:  <bug-240134-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240134

            Bug ID: 240134
           Summary: [ZFS] Kernel panic while importing zpool (blkptr at
                    <addr> has invalid COMPRESS 127)
           Product: Base System
           Version: 12.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Keywords: panic
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: demik+freebsd@lostwave.net

Hello,

One of my systems is stuck in a reboot loop: the kernel panics every time while
importing the zpool (root-on-ZFS). The root pool is a ZFS mirror (

This happened a few days (hours?) after upgrading the root pool from FreeBSD
11 to 12. Not sure if it's related or not.

The issue is reproducible on other systems (ZFS mirror). Tried a set of x86_64
and powerpc64 systems: same issue everywhere.

Here is the kernel panic:

ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Solaris: WARNING: blkptr at 0xfffffe001be3a800 has invalid COMPRESS 127
Solaris: WARNING: blkptr at 0xfffffe001be3a800 has invalid ETYPE 255


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x88
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff828f01b5
stack pointer           = 0x28:0xfffffe00005f5710
frame pointer           = 0x28:0xfffffe00005f5750
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 828 (zpool)
trap number             = 12
panic: page fault
cpuid = 0
time = 1566854520
KDB: stack backtrace:
#0 0xffffffff80be78d7 at kdb_backtrace+0x67
#1 0xffffffff80b9b4b3 at vpanic+0x1a3
#2 0xffffffff80b9b303 at panic+0x43
#3 0xffffffff81074bff at trap_fatal+0x35f
#4 0xffffffff81074c59 at trap_pfault+0x49
#5 0xffffffff8107427e at trap+0x29e
#6 0xffffffff8104f625 at calltrap+0x8
#7 0xffffffff8290267a at zio_checksum_verify+0x6a
#8 0xffffffff828fe2ec at zio_execute+0xbc
#9 0xffffffff82901d2c at zio_vdev_io_start+0x15c
#10 0xffffffff828fe2ec at zio_execute+0xbc
#11 0xffffffff828fdbfb at zio_nowait+0xcb
#12 0xffffffff82849c89 at arc_read+0x759
#13 0xffffffff8287353d at traverse_prefetch_metadata+0xbd
#14 0xffffffff828729ee at traverse_visitbp+0x3be
#15 0xffffffff82873623 at traverse_dnode+0xd3
#16 0xffffffff82872fa8 at traverse_visitbp+0x978
#17 0xffffffff82872a51 at traverse_visitbp+0x421
Uptime: 2m42s
(da1:umass-sim0:0:0:0): Synchronize cache failed
Dumping 161 out of 2009 MB:..10%..20%..30%..40%..50%..60%..70%..80%..90%..100%
Dump complete
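
A side note on the two warnings in the log above: 127 and 255 are the largest
values that a 7-bit and an 8-bit field can hold. Assuming the blk_prop layout
used by the BP_GET_COMPRESS, BP_IS_EMBEDDED and BPE_GET_ETYPE macros in spa.h
(COMPRESS in bits 32-38, the embedded flag in bit 39, ETYPE in bits 40-47; this
layout is an assumption that should be checked against the source tree), both
values would be consistent with bits 32-47 of that word all being set. A
minimal Python sketch of the decoding, with hypothetical helper names:

```python
# Sketch only: the bit positions below are assumed from the spa.h macros
# and must be verified against the source tree in use.

def bf64_get(word, low, length):
    """Return `length` bits of a 64-bit word, starting at bit `low`."""
    return (word >> low) & ((1 << length) - 1)

def decode_blk_prop(blk_prop):
    """Decode the fields named in the warnings (hypothetical helper)."""
    return {
        "compress": bf64_get(blk_prop, 32, 7),   # assumed bits 32..38
        "embedded": bf64_get(blk_prop, 39, 1),   # assumed bit 39
        "etype": bf64_get(blk_prop, 40, 8),      # assumed bits 40..47
    }

# A word with bits 32..47 all set yields both reported values.
suspect = 0xFFFF << 32
fields = decode_blk_prop(suspect)
print(fields)
```

This does not say what the rest of the block pointer contains; it only shows
that one run of set bits would be enough to explain both messages.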

The server was stable before this. I checked the following:
- none of the usual zpool rescue import options work (-F, -X, etc.)
- memory testing: no errors
- checked both drives for bad sectors: nothing
- tried importing on ZoL v0.7.12: PANIC(), the backtrace is somewhat different

After dd'ing a few TBs, the issue is easily reproduced inside a virtual
machine. Both drives seem to have the exact same corruption, so that's not a
drive issue (different vendors, one enterprise drive).

Looks like we have two issues here:
- The first is whatever caused the corruption. Trying to reproduce this
(probably non-ECC memory though)
- The second is the kernel panic while importing the pool (this bug report)

Did more testing using zdb, with my limited knowledge. The issue is
reproducible with zdb:

zdb -AAA -e -ddd zroot/usr/local
Assertion failed: (!BP_IS_EMBEDDED(bp) || BPE_GET_ETYPE(bp) ==
BP_EMBEDDED_TYPE_DATA), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 5724.
Assertion failed: ((hdr)->b_lsize << 9) > 0 (0x0 > 0x0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 3340.
Assertion failed: ((hdr)->b_lsize << 9) != 0 (0x0 != 0x0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 2447.
Assertion failed: (bytes > 0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 5032.
Assertion failed: ((hdr)->b_lsize << 9) != 0 (0x0 != 0x0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 2447.
Assertion failed: ((hdr)->b_lsize << 9) != 0 (0x0 != 0x0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 2447.
WARNING: blkptr at 0x80b124840 has invalid COMPRESS 127
WARNING: blkptr at 0x80b124840 has invalid ETYPE 255
Assertion failed: (!BP_IS_EMBEDDED(bp)), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 1321.
Assertion failed: (zio->io_error != 0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c, line
660.
Assertion failed: (zio->io_vd != NULL), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 3619.

Objects from 51200 to 51231 on dataset zroot/usr/local are crashing zdb.
Anything else is fine.
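
The failing range is 32 objects wide and starts at a multiple of 32. Assuming
the usual 512-byte dnode slot and 16 KiB metadnode block (DNODE_SHIFT = 9 and
DNODE_BLOCK_SHIFT = 14 in dnode.h; both are assumptions to verify for this
pool), that range would cover exactly one block of the dataset's metadnode. A
short Python sketch of the arithmetic, with hypothetical function names:

```python
# Sketch only: the two shift values are assumed defaults, not read from the pool.
DNODE_SHIFT = 9          # assumed: 512-byte dnode slots
DNODE_BLOCK_SHIFT = 14   # assumed: 16 KiB metadnode blocks
DNODES_PER_BLOCK = 1 << (DNODE_BLOCK_SHIFT - DNODE_SHIFT)

def metadnode_block(obj):
    """Index of the metadnode block that would hold object number `obj`."""
    return obj // DNODES_PER_BLOCK

def objects_in_block(blkid):
    """First and last object number stored in metadnode block `blkid`."""
    first = blkid * DNODES_PER_BLOCK
    return first, first + DNODES_PER_BLOCK - 1

first_bad, last_bad = 51200, 51231
block = metadnode_block(first_bad)
same_block = metadnode_block(last_bad) == block
print(DNODES_PER_BLOCK, block, same_block, objects_in_block(block))
```

If those assumptions hold, the damage would be confined to a single metadnode
block rather than spread across the dataset.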

Bonus question: is there a way to nuke this dataset to recover recent files?

Core dumps available if needed. Willing to test a few patches since I've
reproduced this in a lab.

Thanks for your help.

-- 
You are receiving this mail because:
You are the assignee for the bug.