Date: Mon, 26 Aug 2019 20:29:45 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 240134] [ZFS] Kernel panic while importing zpool (blkptr at <addr> has invalid COMPRESS 127)
Message-ID: <bug-240134-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240134

Bug ID: 240134
Summary: [ZFS] Kernel panic while importing zpool (blkptr at <addr> has invalid COMPRESS 127)
Product: Base System
Version: 12.0-RELEASE
Hardware: Any
OS: Any
Status: New
Keywords: panic
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: demik+freebsd@lostwave.net

Hello,

One of my systems is stuck in a reboot loop. The kernel panics every time while importing the zpool (root-on-ZFS). The root pool is a ZFS mirror.

This happened a few days (hours?) after upgrading the root pool from FreeBSD 11 to 12. Not sure if it's related or not.

The issue is reproducible on other systems (ZFS mirror). I tried a set of x86_64 and powerpc64 systems: same issue everywhere.

Here is the kernel panic:

ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Solaris: WARNING: blkptr at 0xfffffe001be3a800 has invalid COMPRESS 127
Solaris: WARNING: blkptr at 0xfffffe001be3a800 has invalid ETYPE 255

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x88
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff828f01b5
stack pointer           = 0x28:0xfffffe00005f5710
frame pointer           = 0x28:0xfffffe00005f5750
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 828 (zpool)
trap number             = 12
panic: page fault
cpuid = 0
time = 1566854520
KDB: stack backtrace:
#0 0xffffffff80be78d7 at kdb_backtrace+0x67
#1 0xffffffff80b9b4b3 at vpanic+0x1a3
#2 0xffffffff80b9b303 at panic+0x43
#3 0xffffffff81074bff at trap_fatal+0x35f
#4 0xffffffff81074c59 at trap_pfault+0x49
#5 0xffffffff8107427e at trap+0x29e
#6 0xffffffff8104f625 at calltrap+0x8
#7 0xffffffff8290267a at zio_checksum_verify+0x6a
#8 0xffffffff828fe2ec at zio_execute+0xbc
#9 0xffffffff82901d2c at zio_vdev_io_start+0x15c
#10 0xffffffff828fe2ec at zio_execute+0xbc
#11 0xffffffff828fdbfb at zio_nowait+0xcb
#12 0xffffffff82849c89 at arc_read+0x759
#13 0xffffffff8287353d at traverse_prefetch_metadata+0xbd
#14 0xffffffff828729ee at traverse_visitbp+0x3be
#15 0xffffffff82873623 at traverse_dnode+0xd3
#16 0xffffffff82872fa8 at traverse_visitbp+0x978
#17 0xffffffff82872a51 at traverse_visitbp+0x421
Uptime: 2m42s
(da1:umass-sim0:0:0:0): Synchronize cache failed
Dumping 161 out of 2009 MB:..10%..20%..30%..40%..50%..60%..70%..80%..90%..100%
Dump complete

The server was stable before this. I checked the following:
- none of the usual zpool rescue import options work (-F, -X, etc.)
- memory testing: no errors
- checked both drives for bad sectors: nothing
- tried importing on ZoL v0.7.12: PANIC(), the backtrace is somewhat different

After dd'ing a few TBs, the issue is easily reproduced inside a virtual machine. Both drives seem to have the exact same corruption, so it is not a drive issue (different vendors, one enterprise drive).

It looks like there are two issues here:
- The first is whatever caused the corruption. I am trying to reproduce this (probably non-ECC memory, though).
- The second is the kernel panic while importing the pool (this bug report).

I did more testing using zdb, with my limited knowledge. The issue is reproducible with zdb:

zdb -AAA -e -ddd zroot/usr/local

Assertion failed: (!BP_IS_EMBEDDED(bp) || BPE_GET_ETYPE(bp) == BP_EMBEDDED_TYPE_DATA), file /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 5724.
Assertion failed: ((hdr)->b_lsize << 9) > 0 (0x0 > 0x0), file /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 3340.
Assertion failed: ((hdr)->b_lsize << 9) != 0 (0x0 != 0x0), file /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 2447.
Assertion failed: (bytes > 0), file /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 5032.
Assertion failed: ((hdr)->b_lsize << 9) != 0 (0x0 != 0x0), file /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 2447.
Assertion failed: ((hdr)->b_lsize << 9) != 0 (0x0 != 0x0), file /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 2447.
WARNING: blkptr at 0x80b124840 has invalid COMPRESS 127
WARNING: blkptr at 0x80b124840 has invalid ETYPE 255
Assertion failed: (!BP_IS_EMBEDDED(bp)), file /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 1321.
Assertion failed: (zio->io_error != 0), file /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c, line 660.
Assertion failed: (zio->io_vd != NULL), file /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 3619.

Objects 51200 to 51231 on dataset zroot/usr/local crash zdb. Everything else is fine.

Bonus question: is there a way to nuke this dataset to recover recent files?

Core dumps are available if needed. I am willing to test a few patches, since I've reproduced this in a lab.

Thanks for your help.
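
A note on the two blkptr warnings: 127 and 255 are the largest values a 7-bit and an 8-bit field can hold, which would fit a block pointer property word whose bits were all set. The following is a minimal sketch of that arithmetic, not the ZFS code itself. It assumes the field layout used by the BP_GET_COMPRESS, BP_IS_EMBEDDED and BPE_GET_ETYPE macros in spa.h (COMPRESS: 7 bits starting at bit 32; embedded flag: bit 39; ETYPE: 8 bits starting at bit 40). The helper names and the all-ones input value are hypothetical; the actual blk_prop contents have not been read from the affected pool.

```python
def bf64_get(word, low, length):
    """Return `length` bits of `word` starting at bit `low`.

    Modeled on the BF64_GET macro; the name is reused here for illustration only.
    """
    return (word >> low) & ((1 << length) - 1)


def decode_blk_prop(blk_prop):
    """Decode the three fields named in the warnings (layout is an assumption)."""
    return {
        "compress": bf64_get(blk_prop, 32, 7),  # assumed: 7 bits at offset 32
        "embedded": bf64_get(blk_prop, 39, 1),  # assumed: single flag at bit 39
        "etype": bf64_get(blk_prop, 40, 8),     # assumed: 8 bits at offset 40
    }


# Hypothetical input: a 64-bit property word overwritten with all ones.
fields = decode_blk_prop(0xFFFFFFFFFFFFFFFF)
print(fields)  # {'compress': 127, 'embedded': 1, 'etype': 255}
```

If the on-disk block pointers for objects 51200 to 51231 really do decode this way, that would point at a region overwritten with 0xff bytes rather than a single flipped bit, but this remains a guess until the raw block pointer is dumped.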