From owner-freebsd-bugs@freebsd.org Mon Aug 26 20:29:46 2019
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 240134] [ZFS] Kernel panic while importing zpool (blkptr at has invalid COMPRESS 127)
Date: Mon, 26 Aug 2019 20:29:45 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240134

            Bug ID: 240134
           Summary: [ZFS] Kernel panic while importing zpool (blkptr at
                    has invalid COMPRESS 127)
           Product: Base System
           Version: 12.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Keywords: panic
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: demik+freebsd@lostwave.net

Hello,

One of my systems is stuck in a reboot loop: it kernel panics every time while importing the zpool (root-on-ZFS).
Root pool is a ZFS mirror.

This happened a few days (hours?) after upgrading the root pool from FreeBSD 11 to 12. Not sure if it's related or not.

The issue is reproducible on other systems (ZFS mirror). Tried a set of x86_64 and powerpc64 systems: same issue everywhere.

Here is the kernel panic:

ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Solaris: WARNING: blkptr at 0xfffffe001be3a800 has invalid COMPRESS 127
Solaris: WARNING: blkptr at 0xfffffe001be3a800 has invalid ETYPE 255

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x88
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff828f01b5
stack pointer           = 0x28:0xfffffe00005f5710
frame pointer           = 0x28:0xfffffe00005f5750
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 828 (zpool)
trap number             = 12
panic: page fault
cpuid = 0
time = 1566854520
KDB: stack backtrace:
#0 0xffffffff80be78d7 at kdb_backtrace+0x67
#1 0xffffffff80b9b4b3 at vpanic+0x1a3
#2 0xffffffff80b9b303 at panic+0x43
#3 0xffffffff81074bff at trap_fatal+0x35f
#4 0xffffffff81074c59 at trap_pfault+0x49
#5 0xffffffff8107427e at trap+0x29e
#6 0xffffffff8104f625 at calltrap+0x8
#7 0xffffffff8290267a at zio_checksum_verify+0x6a
#8 0xffffffff828fe2ec at zio_execute+0xbc
#9 0xffffffff82901d2c at zio_vdev_io_start+0x15c
#10 0xffffffff828fe2ec at zio_execute+0xbc
#11 0xffffffff828fdbfb at zio_nowait+0xcb
#12 0xffffffff82849c89 at arc_read+0x759
#13 0xffffffff8287353d at traverse_prefetch_metadata+0xbd
#14 0xffffffff828729ee at traverse_visitbp+0x3be
#15 0xffffffff82873623 at traverse_dnode+0xd3
#16 0xffffffff82872fa8 at traverse_visitbp+0x978
#17 0xffffffff82872a51 at traverse_visitbp+0x421
Uptime: 2m42s
(da1:umass-sim0:0:0:0): Synchronize cache failed
Dumping 161 out of 2009
MB:..10%..20%..30%..40%..50%..60%..70%..80%..90%..100%
Dump complete

The server was stable before this. I checked the following:

- none of the usual zpool rescue import options works (-F, -X, etc…)
- memory testing: no errors
- checked both drives for bad sectors: nothing
- tried importing on ZoL v0.7.12: PANIC(), the backtrace is somewhat different

After dd'ing a few TBs, the issue is easily reproduced inside a virtual machine. Both drives seem to have the exact same corruption, so it's not a drive issue (different vendors, one enterprise drive).

Looks like we have two issues here:

- the first is whatever caused the corruption; I'm trying to reproduce it (probably non-ECC memory, though)
- the second is the kernel panic while importing the pool (this bug report)

Did more testing using zdb, within my limited knowledge. The issue is reproducible with zdb:

zdb -AAA -e -ddd zroot/usr/local

Assertion failed: (!BP_IS_EMBEDDED(bp) || BPE_GET_ETYPE(bp) ==
BP_EMBEDDED_TYPE_DATA), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 5724.
Assertion failed: ((hdr)->b_lsize << 9) > 0 (0x0 > 0x0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 3340.
Assertion failed: ((hdr)->b_lsize << 9) != 0 (0x0 != 0x0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 2447.
Assertion failed: (bytes > 0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 5032.
Assertion failed: ((hdr)->b_lsize << 9) != 0 (0x0 != 0x0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 2447.
Assertion failed: ((hdr)->b_lsize << 9) != 0 (0x0 != 0x0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line 2447.
WARNING: blkptr at 0x80b124840 has invalid COMPRESS 127
WARNING: blkptr at 0x80b124840 has invalid ETYPE 255
Assertion failed: (!BP_IS_EMBEDDED(bp)), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 1321.
Assertion failed: (zio->io_error != 0), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c, line 660.
Assertion failed: (zio->io_vd != NULL), file
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 3619.

Objects 51200 to 51231 on dataset zroot/usr/local are crashing zdb. Everything else is fine.

Bonus question: is there a way to nuke this dataset to recover recent files?

Core dumps are available if needed. I'm willing to test a few patches since I've reproduced this in a lab.

Thanks for your help.

-- 
You are receiving this mail because:
You are the assignee for the bug.