Date: Thu, 18 Jun 2020 13:55:20 -0400
From: Andrew Chanler <achanler@gmail.com>
To: freebsd-fs@freebsd.org
Subject: Re: kernel panic loop after zpool remove
Message-ID: <CAA+A+a8X9Usb92x2N5DMi7FuksUWAi6VwDw1oTO1anzKuEm4qQ@mail.gmail.com>
In-Reply-To: <CAA+A+a-7ZKgW8XXqgA_KUUVWvCuN0EZHtu3zKwR0Z+NrddarHg@mail.gmail.com>
References: <CAA+A+a-7ZKgW8XXqgA_KUUVWvCuN0EZHtu3zKwR0Z+NrddarHg@mail.gmail.com>

I managed to recover the data from this zpool. You can read about it on
the forum if you are curious:
https://forums.FreeBSD.org/threads/kernel-panic-loop-after-zpool-remove.75632/post-466625

Andrew

On Thu, Jun 4, 2020 at 10:10 AM Andrew Chanler <achanler@gmail.com> wrote:

> Hi,
>
> Is there any way to get zfs and the zpools into a recovery 'safe-mode'
> where it stops trying to do any manipulation of the zpools? I'm stuck in
> a kernel panic loop after 'zpool remove'. With the recent discussions
> about openzfs I was also considering trying to switch to openzfs, but I
> don't want to get myself into more trouble! I just need to stop the
> remove-vdev operation and this 'metaslab free' code flow from running on
> the new vdev so I can keep the system up long enough to read the data out.
>
> Thank you,
> Andrew
>
>
> The system is running 'FreeBSD 12.1-RELEASE-p5 GENERIC amd64'.
> -------
> panic: solaris assert: ((offset) & ((1ULL << vd->vdev_ashift) - 1)) == 0
> (0x400 == 0x0), file:
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c, line:
> 3593
> cpuid = 1
> time = 1590769420
> KDB: stack backtrace:
> #0 0xffffffff80c1d307 at kdb_backtrace+0x67
> #1 0xffffffff80bd063d at vpanic+0x19d
> #2 0xffffffff80bd0493 at panic+0x43
> #3 0xffffffff82a6922c at assfail3+0x2c
> #4 0xffffffff828a3b83 at metaslab_free_concrete+0x103
> #5 0xffffffff828a4dd8 at metaslab_free+0x128
> #6 0xffffffff8290217c at zio_dva_free+0x1c
> #7 0xffffffff828feb7c at zio_execute+0xac
> #8 0xffffffff80c2fae4 at taskqueue_run_locked+0x154
> #9 0xffffffff80c30e18 at taskqueue_thread_loop+0x98
> #10 0xffffffff80b90c53 at fork_exit+0x83
> #11 0xffffffff81082c2e at fork_trampoline+0xe
> Uptime: 7s
> -------
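
The assertion that fires here is the alignment check spelled out in the panic
string: metaslab_free_concrete() verifies that the offset being freed is a
multiple of 2^vdev_ashift on the vdev it resolves to. Below is a minimal
standalone sketch of that arithmetic (written for this note, not taken from
the ZFS sources), plugging in the offset and ashift values that kgdb reports
further down; it shows why the same offset would pass on the old ashift-9
mirror but trips the VERIFY0 on the new ashift-12 vdev.
-------
/*
 * Userland illustration of the VERIFY0(P2PHASE(offset, 1ULL << ashift))
 * check from the panic message. Values are taken from the kgdb session.
 */
#include <stdio.h>
#include <stdint.h>

/* P2PHASE(x, align): remainder of x modulo a power-of-two alignment. */
#define P2PHASE(x, align)	((x) & ((align) - 1))

int
main(void)
{
	uint64_t offset = 0x2000000400ULL;	/* kgdb frame 6: freed offset */

	/* ashift 12 (4096-byte alignment), the new mirror-1 vdev */
	printf("phase at ashift 12: %#jx\n",
	    (uintmax_t)P2PHASE(offset, 1ULL << 12));	/* 0x400, assertion fails */

	/* ashift 9 (512-byte alignment), the old mirror-0 vdev */
	printf("phase at ashift 9:  %#jx\n",
	    (uintmax_t)P2PHASE(offset, 1ULL << 9));	/* 0x0, assertion would pass */

	return (0);
}
-------
That 0x400 remainder matches the '0x400 == 0x0' comparison printed in the
panic message above.
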
> I have a core dump and can provide more details to help debug the issue
> and open a bug tracker too:
> -------
> (kgdb) bt
> #0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
> #1  doadump (textdump=<optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:371
> #2  0xffffffff80bd0238 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:451
> #3  0xffffffff80bd0699 in vpanic (fmt=<optimized out>, ap=<optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:877
> #4  0xffffffff80bd0493 in panic (fmt=<unavailable>)
>     at /usr/src/sys/kern/kern_shutdown.c:804
> #5  0xffffffff82a6922c in assfail3 (a=<unavailable>, lv=<unavailable>,
>     op=<unavailable>, rv=<unavailable>, f=<unavailable>, l=<optimized out>)
>     at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
> #6  0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000,
>     offset=137438954496, asize=<optimized out>, checkpoint=0)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
> #7  0xffffffff828a4dd8 in metaslab_free_dva (spa=<optimized out>,
>     checkpoint=0, dva=<optimized out>)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3863
> #8  metaslab_free (spa=<optimized out>, bp=0xfffff800043788a0,
>     txg=41924766, now=<optimized out>)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
> #9  0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
> #10 0xffffffff828feb7c in zio_execute (zio=0xfffff80004378830)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786
> #11 0xffffffff80c2fae4 in taskqueue_run_locked (queue=0xfffff80004222800)
>     at /usr/src/sys/kern/subr_taskqueue.c:467
> #12 0xffffffff80c30e18 in taskqueue_thread_loop (arg=<optimized out>)
>     at /usr/src/sys/kern/subr_taskqueue.c:773
> #13 0xffffffff80b90c53 in fork_exit (callout=0xffffffff80c30d80
>     <taskqueue_thread_loop>, arg=0xfffff800041d90b0, frame=0xfffffe004dcf9bc0)
>     at /usr/src/sys/kern/kern_fork.c:1065
> #14 <signal handler called>
> (kgdb) frame 6
> #6  0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000,
>     offset=137438954496, asize=<optimized out>, checkpoint=0)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
> 3593            VERIFY0(P2PHASE(offset, 1ULL << vd->vdev_ashift));
> (kgdb) p /x offset
> $1 = 0x2000000400
> (kgdb) p /x vd->vdev_ashift
> $2 = 0xc
> (kgdb) frame 9
> #9  0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
> 3070            metaslab_free(zio->io_spa, zio->io_bp, zio->io_txg, B_FALSE);
> (kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[0]->vdev_removing
> $3 = 0x1
> (kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[0]->vdev_ashift
> $4 = 0x9
> (kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[1]->vdev_ashift
> $5 = 0xc
> (kgdb) frame 8
> #8  metaslab_free (spa=<optimized out>, bp=0xfffff800043788a0,
>     txg=41924766, now=<optimized out>)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
> 4145                            metaslab_free_dva(spa, &dva[d], checkpoint);
> (kgdb) p /x *bp
> $6 = {blk_dva = {{dva_word = {0x100000002, 0x10000002}}, {dva_word = {0x0,
>       0x0}}, {dva_word = {0x0, 0x0}}}, blk_prop = 0x8000020200010001,
>   blk_pad = {0x0, 0x0}, blk_phys_birth = 0x0, blk_birth = 0x4,
>   blk_fill = 0x0, blk_cksum = {zc_word = {0x0, 0x0, 0x0, 0x0}}}
> -------
>
> Some of the notes on how I got into this state, from the forum post
> (https://forums.freebsd.org/threads/kernel-panic-loop-after-zpool-remove.75632/):
>
> I had two 2TB drives in a mirror configuration for the last 7 years,
> upgrading FreeBSD and zfs as time went by. I finally needed more storage
> and tried to add two 4TB drives as a second vdev mirror:
> 'zpool add storage mirror /dev/ada3 /dev/ada4'
>
> Next, 'zpool status' showed:
> ---
>   pool: storage
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>         Expect reduced performance.
> action: Replace affected devices with devices that support the
>         configured block size, or migrate data to a properly configured
>         pool.
>   scan: scrub repaired 0 in 0 days 08:39:28 with 0 errors on Sat May 9
>         01:19:54 2020
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             ada1    ONLINE       0     0     0  block size: 512B configured, 4096B native
>             ada2    ONLINE       0     0     0  block size: 512B configured, 4096B native
>           mirror-1  ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>
> errors: No known data errors
> ---
>
> I should have stopped there, but I saw the block size warning and thought
> I would try to fix it. zdb showed mirror-0 with ashift 9 (512-byte
> alignment) and mirror-1 with ashift 12 (4096-byte alignment). I issued
> 'zpool remove storage mirror-0' and quickly went into a panic reboot loop.
> Rebooting into single-user mode, the first zfs or zpool command loads the
> driver and it panics again. With the new drives powered off, the system
> does not panic on reboot, but the pool fails to import because it is
> missing a top-level vdev (mirror-1).
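
For what it is worth, the block pointer dumped in frame 8 of the kgdb session
above can be decoded by hand. Below is a hedged sketch, assuming the standard
ZFS DVA word layout (word 0: asize in 512-byte sectors in the low 24 bits,
then grid, then the vdev id in the upper bits; word 1: offset in 512-byte
sectors with the gang bit in bit 63); it is an illustration written for this
note, not code from the ZFS tree.
-------
/*
 * Decode the DVA printed by 'p /x *bp' in the kgdb session:
 *   dva_word = {0x100000002, 0x10000002}
 */
#include <stdio.h>
#include <stdint.h>

#define SPA_MINBLOCKSHIFT	9	/* DVA asize/offset count 512-byte sectors */

int
main(void)
{
	uint64_t word0 = 0x100000002ULL;
	uint64_t word1 = 0x10000002ULL;

	uint64_t vdev   = word0 >> 32;			/* top-level vdev id */
	uint64_t asize  = (word0 & 0xffffffULL) << SPA_MINBLOCKSHIFT;
	uint64_t offset = (word1 & ~(1ULL << 63)) << SPA_MINBLOCKSHIFT;

	printf("vdev:   %ju\n", (uintmax_t)vdev);	/* 1 */
	printf("asize:  %#jx\n", (uintmax_t)asize);	/* 0x400 (1 KiB) */
	printf("offset: %#jx\n", (uintmax_t)offset);	/* 0x2000000400 */

	return (0);
}
-------
Under those assumptions the DVA resolves to vdev 1 (mirror-1, the new
ashift-12 vdev) at offset 0x2000000400, which is 512-byte aligned but not
4096-byte aligned, and that is exactly the free that metaslab_free_concrete()
rejects in frame 6.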