Date: Thu, 4 Jun 2020 10:10:58 -0400
From: Andrew Chanler <achanler@gmail.com>
To: freebsd-fs@freebsd.org
Subject: kernel panic loop after zpool remove
Message-ID: <CAA%2BA%2Ba-7ZKgW8XXqgA_KUUVWvCuN0EZHtu3zKwR0Z%2BNrddarHg@mail.gmail.com>
Hi,

Is there any way to get zfs and the zpools into a recovery 'safe-mode' where it
stops trying to do any manipulation of the zpools? I'm stuck in a kernel panic
loop after 'zpool remove'. With the recent discussions about openzfs I was also
considering trying to switch to openzfs, but I don't want to get myself into
more trouble! I just need to stop the remove-vdev operation and this 'metaslab
free' code flow from running on the new vdev so I can keep the system up long
enough to read the data out.

Thank you,
Andrew

The system is running 'FreeBSD 12.1-RELEASE-p5 GENERIC amd64'.

-------
panic: solaris assert: ((offset) & ((1ULL << vd->vdev_ashift) - 1)) == 0 (0x400 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c, line: 3593
cpuid = 1
time = 1590769420
KDB: stack backtrace:
#0 0xffffffff80c1d307 at kdb_backtrace+0x67
#1 0xffffffff80bd063d at vpanic+0x19d
#2 0xffffffff80bd0493 at panic+0x43
#3 0xffffffff82a6922c at assfail3+0x2c
#4 0xffffffff828a3b83 at metaslab_free_concrete+0x103
#5 0xffffffff828a4dd8 at metaslab_free+0x128
#6 0xffffffff8290217c at zio_dva_free+0x1c
#7 0xffffffff828feb7c at zio_execute+0xac
#8 0xffffffff80c2fae4 at taskqueue_run_locked+0x154
#9 0xffffffff80c30e18 at taskqueue_thread_loop+0x98
#10 0xffffffff80b90c53 at fork_exit+0x83
#11 0xffffffff81082c2e at fork_trampoline+0xe
Uptime: 7s
-------

I have a core dump and can provide more details to help debug the issue, and I
can open a bug report too:

-------
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bd0238 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bd0699 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff80bd0493 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff82a6922c in assfail3 (a=<unavailable>, lv=<unavailable>, op=<unavailable>, rv=<unavailable>, f=<unavailable>, l=<optimized out>) at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#6  0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000, offset=137438954496, asize=<optimized out>, checkpoint=0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
#7  0xffffffff828a4dd8 in metaslab_free_dva (spa=<optimized out>, checkpoint=0, dva=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3863
#8  metaslab_free (spa=<optimized out>, bp=0xfffff800043788a0, txg=41924766, now=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
#9  0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
#10 0xffffffff828feb7c in zio_execute (zio=0xfffff80004378830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786
#11 0xffffffff80c2fae4 in taskqueue_run_locked (queue=0xfffff80004222800) at /usr/src/sys/kern/subr_taskqueue.c:467
#12 0xffffffff80c30e18 in taskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:773
#13 0xffffffff80b90c53 in fork_exit (callout=0xffffffff80c30d80 <taskqueue_thread_loop>, arg=0xfffff800041d90b0, frame=0xfffffe004dcf9bc0) at /usr/src/sys/kern/kern_fork.c:1065
#14 <signal handler called>

(kgdb) frame 6
#6  0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000, offset=137438954496, asize=<optimized out>, checkpoint=0)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
3593            VERIFY0(P2PHASE(offset, 1ULL << vd->vdev_ashift));
(kgdb) p /x offset
$1 = 0x2000000400
(kgdb) p /x vd->vdev_ashift
$2 = 0xc
(kgdb) frame 9
#9  0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
3070            metaslab_free(zio->io_spa, zio->io_bp, zio->io_txg, B_FALSE);
(kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[0]->vdev_removing
$3 = 0x1
(kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[0]->vdev_ashift
$4 = 0x9
(kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[1]->vdev_ashift
$5 = 0xc
(kgdb) frame 8
#8  metaslab_free (spa=<optimized out>, bp=0xfffff800043788a0, txg=41924766, now=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
4145                    metaslab_free_dva(spa, &dva[d], checkpoint);
(kgdb) p /x *bp
$6 = {blk_dva = {{dva_word = {0x100000002, 0x10000002}}, {dva_word = {0x0, 0x0}}, {dva_word = {0x0, 0x0}}}, blk_prop = 0x8000020200010001, blk_pad = {0x0, 0x0}, blk_phys_birth = 0x0, blk_birth = 0x4, blk_fill = 0x0, blk_cksum = {zc_word = {0x0, 0x0, 0x0, 0x0}}}
-------

Some notes on how I got into this state, from the forum post
(https://forums.freebsd.org/threads/kernel-panic-loop-after-zpool-remove.75632/):

I had two 2TB drives in a mirror configuration for the last 7 years, upgrading
FreeBSD and zfs as time went by. I finally needed more storage and tried to add
two 4TB drives as a second vdev mirror:

'zpool add storage mirror /dev/ada3 /dev/ada4'

Next, 'zpool status' showed:

---
  pool: storage
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the configured block
        size, or migrate data to a properly configured pool.
  scan: scrub repaired 0 in 0 days 08:39:28 with 0 errors on Sat May 9 01:19:54 2020
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0  block size: 512B configured, 4096B native
            ada2    ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-1  ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0

errors: No known data errors
---

I should have stopped there, but I saw the block size warning and thought I
would try to fix it. zdb showed mirror-0 with ashift 9 (512-byte alignment) and
mirror-1 with ashift 12 (4096-byte alignment). I issued 'zpool remove storage
mirror-0' and quickly went into a panic reboot loop. Rebooting into single user
mode, the first zfs or zpool command loads the driver and it panics again. With
the new drives powered off the system no longer panics on boot, but the pool
fails to import because it is missing a top-level vdev (mirror-1).
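
Going back to the kgdb output above: to double-check the failing assert, here is
a small standalone userland sketch (only a sketch; it assumes the usual DVA bit
layout and re-implements the P2PHASE macro from the ZFS sources rather than
pulling in the kernel headers) that decodes blk_dva[0] from the *bp dump and
repeats the alignment check that VERIFY0 performs in metaslab_free_concrete():

-------
/*
 * Sketch only: decode blk_dva[0] from the bp dump above and repeat the
 * alignment check from metaslab_free_concrete().  Assumes the usual DVA
 * layout (asize in word 0 bits 0-23, vdev id in the upper bits of word 0,
 * offset in word 1 bits 0-62, sizes/offsets stored in 512-byte units).
 */
#include <stdio.h>
#include <stdint.h>

#define SPA_MINBLOCKSHIFT 9
#define P2PHASE(x, align)  ((x) & ((align) - 1))   /* as in the ZFS sources */

int
main(void)
{
	uint64_t dva_word0 = 0x100000002ULL;   /* blk_dva[0].dva_word[0] */
	uint64_t dva_word1 = 0x10000002ULL;    /* blk_dva[0].dva_word[1] */

	uint64_t vdev   = dva_word0 >> 32;                                  /* 1 */
	uint64_t asize  = (dva_word0 & 0xffffffULL) << SPA_MINBLOCKSHIFT;   /* 0x400 */
	uint64_t offset = (dva_word1 & ~(1ULL << 63)) << SPA_MINBLOCKSHIFT; /* 0x2000000400 */

	printf("vdev=%ju asize=0x%jx offset=0x%jx\n",
	    (uintmax_t)vdev, (uintmax_t)asize, (uintmax_t)offset);

	/* the offset is fine against an ashift=9 vdev ... */
	printf("P2PHASE(offset, 1<<9)  = 0x%jx\n",
	    (uintmax_t)P2PHASE(offset, 1ULL << 9));
	/* ... but not against an ashift=12 vdev: this is the 0x400 == 0x0 assert */
	printf("P2PHASE(offset, 1<<12) = 0x%jx\n",
	    (uintmax_t)P2PHASE(offset, 1ULL << 12));
	return (0);
}
-------

If I am decoding it right, the DVA being freed is only 512-byte aligned (asize
0x400, offset ending in 0x400) but it points at vdev 1, which per the values
printed above has ashift 12, which is exactly the 0x400 == 0x0 mismatch in the
panic message.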