From: Andrew Chanler
Date: Thu, 4 Jun 2020 10:10:58 -0400
Subject: kernel panic loop after zpool remove
To: freebsd-fs@freebsd.org

Hi,

Is there any way to get zfs and the zpools into a recovery 'safe-mode' where it stops trying to do any manipulations of the zpools? I'm stuck in a kernel panic loop after 'zpool remove'. With the recent discussions about openzfs I was also considering trying to switch to openzfs, but I don't want to get myself into more trouble! I just need to stop the remove vdev operation and this 'metaslab free' code flow from running on the new vdev so I can keep the system up long enough to read the data out.

Thank you,
Andrew

The system is running 'FreeBSD 12.1-RELEASE-p5 GENERIC amd64'.

-------
panic: solaris assert: ((offset) & ((1ULL << vd->vdev_ashift) - 1)) == 0 (0x400 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c, line: 3593
cpuid = 1
time = 1590769420
KDB: stack backtrace:
#0 0xffffffff80c1d307 at kdb_backtrace+0x67
#1 0xffffffff80bd063d at vpanic+0x19d
#2 0xffffffff80bd0493 at panic+0x43
#3 0xffffffff82a6922c at assfail3+0x2c
#4 0xffffffff828a3b83 at metaslab_free_concrete+0x103
#5 0xffffffff828a4dd8 at metaslab_free+0x128
#6 0xffffffff8290217c at zio_dva_free+0x1c
#7 0xffffffff828feb7c at zio_execute+0xac
#8 0xffffffff80c2fae4 at taskqueue_run_locked+0x154
#9 0xffffffff80c30e18 at taskqueue_thread_loop+0x98
#10 0xffffffff80b90c53 at fork_exit+0x83
#11 0xffffffff81082c2e at fork_trampoline+0xe
Uptime: 7s
-------

I have a core dump and can provide more details to help debug the issue, and I can open a bug report too:

-------
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
#1  doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bd0238 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bd0699 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff80bd0493 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff82a6922c in
assfail3 (a=, lv=, op=, rv=, f=, l=) at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#6  0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000, offset=137438954496, asize=, checkpoint=0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
#7  0xffffffff828a4dd8 in metaslab_free_dva (spa=, checkpoint=0, dva=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3863
#8  metaslab_free (spa=, bp=0xfffff800043788a0, txg=41924766, now=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
#9  0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
#10 0xffffffff828feb7c in zio_execute (zio=0xfffff80004378830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786
#11 0xffffffff80c2fae4 in taskqueue_run_locked (queue=0xfffff80004222800) at /usr/src/sys/kern/subr_taskqueue.c:467
#12 0xffffffff80c30e18 in taskqueue_thread_loop (arg=) at /usr/src/sys/kern/subr_taskqueue.c:773
#13 0xffffffff80b90c53 in fork_exit (callout=0xffffffff80c30d80 , arg=0xfffff800041d90b0, frame=0xfffffe004dcf9bc0) at /usr/src/sys/kern/kern_fork.c:1065
#14
(kgdb) frame 6
#6  0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000, offset=137438954496, asize=, checkpoint=0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
3593            VERIFY0(P2PHASE(offset, 1ULL << vd->vdev_ashift));
(kgdb) p /x offset
$1 = 0x2000000400
(kgdb) p /x vd->vdev_ashift
$2 = 0xc
(kgdb) frame 9
#9  0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
3070            metaslab_free(zio->io_spa, zio->io_bp, zio->io_txg, B_FALSE);
(kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[0]->vdev_removing
$3 = 0x1
(kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[0]->vdev_ashift
$4 = 0x9
(kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[1]->vdev_ashift
$5 =
0xc
(kgdb) frame 8
#8  metaslab_free (spa=, bp=0xfffff800043788a0, txg=41924766, now=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
4145            metaslab_free_dva(spa, &dva[d], checkpoint);
(kgdb) p /x *bp
$6 = {blk_dva = {{dva_word = {0x100000002, 0x10000002}}, {dva_word = {0x0, 0x0}}, {dva_word = {0x0, 0x0}}}, blk_prop = 0x8000020200010001, blk_pad = {0x0, 0x0}, blk_phys_birth = 0x0, blk_birth = 0x4, blk_fill = 0x0, blk_cksum = {zc_word = {0x0, 0x0, 0x0, 0x0}}}
-------

Some notes on how I got into this state, from the forum post (https://forums.freebsd.org/threads/kernel-panic-loop-after-zpool-remove.75632/):

I had two 2TB drives in a mirror configuration for the last 7 years, upgrading FreeBSD and zfs as time went by. I finally needed more storage and tried to add two 4TB drives as a second vdev mirror:

'zpool add storage mirror /dev/ada3 /dev/ada4'

Next, 'zpool status' showed:

---
  pool: storage
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0 in 0 days 08:39:28 with 0 errors on Sat May 9 01:19:54 2020
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0  block size: 512B configured, 4096B native
            ada2    ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-1  ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0

errors: No known data errors
---

I should have stopped there, but I saw the block size warning and thought I would try to fix it. zdb showed mirror-0 with ashift 9 (512-byte alignment) and mirror-1 with ashift 12 (4096-byte alignment). I issued 'zpool remove storage mirror-0' and quickly went into a panic reboot loop. After rebooting into single-user mode, the first zfs or zpool command loads the driver and it panics again.
With the new drives powered off, the system reboots without panicking, but the pool is then unusable because it is missing a top-level vdev (mirror-1).
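
For anyone who wants to check the numbers in the kgdb output by hand, here is a minimal Python sketch. It assumes the usual ZFS DVA layout (vdev id in the upper 32 bits of dva_word[0], allocated size in its low 24 bits, offset in the low 63 bits of dva_word[1], with size and offset stored in 512-byte units). The helper names decode_dva and p2phase are made up for illustration and are not ZFS functions; p2phase only mirrors the arithmetic of the P2PHASE macro as it appears in the assertion text.

```python
SPA_MINBLOCKSHIFT = 9  # assumption: DVA sizes and offsets are in 512-byte units


def decode_dva(word0, word1):
    """Hypothetical helper: split a DVA into (vdev, asize, offset), sizes in bytes."""
    vdev = (word0 >> 32) & 0xFFFFFFFF                        # upper 32 bits of word 0
    asize = (word0 & 0xFFFFFF) << SPA_MINBLOCKSHIFT          # low 24 bits of word 0
    offset = (word1 & ((1 << 63) - 1)) << SPA_MINBLOCKSHIFT  # low 63 bits of word 1
    return vdev, asize, offset


def p2phase(x, align):
    """x modulo a power-of-two alignment, as in (x) & ((align) - 1)."""
    return x & (align - 1)


# Values taken from 'p /x *bp' (blk_dva[0]) in the kgdb session.
vdev, asize, offset = decode_dva(0x100000002, 0x10000002)
print(vdev, hex(asize), hex(offset))   # 1 0x400 0x2000000400

# ashift 12 (mirror-1) versus ashift 9 (mirror-0).
print(hex(p2phase(offset, 1 << 12)))   # 0x400
print(hex(p2phase(offset, 1 << 9)))    # 0x0
```

If the layout assumption holds, the block being freed names vdev 1 (mirror-1, ashift 12) but carries an offset that is only 512-byte aligned, which would be consistent with the assert reporting 0x400 == 0x0.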