From: Andrew Chanler <achanler@gmail.com>
Date: Thu, 18 Jun 2020 13:55:20 -0400
Subject: Re: kernel panic loop after zpool remove
To: freebsd-fs@freebsd.org
I managed to recover the data from this zpool. You can read about it on the
forum if you are curious:
https://forums.FreeBSD.org/threads/kernel-panic-loop-after-zpool-remove.75632/post-466625

Andrew

On Thu, Jun 4, 2020 at 10:10 AM Andrew Chanler wrote:
> Hi,
>
> Is there any way to get ZFS and the zpools into a recovery 'safe mode'
> where it stops trying to do any manipulation of the zpools? I'm stuck in
> a kernel panic loop after 'zpool remove'. With the recent discussions
> about OpenZFS I was also considering trying to switch to OpenZFS, but I
> don't want to get myself into more trouble! I just need to stop the
> remove-vdev operation and this 'metaslab free' code flow from running on
> the new vdev so I can keep the system up long enough to read the data out.
>
> Thank you,
> Andrew
>
> The system is running 'FreeBSD 12.1-RELEASE-p5 GENERIC amd64'.
> -------
> panic: solaris assert: ((offset) & ((1ULL << vd->vdev_ashift) - 1)) == 0
> (0x400 == 0x0), file:
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c, line: 3593
> cpuid = 1
> time = 1590769420
> KDB: stack backtrace:
> #0 0xffffffff80c1d307 at kdb_backtrace+0x67
> #1 0xffffffff80bd063d at vpanic+0x19d
> #2 0xffffffff80bd0493 at panic+0x43
> #3 0xffffffff82a6922c at assfail3+0x2c
> #4 0xffffffff828a3b83 at metaslab_free_concrete+0x103
> #5 0xffffffff828a4dd8 at metaslab_free+0x128
> #6 0xffffffff8290217c at zio_dva_free+0x1c
> #7 0xffffffff828feb7c at zio_execute+0xac
> #8 0xffffffff80c2fae4 at taskqueue_run_locked+0x154
> #9 0xffffffff80c30e18 at taskqueue_thread_loop+0x98
> #10 0xffffffff80b90c53 at fork_exit+0x83
> #11 0xffffffff81082c2e at fork_trampoline+0xe
> Uptime: 7s
> -------
>
> I have a core dump and can provide more details to help debug the issue,
> and can open a bug report too:
> -------
> (kgdb) bt
> #0 __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
> #1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:371
> #2 0xffffffff80bd0238 in kern_reboot (howto=260) at
>    /usr/src/sys/kern/kern_shutdown.c:451
> #3 0xffffffff80bd0699 in vpanic (fmt=, ap=) at
>    /usr/src/sys/kern/kern_shutdown.c:877
> #4 0xffffffff80bd0493 in panic (fmt=) at
>    /usr/src/sys/kern/kern_shutdown.c:804
> #5 0xffffffff82a6922c in assfail3 (a=, lv=, op=, rv=, f=, l=) at
>    /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
> #6 0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000,
>    offset=137438954496, asize=, checkpoint=0) at
>    /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
> #7 0xffffffff828a4dd8 in metaslab_free_dva (spa=, checkpoint=0, dva=) at
>    /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3863
> #8 metaslab_free (spa=, bp=0xfffff800043788a0, txg=41924766, now=) at
>    /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
> #9 0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830) at
>    /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
> #10 0xffffffff828feb7c in zio_execute (zio=0xfffff80004378830) at
>    /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786
> #11 0xffffffff80c2fae4 in taskqueue_run_locked (queue=0xfffff80004222800)
>    at /usr/src/sys/kern/subr_taskqueue.c:467
> #12 0xffffffff80c30e18 in taskqueue_thread_loop (arg=) at
>    /usr/src/sys/kern/subr_taskqueue.c:773
> #13 0xffffffff80b90c53 in fork_exit (callout=0xffffffff80c30d80,
>    arg=0xfffff800041d90b0, frame=0xfffffe004dcf9bc0) at
>    /usr/src/sys/kern/kern_fork.c:1065
> #14
> (kgdb) frame 6
> #6 0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000,
>    offset=137438954496, asize=, checkpoint=0) at
>    /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
> 3593        VERIFY0(P2PHASE(offset, 1ULL << vd->vdev_ashift));
> (kgdb) p /x offset
> $1 = 0x2000000400
> (kgdb) p /x vd->vdev_ashift
> $2 = 0xc
> (kgdb) frame 9
> #9 0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830) at
>    /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
> 3070        metaslab_free(zio->io_spa, zio->io_bp, zio->io_txg, B_FALSE);
> (kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[0]->vdev_removing
> $3 = 0x1
> (kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[0]->vdev_ashift
> $4 = 0x9
> (kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[1]->vdev_ashift
> $5 = 0xc
> (kgdb) frame 8
> #8 metaslab_free (spa=, bp=0xfffff800043788a0, txg=41924766, now=) at
>    /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
> 4145        metaslab_free_dva(spa, &dva[d], checkpoint);
> (kgdb) p /x *bp
> $6 = {blk_dva = {{dva_word = {0x100000002, 0x10000002}}, {dva_word = {0x0,
>    0x0}}, {dva_word = {0x0, 0x0}}}, blk_prop = 0x8000020200010001,
>    blk_pad = {0x0, 0x0}, blk_phys_birth = 0x0, blk_birth = 0x4,
>    blk_fill = 0x0, blk_cksum = {zc_word = {0x0, 0x0, 0x0, 0x0}}}
> -------
>
> Some of the notes on how I got into this state, from the forum post
> (https://forums.freebsd.org/threads/kernel-panic-loop-after-zpool-remove.75632/):
>
> I had two 2TB drives in a mirror configuration for the last 7 years,
> upgrading FreeBSD and ZFS as time went by. I finally needed more storage
> and tried to add two 4TB drives as a second vdev mirror:
> 'zpool add storage mirror /dev/ada3 /dev/ada4'
>
> Next, 'zpool status' showed:
> ---
>   pool: storage
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>         Expect reduced performance.
> action: Replace affected devices with devices that support the
>         configured block size, or migrate data to a properly configured
>         pool.
>   scan: scrub repaired 0 in 0 days 08:39:28 with 0 errors on Sat May 9
>         01:19:54 2020
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             ada1    ONLINE       0     0     0  block size: 512B configured, 4096B native
>             ada2    ONLINE       0     0     0  block size: 512B configured, 4096B native
>           mirror-1  ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>
> errors: No known data errors
> ---
>
> I should have stopped there, but I saw the block size warning and thought
> I would try to fix it. zdb showed mirror-0 with ashift 9 (512-byte
> alignment) and mirror-1 with ashift 12 (4096-byte alignment). I issued
> 'zpool remove storage mirror-0' and quickly went into a panic reboot loop.
> Rebooting into single-user mode, the first zfs or zpool command loads the
> driver and it panics again. With the new drives powered off, it does not
> panic on reboot, but the zpool fails to import because it is missing a
> top-level vdev (mirror-1).