Date:      Sun, 6 Feb 2022 13:23:40 -0500
From:      Rich <rincebrain@gmail.com>
To:        John F Carr <jfc@mit.edu>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: Repairing a bad ZFS free list
Message-ID:  <CAOeNLup_tf_BFYV6Nrrbr2WyTaC9n0dAekOt8PCSUjW6xS0oaA@mail.gmail.com>
In-Reply-To: <84C3247E-B5F0-4572-AE38-3B530D61CB1C@exchange.mit.edu>
References:  <84C3247E-B5F0-4572-AE38-3B530D61CB1C@exchange.mit.edu>


https://github.com/openzfs/zfs/issues/11480 seems germane.

I'm not 100% certain from reading the fix, but it seems like applying the
patch should result in no longer panicking.

- Rich

On Sun, Feb 6, 2022 at 1:10 PM John F Carr <jfc@mit.edu> wrote:

> I have a corrupt root ZFS pool on my ARM server (Ampere eMAG) running
> a recent version of stable/13.  Is there any way to repair my system
> short of wiping the disk and reinstalling?
>
> All filesystems mount and there are no errors reported by zpool, but
> there is bad metadata, apparently a block that was allocated twice.
> Running "zfs destroy" tends to cause crashes like
>
> panic: VERIFY3(l->blk_birth == r->blk_birth) failed (9269896 == 9269889)
>
> The assertion is in dsl_deadlist.c:livelist_compare().  There are two
> livelist_entry_t objects containing blkptr_t objects with the same
> DVA_GET_VDEV and DVA_GET_OFFSET but distinct blk_birth.  Apparently
> this is a bad thing.
>
> spa_livelist_delete_cb appears in the stack trace.  I think the kernel
> is telling me the same block has been allocated twice and it doesn't
> want to free it twice.
>
> This problem persists across reboot.  Since I want to use poudriere,
> "stop running zfs destroy" is not a good workaround.
>
> Is it safe to disable the assertion, or will that spread the
> corruption even further?
>
> In the old days I would use clri or fsdb to make the problematic part
> of a UFS filesystem go away.  How do I repair ZFS?
>
> This crash has been reported as bug
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261538
>
>
>
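To make the condition behind the panic concrete, here is a minimal userspace sketch in C. It is not the OpenZFS code: struct entry, entry_compare and count_birth_conflicts are hypothetical names, and the three fields only stand in for what DVA_GET_VDEV, DVA_GET_OFFSET and blk_birth return. It assumes, as the report describes, that entries are ordered by vdev and then offset, and that two entries at the same location are expected to agree on birth. Instead of asserting, it counts the disagreements.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the block-pointer fields that matter here. */
struct entry {
	uint64_t vdev;   /* plays the role of DVA_GET_VDEV */
	uint64_t offset; /* plays the role of DVA_GET_OFFSET */
	uint64_t birth;  /* plays the role of blk_birth */
};

/* Order entries by (vdev, offset); birth is deliberately not compared. */
static int
entry_compare(const void *larg, const void *rarg)
{
	const struct entry *l = larg;
	const struct entry *r = rarg;

	if (l->vdev != r->vdev)
		return (l->vdev < r->vdev ? -1 : 1);
	if (l->offset != r->offset)
		return (l->offset < r->offset ? -1 : 1);
	return (0);
}

/*
 * Sort the entries in place, then count adjacent pairs that share a
 * location but disagree on birth.  Each such pair is the situation the
 * VERIFY3 in the panic message rejects.
 */
static size_t
count_birth_conflicts(struct entry *entries, size_t n)
{
	size_t conflicts = 0;

	qsort(entries, n, sizeof (entries[0]), entry_compare);
	for (size_t i = 1; i < n; i++) {
		if (entry_compare(&entries[i - 1], &entries[i]) == 0 &&
		    entries[i - 1].birth != entries[i].birth)
			conflicts++;
	}
	return (conflicts);
}
```

Feeding it two entries with the same vdev and offset but births 9269896 and 9269889, the values from the panic message, yields one conflict. The sketch only illustrates what the assertion checks; it says nothing about whether skipping the check is safe on a real pool.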
