Date:      Thu, 22 Jun 2023 12:30:28 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 271989] zfs root mount error 6 after upgrade from 11.1-release to 13.2-release
Message-ID:  <bug-271989-227-NcU14ceZKL@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-271989-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-271989-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271989

--- Comment #9 from Markus Wild <freebsd-bugs@virtualtec.ch> ---
(In reply to Dan Langille from comment #8)
there you go, the bogus

pool_guid: 18320603570228782289

is what causes your kernel to fail to load the pool: it shows up in your
console messages as mismatched comparisons against the vdevs the kernel found.
This is most likely, as with my installation, the result of originally
installing the zpool on the entire disk, then later removing that pool,
shrinking the zfs partition, and recreating the pool. From what I reverse
engineered, a zpool puts 2 labels at the beginning of its assigned disk space
and 2 labels at the end, most likely so that the labels can be restored
should someone/something accidentally overwrite them.
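For illustration, here is where those four labels sit on a device. The 256 KiB label size and the two-at-the-front/two-at-the-end layout are the standard ZFS on-disk format; the 1 TiB device size is just an example, substitute your real disk size:

```shell
# ZFS vdev label layout: four 256 KiB labels, L0/L1 at the start of the
# device, L2/L3 at the very end. Example size of 1 TiB in bytes.
SIZE=$((1024 * 1024 * 1024 * 1024))   # device size (example assumption)
LABEL=$((256 * 1024))                 # each label is 256 KiB

echo "L0 offset: 0"
echo "L1 offset: $LABEL"
echo "L2 offset: $((SIZE - 2 * LABEL))"
echo "L3 offset: $((SIZE - LABEL))"
```

This also shows why shrinking a whole-disk pool into a partition strands the old L2/L3 labels past the new partition's end, where a whole-disk scan still finds them.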

The stupidity of the whole thing is: the kernel code that mounts the zfs root
filesystem seems to first scan the "entire disk device" for these 4 labels, and
if it finds any, will insist on using them and NOT consider any valid labels
in partitions from the GPT partition table. zpool import doesn't do this; it's
just the mount code in the kernel.

There is a "zpool labelclear" command which is supposed to clear these
wrong old labels, but I personally didn't trust it not to go ahead and
clear the labels of ALL zfs instances on the disk if you let it loose on the
entire disk device. The man page is not very clear in this respect, and
searching for this shows I was not the only one confused about the exact
behavior of that command.

What I did in my case is:
- use gpart to add a temporary swap partition to fill the disk:
  gpart add -t freebsd-swap nvd0
- this resulted in nvd0p5 in my case
- then I did
  dd if=/dev/zero of=/dev/nvd0p5 bs=1024M
  to clear that temp partition, and thus the end of the disk, of the old
  zpool label
- remove the temp partition again:
  gpart delete -i 5 nvd0
If you check the device again after this (zdb -l), it shouldn't find any
labels anymore.
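Put together, the steps above can be sketched as a small script. The disk name nvd0 and partition index 5 come from my setup and are assumptions for anyone else; the RUN guard keeps it a dry run that only prints the commands until you've checked them against your own "gpart show" output:

```shell
#!/bin/sh
# Sketch of the label-clearing procedure above. DISK and IDX are
# assumptions from my layout; verify with "gpart show" first.
DISK=nvd0
IDX=5

# Dry-run guard: print each command, only execute it when RUN=1.
run() { echo "+ $*"; [ "${RUN:-0}" = "1" ] && "$@" || true; }

run gpart add -t freebsd-swap "$DISK"           # temp partition filling the free tail
run dd if=/dev/zero of="/dev/${DISK}p${IDX}" bs=1024M  # wipe it, and the stale end labels
run gpart delete -i "$IDX" "$DISK"              # drop the temp partition again
run zdb -l "/dev/$DISK"                         # verify: should find no labels
```

Run it once without RUN=1 to review the exact commands, then again with RUN=1 to execute them.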

What I'd expect for the future, and why I didn't ask for this bug report
to be closed after I fixed my problem:
- the kernel mount code should first check all valid zfs partitions for
  labels
- only if no labels are found in valid partitions should it also consider the
  entire disk device (nvd0, ada0, etc.) to cover the cases where people define
  a zpool like "mirror /dev/ada0 /dev/ada1". I know this works for data pools,
  but I'm not sure you could actually boot from such a pool.

Cheers,
Markus

-- 
You are receiving this mail because:
You are the assignee for the bug.


