Date:      Mon, 11 Feb 2019 02:19:18 -0800
From:      Mel Pilgrim <list_freebsd@bluerosetech.com>
To:        Karl Denninger <karl@denninger.net>, freebsd-stable@freebsd.org
Subject:   Re: Serious ZFS Bootcode Problem (GPT NON-UEFI)
Message-ID:  <d627f47e-bc19-acf5-78b0-b0e5c3383ade@bluerosetech.com>
In-Reply-To: <911d001f-9e33-0521-51fe-f7d1383dfc62@denninger.net>
References:  <911d001f-9e33-0521-51fe-f7d1383dfc62@denninger.net>

On 02/09/2019 14:30, Karl Denninger wrote:
> FreeBSD 12.0-STABLE r343809
> 
> After upgrading to this (without material incident) zfs was telling me
> that the pools could be upgraded (this machine was running 11.1, then 11.2.)
> 
> I did so, and put the new bootcode on with gpart bootcode -b /boot/pmbr
> -p /boot/gptzfsboot -i .... da... on both of the candidate (mirrored
> ZFS boot disk) devices, in the correct partition.
> 
> Then I rebooted to test and..... could not find the zsboot pool
> containing the kernel.
> 
> I booted the rescue image off my SD and checked -- the copy of
> gptzfsboot that I put on the boot partition is exactly identical to the
> one on the rescue image SD.
> 
> Then, to be absolutely sure I wasn't going insane I grabbed the
> mini-memstick img for 12-RELEASE and tried THAT copy of gptzfsboot.
> 
> Nope; that won't boot either!
> 
> Fortunately I had a spare drive slot so I stuck in a piece of spinning
> rust, gpart'ed THAT with an old-style UFS boot filesystem, wrote
> bootcode on that, mounted the ZFS "zsboot" filesystem and copied it
> over.  That boots fine (of course) and mounts the root pool, and off it
> goes.
> 
> I'm going to blow away the entire /usr/obj tree and rebuild the kernel
> to see if that gets me anything that's more-sane, but right now this
> looks pretty bad.
> 
> BTW just to be absolutely sure I blew away the entire /usr/obj directory
> and rebuilt -- same size and checksum on the binary that I have
> installed, so.....
> 
> Not sure what's going on here -- did something get moved?

I smashed my head against the wall for days with a very similar-sounding 
problem: a pure-ZFS system with a GELI-encrypted root and a separate 
/boot pool, where the /boot pool would not import at boot, so the kernel 
never had the keys to attach the GELI+ZFS root.

That configuration needs some extra bits in loader.conf so that 
zpool.cache and the GELI keys get loaded for the kernel by the loader.

This loads zpool.cache into the kernel so it imports everything 
before /etc/rc.d/zfs can run (this covers the case where you have a ZFS 
/boot pool that isn't imported after a reboot):

zpool_cache_load="YES"
zpool_cache_name="/boot/zfs/zpool.cache"
zpool_cache_type="/boot/zfs/zpool.cache"
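
Note that this assumes /boot/zfs/zpool.cache actually exists and is 
current.  If it isn't, you can point the pool's cachefile property at it 
from the running system, something like this (the pool name "zsboot" is 
just an example taken from your mail; adjust the path if /boot lives 
somewhere else):

zpool set cachefile=/boot/zfs/zpool.cache zsboot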

Run geli init with -b so the providers are flagged for attachment at 
boot (instead of by /etc/rc.d/geli), then add this for every GELI 
provider you want the kernel to attach before starting the userland:

geli_FOO_keyfile0_load="YES"
geli_FOO_keyfile0_name="/boot/path/to/key"
geli_FOO_keyfile0_type="devicename:geli_keyfile0"

FOO can be any alphanumeric string; it needs to be consistent across all 
three lines and unique per device.  The "devicename" is gpt/BAR for a 
device with a GPT label of BAR.  It can also be the unlabeled device 
(e.g., da0p3), but using GPT labels is recommended because the key 
assignment then follows the device even if it gets renumbered.
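
A geli init invocation with the boot flag set might look something like 
this (the key path, sector size, and provider name are just placeholders 
for your own values):

geli init -b -K /path/to/BAR.key -s 4096 gpt/BAR

For a provider that has already been initialized, geli configure -b 
gpt/BAR sets the boot flag without touching the data.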

For example, my GELI+ZFS root is a mirror of partitions with nvmezfs0 
and nvmezfs1 GPT labels, so I have in my loader.conf:

geli_nvmezfs0_keyfile0_load="YES"
geli_nvmezfs0_keyfile0_name="/boot/gelikeys/nvmezfs0.key"
geli_nvmezfs0_keyfile0_type="gpt/nvmezfs0:geli_keyfile0"
geli_nvmezfs1_keyfile0_load="YES"
geli_nvmezfs1_keyfile0_name="/boot/gelikeys/nvmezfs1.key"
geli_nvmezfs1_keyfile0_type="gpt/nvmezfs1:geli_keyfile0"
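
If you need to add matching GPT labels after the fact, gpart modify can 
set them in place, something like this (the partition index and disk 
device here are hypothetical):

gpart modify -i 3 -l nvmezfs0 nvd0

gpart show -l will confirm which labels are currently set.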

If you use GPT labels, you can safely ignore the "GEOM_ELI: Found no key 
files in loader.conf for DEVICE" messages where DEVICE is the unlabeled 
device--the GELI module doesn't currently recognize that the unlabeled 
and labeled devices are the same provider.

This doesn't appear to be documented in the Handbook or any man pages 
that I could find.  The zpool_cache_load trick is mentioned in a FreeBSD 
wiki page[1], and the geli_* config is pulled from the zfsboot script 
used by bsdinstall to install a pure-ZFS system with GELI root.

I'm not sure if this is exactly your problem, but maybe it helps?

1: https://wiki.freebsd.org/MasonLoringBliss/UEFIandZFSandGELIbyHAND


