Date: Mon, 11 Feb 2019 02:19:18 -0800
From: Mel Pilgrim <list_freebsd@bluerosetech.com>
To: Karl Denninger <karl@denninger.net>, freebsd-stable@freebsd.org
Subject: Re: Serious ZFS Bootcode Problem (GPT NON-UEFI)
Message-ID: <d627f47e-bc19-acf5-78b0-b0e5c3383ade@bluerosetech.com>
In-Reply-To: <911d001f-9e33-0521-51fe-f7d1383dfc62@denninger.net>
References: <911d001f-9e33-0521-51fe-f7d1383dfc62@denninger.net>
On 02/09/2019 14:30, Karl Denninger wrote:
> FreeBSD 12.0-STABLE r343809
>
> After upgrading to this (without material incident), zfs was telling me
> that the pools could be upgraded (this machine was running 11.1, then 11.2).
>
> I did so, and put the new bootcode on with gpart bootcode -b /boot/pmbr
> -p /boot/gptzfsboot -i .... da... on both of the candidate (mirrored
> ZFS boot disk) devices, in the correct partition.
>
> Then I rebooted to test and..... could not find the zsboot pool
> containing the kernel.
>
> I booted the rescue image off my SD and checked -- the copy of
> gptzfsboot that I put on the boot partition is exactly identical to the
> one on the rescue image SD.
>
> Then, to be absolutely sure I wasn't going insane, I grabbed the
> mini-memstick img for 12-RELEASE and tried THAT copy of gptzfsboot.
>
> Nope; that won't boot either!
>
> Fortunately I had a spare drive slot, so I stuck in a piece of spinning
> rust, gpart'ed THAT with an old-style UFS boot filesystem, wrote
> bootcode on that, mounted the ZFS "zsboot" filesystem, and copied it
> over. That boots fine (of course) and mounts the root pool, and off it
> goes.
>
> I'm going to blow away the entire /usr/obj tree and rebuild the kernel
> to see if that gets me anything more sane, but right now this looks
> pretty bad.
>
> BTW, just to be absolutely sure, I blew away the entire /usr/obj
> directory and rebuilt -- same size and checksum on the binary that I
> have installed, so.....
>
> Not sure what's going on here -- did something get moved?

I smashed my head against the wall for days with a very similar-sounding
problem: a pure-ZFS system with a GELI root and a separate /boot pool that
would not import the /boot pool at boot, leaving the kernel without the
keys to attach the GELI+ZFS root. That configuration needs some extra bits
in loader.conf so that the loader hands zpool.cache and the GELI keys to
the kernel.

This loads zpool.cache into the kernel so it imports everything before
/etc/rc.d/zfs can run (this covers the case where you have a ZFS /boot
that isn't imported after a reboot):

zpool_cache_load="YES"
zpool_cache_name="/boot/zfs/zpool.cache"
zpool_cache_type="/boot/zfs/zpool.cache"

Run geli init with -b so the providers are flagged for attachment at boot
(instead of by /etc/rc.d/geli), then add this for every GELI provider you
want the kernel to attach before starting the userland:

geli_FOO_keyfile0_load="YES"
geli_FOO_keyfile0_name="/boot/path/to/key"
geli_FOO_keyfile0_type="devicename:geli_keyfile0"

FOO can be any alphanumeric string; it needs to be consistent across all
three lines and unique per device. The "devicename" is gpt/BAR for a
device with a GPT label of BAR. It can also be the unlabeled device (e.g.,
da0p3), but using GPT labels is recommended because it lets the keys
follow a device renumbering.

For example, my GELI+ZFS root is a mirror of partitions with the GPT
labels nvmezfs0 and nvmezfs1, so I have this in my loader.conf:

geli_nvmezfs0_keyfile0_load="YES"
geli_nvmezfs0_keyfile0_name="/boot/gelikeys/nvmezfs0.key"
geli_nvmezfs0_keyfile0_type="gpt/nvmezfs0:geli_keyfile0"
geli_nvmezfs1_keyfile0_load="YES"
geli_nvmezfs1_keyfile0_name="/boot/gelikeys/nvmezfs1.key"
geli_nvmezfs1_keyfile0_type="gpt/nvmezfs1:geli_keyfile0"

If you use GPT labels, you can safely ignore the "GEOM_ELI: Found no key
files in loader.conf for DEVICE" messages where DEVICE is the unlabeled
device -- the GELI module doesn't currently recognize that the unlabeled
and labeled devices are the same provider.
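To make the geli init step concrete, here is a minimal sketch using the
same /boot/path/to/key and gpt/BAR placeholders as above (cipher, sector
size, and passphrase options elided; pick those to suit your setup):

# initialize the provider with the BOOT flag (-b) and a keyfile (-K)
geli init -b -K /boot/path/to/key gpt/BAR

# if the provider was initialized without -b, the flag can be added later
# without re-initializing (and without destroying any data)
geli configure -b gpt/BAR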
This doesn't appear to be documented in the Handbook or in any man page I
could find. The zpool_cache_load trick is mentioned in a FreeBSD wiki
page[1], and the geli_* config is pulled from the zfsboot script bsdinstall
uses to install a pure-ZFS system with a GELI root. I'm not sure if this is
exactly your problem, but maybe it helps?

1: https://wiki.freebsd.org/MasonLoringBliss/UEFIandZFSandGELIbyHAND
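As a postscript for anyone replicating this, a quick way to sanity-check
the boot flag (same gpt/BAR placeholder; note geli list only works once
the provider is attached):

# dump the on-disk GELI metadata; the flags field should have the boot
# bit set
geli dump gpt/BAR

# for an attached provider, the Flags line should include BOOT
geli list gpt/BAR.eli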