Date:      Fri, 15 Dec 2017 22:49:52 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Eric McCorkle <eric@metricspace.net>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>, Warner Losh <imp@freebsd.org>, Allan Jude <allanjude@freebsd.org>
Subject:   Re: loader.efi architecture for replacing boot1.efi
Message-ID:  <CANCZdfo3Q_z1%2BO=VqUNcvPnyL%2BMFDnzzv5GWKbPdu8O-ZsQPyQ@mail.gmail.com>
In-Reply-To: <ef64c1b9-024d-dce9-d620-c47ab7921fd6@metricspace.net>
References:  <1fa7edde-6ac0-1d4f-e75a-503b23a5d4dc@metricspace.net> <CANCZdfpJm9MjxvO4dPy7qZ4jjot44yAMj7NhaY_MQ5z7WVbd9A@mail.gmail.com> <46af04dd-8f74-b9dc-3d3a-343f022129ed@metricspace.net> <CANCZdfrpi3JTDxo17RBiLdZ=UjdPF3FgpqwmBepZ=8k5-P0F2g@mail.gmail.com> <CANCZdfr0=WzVkUb85o2aUT3eA7EAAx4MCnQy6gk8XdeJvb9tsA@mail.gmail.com> <ef64c1b9-024d-dce9-d620-c47ab7921fd6@metricspace.net>

On Fri, Dec 15, 2017 at 9:54 PM, Eric McCorkle <eric@metricspace.net> wrote:

>
>
> On 12/15/2017 22:28, Warner Losh wrote:
> >
> >
> > On Fri, Dec 15, 2017 at 7:05 PM, Warner Losh <imp@bsdimp.com
> > <mailto:imp@bsdimp.com>> wrote:
> >
> >
> >
> >     On Dec 15, 2017 6:43 PM, "Eric McCorkle" <eric@metricspace.net
> >     <mailto:eric@metricspace.net>> wrote:
> >
> >         On 12/15/2017 20:09, Warner Losh wrote:
> >
> >         > This should be second. UEFI variables trump all.
> >         >
> >         >     2) If not, then attempt to read EFI vars to determine the
> >         >     boot location.
> >         >
> >         >     3) If no EFI vars are defined, and no partition was specified,
> >         >     fall back to looking for an installed system on devices.
> >         >
> >         >
> >         > This is fine, so long as it is only on the device that the loader
> >         > loaded from.
> >
> >         It's fine if it's configurable, but there needs to be sane behavior
> >         if the EFI vars aren't set.
> >
> >
> >     Where do we get this info for such a broken setup? Do you have
> >     actual examples?
> >
> >         >     4) As a last resort, fall back to the legacy behavior (what
> >         >     loader.efi currently does).
> >         >
> >         >
> >         > This is bogus. It violates the UEFI boot loader protocol. We must
> >         > abandon this legacy behavior. The behavior is actively harmful since
> >         > something random will boot. This has caused actual operational
> >         > issues at Netflix. Guessing is really bad.
> >
> >         We can't just ditch the current behavior and break everyone's
> >         existing install, though. Legacy behavior should be supported at
> >         least until the next major release.
> >
> >
> >     What useful setups does this break? Absent a real example, we
> >     absolutely are breaking this. There is a real cost to keeping this,
> >     and as the de facto maintainer of stand/ I'm unwilling to maintain,
> >     test, or commit to not breaking it. The legacy behavior is broken and
> >     has caused me hours of pain in production. There has been no
> >     articulated use case this enables, especially since the boot loader
> >     can be interrupted to specify something in recovery scenarios.
> >
> >
> >         >
> >         >     Step (3) is done by attempting to stat /boot/loader.conf and
> >         >     /boot/kernel. First, all partitions on the same disk are
> >         >     searched, then all remaining partitions are searched.
> >         >
> >         >     This should allow mechanisms like EFI vars and command-line
> >         >     args to work without interference from the fallback
> >         >     mechanisms. However, it also provides robustness in the face
> >         >     of failure modes and uninitialized systems. (I personally ran
> >         >     into a problem a while back with a Linux system where I
> >         >     couldn't boot with EFI because the EFI vars weren't set, and
> >         >     I couldn't set them because I couldn't boot with EFI; I had
> >         >     to use Shell.efi to sort out the mess...)
> >         >
> >         >     More importantly, it provides a seamless transition from the
> >         >     way things are now to the way we want things to be.
> >         >
> >         >     Please provide comments and feedback.
> >         >
> >         >
> >         > Please listen when I say searching all devices is actively
> >         > harmful. The UEFI boot manager, which I'm in the process of
> >         > bringing in, offers a way to specifically say what you want to
> >         > boot. If someone needs something complicated, they must use that
> >         > moving forward. Part of what makes the protocol work is loaders
> >         > giving up early so the next one on the list can be tried.
> >
> >         We also have to deal with the reality that some EFI
> >         implementations are adversarial. We have to be able to deal with
> >         implementations that make it difficult to set EFI vars, or which
> >         mess with their values (Lenovo is particularly notorious for this).
> >
> >         You can disable fallback mechanisms with command-line args or
> >         macros or whatever, but they need to be there.
> >
> >
> >     No. Absent a sane use case, I refuse. Give me a reasonable use case,
> >     I will reconsider.
> >
> >
> > So the current behavior leads to absurd results that nobody else does,
> > and that we don't do for legacy boot:
> >
> > If we boot loader.efi/boot1.efi off a hard drive, and find there's no
> > kernel, we'll load off cdrom or a floppy if we happen to find a kernel
> > there. That's nuts. What's more, we'll load off a different device (say
> > a thumb drive), which is also crazy. The last thing you want is to
> > accidentally pick the thumb drive recovery kernel that happens to be in
> > a USB slot when you have a primary and secondary partition on two main
> > disks, but today's behavior chooses that. It's so crazy that I can see
> > no benefit from supporting, testing and maintaining this. If someone
> > wants to recover a system, they can do it at the boot loader prompt now
> > (they couldn't before). If someone really wants to boot his crazy thing,
> > we have a new way to specify it specifically w/o any ambiguity based on
> > how the devices might move around.
> >
> > We already support about 100 boot scenarios that are hard enough to
> > test. I don't want to commit to supporting this and making it 120 or 150
> > once you work out all the combinatorics. We have to trim the matrix of
> > useless things.  So absent a use case that makes sense, that people are
> > actually doing, I'm having a hard time justifying keeping it around as
> > we transition.
> >
> > Warner
> >
> > P.S. On x86, we support geli/nogeli, gpt/mbr, ufs/zfs, and
> > uefi/legacy/both (24 combinations). Plus we support booting off CDROM,
> > netbooting, etc. For arm and arm64 we have a similar number that are
> > possible: zfs/ufs, u-boot/uefi, and mbr/gpt (plus a number of different
> > u-boot boards). For MIPS we have a similar mix. PowerPC we support in 4
> > or 6 ways. It's just too much to hope to test and ensure it all works.
> > Each new thing has a non-trivial cost, and I see zero benefit from this
> > one more thing, especially since it gets in the way of UEFI boot manager
> > support.
>
> Whatever happens, this needs to not break existing installs.
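The x86 count of 24 in the P.S. quoted above is simply the product of the four axes it lists (2 * 2 * 2 * 3). A short Python sketch that enumerates that matrix; the axis labels are made up for illustration, and CD-ROM booting and netbooting, which the P.S. lists separately, are not included:

```python
from itertools import product

# The four x86 axes from the P.S. (labels are illustrative); CD-ROM booting
# and netbooting come on top of these and are not modeled here.
X86_AXES = {
    "encryption": ("geli", "nogeli"),
    "partitioning": ("gpt", "mbr"),
    "filesystem": ("ufs", "zfs"),
    "firmware": ("uefi", "legacy", "both"),
}


def x86_matrix():
    """Every combination of the axes above, as tuples in axis order."""
    return list(product(*X86_AXES.values()))


if __name__ == "__main__":
    print(len(x86_matrix()))  # 2 * 2 * 2 * 3 = 24
```

Each additional independent option multiplies the matrix rather than adding one case to it, which is the cost argument being made.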


I don't think it will.

> We can remove probing floppy drives, fine (does anyone even HAVE those
> anymore?).


The kernel is likely too big. But my point was more that if I boot
loader.efi off a hard drive, the floppy isn't the place to find a kernel by
default in the absence of very explicit instructions to do so.


> Dropping CD-ROM drives will break auto-detection when booting from a
> live DVD, but that can be mitigated by specifying loader args (I suppose
> we'll need to have loader get args from the boot.config files
> eventually).


CD/DVD booting won't break. We'll still load a kernel from them. No
boot.config is needed for this case (though it might be for others).


> But for now, loader.efi has got to work whether installed
> in a boot1/loader (legacy) configuration, or installed directly to the
> ESP.  Otherwise, there's going to be a lot of unhappy people out there.
>

Correct. My proposed behavior will do just that, and if we get it wrong by
default (a) you can be explicit with boot variables or (b) you can type
something into the OK prompt, which you didn't have before.


> As for the fallback search, it's just that: a fallback mechanism.  Its
> job is to make a sane guess as to where to find the system, but
> ultimately it's not doing anything the user can't do themselves.  And it
> will only run if the EFI vars aren't set anyway, so it can't possibly
> interfere with any of that.
>
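The fallback search Eric describes as step (3) in the quoted proposal can be modeled roughly as below. This is a hypothetical Python sketch, not loader code: `Partition`, `looks_installed` and `fallback_search` are invented names, and it assumes a partition counts as an installed system only when both /boot/loader.conf and /boot/kernel exist. The `same_disk_only` flag models the restriction suggested earlier in the thread (search only the device the loader was loaded from).

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Paths whose presence marks a partition as holding an installed system.
# Assumption: both must exist (the proposal only says it stats them).
MARKERS = ("/boot/loader.conf", "/boot/kernel")


@dataclass(frozen=True)
class Partition:
    """Hypothetical model of a partition; a real loader would stat() paths."""
    disk: str
    name: str
    paths: frozenset


def looks_installed(part: Partition) -> bool:
    return all(marker in part.paths for marker in MARKERS)


def fallback_search(parts: Iterable[Partition], loader_disk: str,
                    same_disk_only: bool = False) -> Optional[Partition]:
    """Search the loader's own disk first, then (optionally) every other disk."""
    candidates = list(parts)
    ordered = [p for p in candidates if p.disk == loader_disk]
    if not same_disk_only:
        ordered += [p for p in candidates if p.disk != loader_disk]
    for part in ordered:
        if looks_installed(part):
            return part
    return None  # nothing found: fail so the next boot option can run


if __name__ == "__main__":
    full = frozenset(MARKERS)
    parts = [
        Partition("usb0", "usb0p1", full),  # recovery thumb drive
        Partition("disk0", "disk0p2", frozenset({"/boot/loader.conf"})),
        Partition("disk1", "disk1p2", full),
    ]
    print(fallback_search(parts, "disk0").name)                  # usb0p1
    print(fallback_search(parts, "disk0", same_disk_only=True))  # None
```

With the thumb drive enumerated before the second hard disk, the unrestricted search picks the thumb drive as soon as the loader's own disk has no kernel, which is the failure mode described earlier in the thread; restricting the search to the loader's disk returns nothing instead, so the firmware can move on.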

And the fallback mechanism of typing what you want is wrong because? But
its job isn't to guess. If we don't know for sure what to boot, it's our
job to fail so the next OS in the list gets a shot at booting.
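A simplified model of the UEFI boot manager behavior this relies on: the firmware walks BootOrder and, when a loader gives up and returns an error, tries the next entry. The sketch below is hypothetical Python; real load options are EFI_LOAD_OPTION structures and loaders return EFI status codes, which are reduced here to a callable returning True (handed off to an OS) or False (gave up).

```python
from typing import Callable, Dict, List, Optional

# Simplified model: a load option either hands off to an OS (True) or
# returns to the firmware because it gave up (False).
BootEntry = Callable[[], bool]


def boot_manager(boot_order: List[int],
                 options: Dict[int, BootEntry]) -> Optional[int]:
    """Try each Boot#### in BootOrder; return the number of the one that booted."""
    for number in boot_order:
        option = options.get(number)
        if option is None:
            continue  # BootOrder names an option that does not exist: skip it
        if option():
            return number
    return None  # every option gave up; what happens next is platform-specific


def strict_loader(knows_what_to_boot: bool) -> BootEntry:
    """A loader that follows the protocol: fail fast instead of guessing."""
    return lambda: knows_what_to_boot


if __name__ == "__main__":
    options = {
        0x0001: strict_loader(False),  # FreeBSD loader with nothing to boot
        0x0002: strict_loader(True),   # next OS in the list
    }
    print("Boot%04X" % boot_manager([0x0001, 0x0002], options))  # Boot0002
```

A loader that guesses would return True for entry 0x0001 and boot something random, so entry 0x0002 would never get its turn.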

So, if we look at the sequence coming up, I'd like to propose the following:

1) Look at BootCurrent. If it exists, look at the BootXXXX variable it
   points to in order to see the current boot vars. This boot var has two
   things in it: a path to what was booted (possibly with a path of what
   to boot next) and a command line. The firmware also passes this
   command line to us.
2) If the command line has a root filesystem specifier, use it for currdir.
3) If there was a next thing to load (e.g., HD(<mumble>)/boot/kernel/kernel),
   then use HD(<mumble>) as currdir.
4) Otherwise, if we can find a ZFS pool (or there's more than one and one
   is specified as bootenv), use it as currdir.
5) Otherwise, if we can find a UFS partition on the same drive that
   loader.efi came from that has /boot/loader.rc (or whatever the file is
   in the Lua loader), use that for currdir.
6) If we still can't find currdir at this point, prompt for a currdir
   (timeout after 10s). We have no script loaded at this point to do the
   prompting...
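The currdir sequence above might be sketched as follows. This is hypothetical Python, not loader code: `LoadOption`, `UfsPart`, `resolve_currdir` and the `rootdev=` command-line token standing in for the "root filesystem specifier" are all invented for illustration, the `zfs:` prefix on results is illustrative, and ZFS/UFS discovery is assumed to have happened already. The one detail taken from the UEFI specification is the variable naming: BootCurrent holds a 16-bit option number, and the matching variable is `Boot` followed by that number as four uppercase hex digits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LoadOption:
    """Hypothetical stand-in for the contents of a Boot#### variable."""
    file_path: str            # what was booted (loader.efi on the ESP)
    next_path: Optional[str]  # optional next thing to load
    cmdline: str              # command line passed to the loader


@dataclass
class UfsPart:
    disk: str
    name: str
    has_loader_rc: bool       # /boot/loader.rc (or the Lua equivalent) exists


def boot_var_name(boot_current: int) -> str:
    # Boot#### is the option number as four uppercase hex digits.
    return "Boot%04X" % boot_current


def device_of(path: str) -> Optional[str]:
    # Assumption: the device part is everything up to the last ')'.
    end = path.rfind(")")
    return path[: end + 1] if end >= 0 else None


def resolve_currdir(boot_current: Optional[int],
                    boot_vars: Dict[str, LoadOption],
                    zfs_pools: List[str],
                    bootenv: Optional[str],
                    ufs_parts: List[UfsPart],
                    loader_disk: str) -> Optional[str]:
    option = None
    if boot_current is not None:
        option = boot_vars.get(boot_var_name(boot_current))
    if option is not None:
        # A root filesystem specifier on the command line wins.
        for token in option.cmdline.split():
            if token.startswith("rootdev="):  # hypothetical specifier
                return token.split("=", 1)[1]
        # Otherwise use the device part of the next thing to load.
        if option.next_path:
            device = device_of(option.next_path)
            if device:
                return device
    # A lone ZFS pool, or the one named as bootenv when there are several.
    if len(zfs_pools) == 1:
        return "zfs:" + zfs_pools[0]
    if bootenv is not None and bootenv in zfs_pools:
        return "zfs:" + bootenv
    # A UFS partition with loader.rc, but only on the loader's own disk.
    for part in ufs_parts:
        if part.disk == loader_disk and part.has_loader_rc:
            return part.name
    # Nothing found: the caller prompts for currdir (10 s timeout).
    return None


if __name__ == "__main__":
    boot_vars = {"Boot0003": LoadOption("HD(1,GPT,esp)/EFI/FREEBSD/LOADER.EFI",
                                        "HD(2,GPT,root)/boot/kernel/kernel", "")}
    print(resolve_currdir(3, boot_vars, [], None, [], "disk0"))  # HD(2,GPT,root)
```

Returning None rather than widening the search keeps the "fail rather than guess" rule: the only thing left is the timed prompt.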

We could also add loading boot.config from \efi\freebsd\boot.config on the
same ESP at the start of the sequence....

This is going to be tricky to code up as it is... This is basically what
I'd written up in two docs:

https://docs.google.com/document/d/1aK9IqF-60JPEbUeSAUAkYjF2W_8EnmczFs6RqCT90Jg/edit#heading=h.jdwnfj2sxlfb
(UEFI boot protocol, lightly edited to include the above summary)
https://docs.google.com/document/d/1l9tognVBx_QmWx6ZvilgEj2ndoIaMJhPPNllZZyHJj0/edit#heading=h.9ps7k4bunurf
(ZFS UEFI media type to be able to specify things exactly if one wants)

to try to get this all sorted out...

Warner


