Date: Tue, 2 Sep 2025 13:25:59 -0600
From: Warner Losh <imp@bsdimp.com>
To: Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, Graham Perrin <grahamperrin@gmail.com>, FreeBSD-CURRENT <freebsd-current@freebsd.org>
Subject: Re: Using a recovery partition to repair a broken installation of FreeBSD
Message-ID: <CANCZdfpXN_W4Q6ucchM_UpvaD5QNZOX0Oq_XahcYtePxDxTHgQ@mail.gmail.com>
In-Reply-To: <20250902225500.70577e08c0584754e743bac9@dec.sakura.ne.jp>
References: <7b384ac0-9b24-43a4-bf63-012d745155a7@gmail.com>
 <aKD970iOlzyQNi0d@amaryllis.le-fay.org>
 <18e1a7e9-07d8-43a2-96af-0acdab6c2920@gmail.com>
 <babf662e-cded-4a2c-b5e8-c5a7175739f2@gmail.com>
 <20250901175827.73ba0ea24812cebe2263811f@dec.sakura.ne.jp>
 <202509010904.58194iP2007318@critter.freebsd.dk>
 <CANCZdfrrybisM6gSvsqKHfT2yk6ACXH=g=0oae1iVGBAdwWZQg@mail.gmail.com>
 <20250901204243.6548150b14d79d2eab04ad3d@dec.sakura.ne.jp>
 <20250902225500.70577e08c0584754e743bac9@dec.sakura.ne.jp>
On Tue, Sep 2, 2025 at 7:55 AM Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:
> On Mon, 1 Sep 2025 21:02:45 -0600
> Warner Losh <imp@bsdimp.com> wrote:
>
> > On Mon, Sep 1, 2025 at 5:42 AM Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
> > wrote:
> >
> > > On Mon, 1 Sep 2025 03:15:50 -0600
> > > Warner Losh <imp@bsdimp.com> wrote:
> > >
> > > > On Mon, Sep 1, 2025, 3:05 AM Poul-Henning Kamp <phk@phk.freebsd.dk>
> > > > wrote:
> > > >
> > > > > --------
> > > > > Tomoaki AOKI writes:
> > > > >
> > > > > > > > … it would be nice to have something like a 'recovery
> > > > > > > > partition', as some OSes have, or at least some tiny
> > > > > > > > fail-safe feature. Having a remote machine in some distant
> > > > > > > > datacenter, booting from a flash stick is always a problem.
> > > > >
> > > > > I thought that is what /rescue is for?
> > > >
> > > > That only works if your boot loader can read it... I've thought for
> > > > a while now that maybe we should move that into a RAM disk image
> > > > that we fall back to if the boot loader can't read anything else...
> > > >
> > > > Warner
> > >
> > > Exactly. If the loader (or the bootcode that kicks off the loader in
> > > the partition/pool) can sanely read the partition/pool to boot from,
> > > I think /rescue is enough and there is no need for a rescue
> > > "partition/pool".
> > >
> > > But once the partition/pool to boot from is broken (including a
> > > decryption key for an encrypted partition/drive lost from its regular
> > > place), something else is needed.
> > >
> > > And what can be chosen to boot from the BIOS/UEFI firmware depends on
> > > the implementation (some may restrict the choice to per-drive only,
> > > instead of every entry in the EFI boot manager table).
> > >
> > > If the BIOS/firmware allows choosing a "drive" to boot, a rescue
> > > "drive" is useful, if multiple physical drives are available.
> > > Yes, a rescue mfsroot embedded into loader.efi would be a candidate,
> > > too, if the size of the ESP allows.
> >
> > Rescue is quite small, on the order of 8MB compressed. The trouble is
> > that the kernel is about 12MB compressed, plus we'd need a few more
> > modules. Still, we could likely get something under 25MB that's an MD
> > image we could boot into, but it would have to be single user. And
> > it's been a while since I did that... Typically I just run
> > /rescue/init or /rescue/sh, which isn't a full system and still uses
> > the system's /etc. If we customized it per system, we could do
> > better, since the kernel can be a bit smaller (compressed, our
> > kernels at work are 6MB), so under 20MB could be possible. We'd not
> > need /boot/loader.efi in there.
>
> Oh, much smaller than I expected!
>
> Actually, using boot1.efi (either stock or patched), users of root on
> ZFS can have a rescue UFS partition on the same drive. This is because
> it looks for /boot/loader.efi to kick off from a ZFS pool first, then
> UFS. This priority is per drive, and if neither is found, boot1.efi
> looks at the other drives in the order the UEFI firmware recognized
> them. (The first drive it tries is the one boot1.efi itself was kicked
> off from.)
>
> This is how smh@ implemented it when I requested a fix for a boot
> issue under UEFI (at the time, loader.efi could not be kicked off
> directly by the UEFI firmware and needed boot1.efi).

This isn't true, at least not generally. We load loader.efi in all new
installations by default. I've fixed a number of issues around this from
the past... We're not able to use it at Netflix to boot off of ZFS, for
example...

> Maybe Warner would remember: before the fix, boot1.efi always looked
> for /boot/loader.efi in the order the UEFI firmware recognized drives.
> Thus, even if started from a USB memstick for rescue, boot1.efi
> "always" kicked off the first "internal" drive and could not rescue.
> Yes, fresh installations were OK with it, as there was no
> /boot/loader.efi on any of the internal drives.

Yea, I'm not remembering it...

> > If we could hook into the arch-specific traps that cause segv, etc.,
> > we could do a setjmp early, set 'safe mode', and restart. Though that
> > may be trickier than I initially am thinking... Maybe the best bet is
> > to let UEFI catch that failure and have the next bootable BootXXXX
> > environment on the list specify a safe mode. More investigation might
> > be needed.
> >
> > Warner
>
> Yeah, and it could be (and actually would be) implementation-specific.
> Maybe chaotic in the real world, and lots of quirks would be required.

I don't understand that part... It would be architecture specific, but
why would it be implementation specific?

Warner
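The /rescue repair flow mentioned at the top of the thread might look roughly like the following. This is an illustrative sketch only, written as a dry run (commands are printed, not executed) since the real steps assume a single-user prompt on a damaged FreeBSD system and depend on what is actually broken:

```shell
#!/bin/sh
# Illustrative sketch of a repair session with the statically linked
# tools in /rescue (which, as noted in the thread, still use the
# system's /etc). Dry run: each command is printed rather than run.
run() { echo "$@"; }   # dry-run wrapper: print instead of execute

run /rescue/fsck -y /          # repair the root filesystem
run /rescue/mount -u -o rw /   # remount root read/write once clean
run /rescue/sh                 # get a working static shell
```

Removing the `run` wrapper turns the sketch into the real commands.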
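For concreteness, the size budget discussed above works out roughly as follows. The /rescue and kernel figures come from the thread itself; the modules allowance is an assumption, not a measured number:

```shell
#!/bin/sh
# Back-of-envelope budget for a bootable rescue MD image, using the
# compressed sizes quoted in the thread (~8MB /rescue, ~12MB kernel).
# The modules figure is an assumed allowance for a few extra modules.
rescue_mb=8
kernel_mb=12
modules_mb=4
total_mb=$((rescue_mb + kernel_mb + modules_mb))
echo "estimated image size: ${total_mb}MB"   # prints: estimated image size: 24MB
```

With a per-system kernel closer to the 6MB compressed mentioned above, the same sum lands under 20MB, matching the estimate in the thread.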
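One possible shape for the "next BootXXXX entry specifies a safe mode" idea, sketched with FreeBSD's efibootmgr(8). The loader path and label are illustrative assumptions, and the commands are printed as a dry run because the real ones modify UEFI NVRAM:

```shell
#!/bin/sh
# Sketch (not a tested recipe): register a second, active UEFI boot
# entry pointing at a loader configured for safe mode, so the firmware
# can fall through to it when the primary entry fails to load.
# The path and label below are illustrative assumptions.
run() { echo "$@"; }   # dry-run wrapper: print instead of execute

run efibootmgr -c -a -L "FreeBSD (safe mode)" \
    -l /boot/efi/EFI/freebsd/loader-safe.efi
run efibootmgr -v      # inspect the resulting boot entries and order
```

Whether the firmware actually falls through cleanly is exactly the implementation-dependent behavior questioned at the end of the thread.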
