Date:      Fri, 1 Dec 2023 22:57:12 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Zaphod Beeblebrox <zbeeble@gmail.com>
Cc:        Pete French <pete@twisted.org.uk>, FreeBSD Stable ML <stable@freebsd.org>
Subject:   Re: EFI and zfs raid mirror partial fail (14.0 and RELENG_13)
Message-ID:  <CANCZdfqRs8K5dT3C-sAKn9CAzbTVx5Jtm6safgEhCVt4BRGvmw@mail.gmail.com>
In-Reply-To: <CACpH0MfOUcdCMSf3XBdvmXAAte-pw9nNo4TNdeMjq9f%2BH_V9yQ@mail.gmail.com>
References:  <c9969fde-3653-43ce-a1f0-322e2dc4a77b@sentex.net> <e9f9acd5-6490-4b6d-8cce-a8d7826fe86c@sentex.net> <86d04457-5018-45f9-849f-eb20ed5cf380@twisted.org.uk> <CANCZdfqRsOHmyPRtH3fsMG=86RD=4Ci=hpU9VHFf20nc=0Js=Q@mail.gmail.com> <CACpH0MfOUcdCMSf3XBdvmXAAte-pw9nNo4TNdeMjq9f%2BH_V9yQ@mail.gmail.com>


On Fri, Dec 1, 2023 at 10:34 PM Zaphod Beeblebrox <zbeeble@gmail.com> wrote:

> It can be more straightforward to update the gmirror, however. I've done
> this with UFS: old-style boot from a pair of UFS/gmirror USB sticks that
> then boot to a ZFS array the BIOS couldn't see (so UFS only contained /boot
> and /rescue). It's easier to know that the boot is updated identically if
> it's gmirrored. gmirror also has tools to verify, etc.
>

Yes. More straightforward, but not as safe. The BIOS runs before FreeBSD and
doesn't use gmirror at all, so it can't know whether one copy is good or
not. It has to assume that the copies are always good. If you are a single
user, then the convenience is likely worth it. It's going to be fine, and if
you have a power failure while updating, you are going to be right there to
cope with whatever fallout by choosing the right device to boot from if the
primary is corrupted. Once you reboot FreeBSD, the gmirror will resilver
(usually) and life will be good. But you have to make absolutely sure that
the gmirror never degrades (which sometimes happens on crashes) so that
every copy gets updated when you write a new loader. If the mirror is
degraded and the side that dropped out is the BIOS's primary boot device,
the BIOS will boot the old loader.
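
The firmware never compares the copies, so any checking has to happen from
FreeBSD before the next reboot. For ESPs mounted separately rather than
through gmirror, a minimal sketch in Python can compare each copy of the
loader with the image you meant to install and report the stale ones. The
mount points and the loader path efi/freebsd/loader.efi are hypothetical,
and matching hashes say nothing about which disk the BIOS will actually pick.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def stale_copies(esp_roots: list[Path], rel_path: str, expected: str) -> list[Path]:
    """Return each ESP root whose copy of rel_path is missing or differs from expected."""
    stale = []
    for root in esp_roots:
        copy = root / rel_path
        if not copy.is_file() or sha256_of(copy) != expected:
            stale.append(root)
    return stale


if __name__ == "__main__":
    # Two temporary directories stand in for two separately mounted ESPs.
    rel = "efi/freebsd/loader.efi"  # hypothetical loader path
    new_loader = b"new loader image"
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for root, data in ((a, new_loader), (b, b"old loader image")):
            target = Path(root) / rel
            target.parent.mkdir(parents=True)
            target.write_bytes(data)
        expected = hashlib.sha256(new_loader).hexdigest()
        # Lists only the root that still holds the old image.
        print(stale_copies([Path(a), Path(b)], rel, expected))
```

An empty result means every copy matches; anything listed should be fixed
before rebooting.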

If you are deploying a redundant EFI booting system for lots of machines,
some of which are in the middle of nowhere without remote hands available,
then you can't rely on gmirror to always be right (a power loss partway
through updating the copies can leave corrupted partitions behind). And
there's the broken-mirror problem, which has to be constantly monitored. At
work, we cope with this by having lots of monitoring scripts for
gmirror-based systems and then taking corrective action when bad things
happen to a gmirror element.
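
The monitoring described above can start as simply as refusing to touch the
ESP unless the mirror reports COMPLETE. A minimal sketch in Python that
parses the text printed by `gmirror status`; the sample output's column
layout and the mirror names mirror/efi and mirror/swap are assumptions for
illustration, so check them against your own system. A mirror missing from
the output is treated as unsafe, since empty output is no evidence of health.

```python
import subprocess

# Illustrative `gmirror status` output. The exact column layout is an
# assumption here; compare it with what your own system prints.
SAMPLE = """\
       Name    Status  Components
 mirror/efi  COMPLETE  ada0p1 (ACTIVE)
                       ada1p1 (ACTIVE)
mirror/swap  DEGRADED  ada0p3 (ACTIVE)
"""


def mirror_states(status_text: str) -> dict[str, str]:
    """Parse `gmirror status` text into {mirror name: Status column}."""
    states = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Mirror rows start with the provider name ("mirror/..."); the header
        # row and the extra component rows under a mirror do not.
        if len(fields) >= 2 and fields[0].startswith("mirror/"):
            states[fields[0]] = fields[1]
    return states


def safe_to_update(status_text: str, mirror: str) -> bool:
    """True only if the named mirror is listed at all and reports COMPLETE."""
    return mirror_states(status_text).get(mirror) == "COMPLETE"


def read_status() -> str:
    """Run `gmirror status` (FreeBSD only) and return its standard output."""
    return subprocess.run(
        ["gmirror", "status"], capture_output=True, text=True, check=True
    ).stdout


if __name__ == "__main__":
    print(mirror_states(SAMPLE))
    print(safe_to_update(SAMPLE, "mirror/efi"), safe_to_update(SAMPLE, "mirror/swap"))
```

On a FreeBSD host you would pass read_status() in place of SAMPLE, for
example safe_to_update(read_status(), "mirror/efi"), and abort the loader
update when it returns False.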

But for our multiple, redundant ESPs, we manually update them one at a
time because we can't take a chance on the gmirror being broken. If the
drive that's the primary boot device fails read-only and we can't change the
BIOS boot order, then we RMA the box (though that's rare: we can usually
move the primary and arrange for a different drive to be the backup boot
device). When you have tens of thousands of machines, even low failure
rates can cause big expenses. That said, a broken mirror combined with a
BIOS that boots the wrong disk and can't be reconfigured is far more common
than gmirror breaking due to a crash during an upgrade (though the latter
does happen).
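
The one-at-a-time update can be scripted. A minimal sketch in Python,
assuming each ESP is mounted at its own directory and the loader lives at
efi/freebsd/loader.efi (a hypothetical path; use your own). It writes a
temporary file, fsyncs it, renames it into place, and moves on to the next
ESP only after the previous copy verifies. The rename-over is atomic on
POSIX filesystems, but the ESP is FAT and msdosfs makes no such promise, so
this only approximates "either there and good, or gone".

```python
import hashlib
import os
import tempfile
from pathlib import Path


def fsync_dir(path: Path) -> None:
    """Best-effort fsync of a directory so a rename inside it is flushed too."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return  # some platforms cannot open a directory this way
    try:
        os.fsync(fd)
    except OSError:
        pass  # not every filesystem supports fsync on a directory
    finally:
        os.close(fd)


def update_one(root: Path, rel_path: str, data: bytes) -> None:
    """Install data at root/rel_path by writing a temp file, then renaming it."""
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".new")
    with tmp.open("wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # new bytes are on disk before the name changes
    os.replace(tmp, target)   # swap the new file in under the final name
    fsync_dir(target.parent)


def update_esps(esp_roots: list[Path], rel_path: str, data: bytes) -> list[Path]:
    """Update the ESPs strictly one after another, stopping at the first bad copy.

    Returns the roots that were updated and verified. Roots after a failure
    (or after an exception) are never touched, so they keep the previous
    loader. The read-back may be served from the buffer cache, so it checks
    the data the kernel holds for the file, not necessarily the media.
    """
    expected = hashlib.sha256(data).hexdigest()
    done = []
    for root in esp_roots:
        update_one(root, rel_path, data)
        if hashlib.sha256((root / rel_path).read_bytes()).hexdigest() != expected:
            break
        done.append(root)
    return done


if __name__ == "__main__":
    # Temporary directories stand in for the mounted ESPs in this demo.
    rel = "efi/freebsd/loader.efi"  # hypothetical loader path
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        print(update_esps([Path(a), Path(b)], rel, b"new loader image"))
```

Stopping at the first failure is the point of the design: a crash or bad
write can damage at most one ESP, and the firmware can still fall back to
the next entry in its boot list.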

So yeah, gmirror is convenient. But you have to watch it like a hawk to make
sure the mirror isn't broken before you do the update, and make sure that
you can get hands on the system if an update breaks badly due to an
ill-timed power failure or system panic.

Warner


> On Fri, Dec 1, 2023 at 7:46 PM Warner Losh <imp@bsdimp.com> wrote:
>
>>
>>
>> On Fri, Dec 1, 2023, 4:57 PM Pete French <pete@twisted.org.uk> wrote:
>>
>>>
>>> On 01/12/2023 21:53, mike tancsa wrote:
>>> > Should have looked at open PRs. There is one from a while ago
>>> >
>>> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258987
>>> >
>>> >
>>>
>>> Was thinking about this, and I was wondering if it would be possible to
>>> make the EFI partition a gmirror, so it's across all discs, mounted only
>>> once, but would still boot from any of them. My understanding is that geom
>>> has the label at the end, yes? So the firmware would see the filesystem
>>> on a single partition quite happily?
>>>
>>
>> I've done this. It works OK, but I don't run like this in production. If
>> I write a new file, that means many writes to the different disks. If they
>> all go through, then life is good (this is what gets us to OK).
>>
>> BUT, if there is a power failure or crash and only some of them make it
>> to disk, then you have a corrupt ESP, and the BIOS may pick that ESP to boot
>> off of, booting corrupt data.
>>
>> Since this is infrequently updated, you can use a safe sequence to update
>> things one partition at a time. Then you might lose the file entirely, but
>> it will either be there and good, or it will be gone. You can't get into a
>> bad situation. Either you boot the old or the new loader, and you can just
>> quit from the boot loader if it's the old one and it can't boot. EFI will
>> try the next one on the list.
>>
>> Here manual mirroring, if scripted, can be more reliable than gmirror.
>>
>> Warner
>>
>> -pete.
>>>
>>>
>>>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfqRs8K5dT3C-sAKn9CAzbTVx5Jtm6safgEhCVt4BRGvmw>