Date:      Sun, 19 Dec 2021 11:40:43 -0800
From:      Lee Brown <leeb@ratnaling.org>
To:        Fabian Keil <freebsd-listen@fabiankeil.de>
Cc:        FreeBSD hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Patches for GPT and geli recovery
Message-ID:  <CAFPNf59bXZTEdYzSmM7qH5mwYSykRdXrpHUOqn-qiE9ND2d=xQ@mail.gmail.com>
In-Reply-To: <20211219175011.3023a232@fabiankeil.de>
References:  <20211219175011.3023a232@fabiankeil.de>

On Sun, Dec 19, 2021 at 8:52 AM Fabian Keil <freebsd-listen@fabiankeil.de>
wrote:

> [cut]
> BTW, I would also be interested to know if others have
> experienced similar data corruption and could figure
> out how it happened.
>
Sounds like bitrot.  Bits flip on disks all the time; it doesn't matter
whether they are spinning rust or SSD.  Sometimes the flips are detected and
corrected, in which case you won't know.  Sometimes they are detected but
uncorrectable, and you'll see that error propagated into the driver.  And
sometimes they are not detected at all and cause no errors that the OS can
surmise.  The higher the density of bits, the higher the probability of
corruption.  SMART is not reliably predictive.  How does it happen?  Cosmic
rays and entropy.  I've had lightly written SSDs fail after a few months.

I don't use ZFS, but I have GELI authentication under a GMIRROR, so whenever
a bad checksum is read, it breaks the mirror, which gets attention (last I
looked, there wasn't a simple userland hook for bad GELI reads, but there
was for GMIRROR add/remove events).
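
To illustrate why a per-sector checksum catches a flip that the drive
itself never reports, here is a minimal Python sketch.  It is NOT GELI's
actual on-disk format or code path: the names seal, verify and flip_bit,
the 512-byte sector size, and the choice to bind the tag to the sector
index are all assumptions made for the example.

```python
import hashlib
import hmac
import os

SECTOR_SIZE = 512  # illustrative only; not GELI's real layout


def seal(key: bytes, index: int, data: bytes) -> bytes:
    """Return an HMAC-SHA256 tag binding the data to its sector index."""
    message = index.to_bytes(8, "little") + data
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, index: int, data: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(seal(key, index, data), tag)


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate bitrot by inverting a single bit of the sector."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


key = os.urandom(32)
sector = os.urandom(SECTOR_SIZE)
tag = seal(key, 7, sector)          # tag stored alongside sector 7

rotted = flip_bit(sector, 1234)     # one silent bit flip

print(verify(key, 7, sector, tag))  # intact sector
print(verify(key, 7, rotted, tag))  # corrupted sector
```

The intact sector verifies and the rotted one does not, so the read can be
failed upward instead of silently returning bad data.  In the setup
described above, that failed read is what causes the mirror to drop the
component.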

HTH - lee




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFPNf59bXZTEdYzSmM7qH5mwYSykRdXrpHUOqn-qiE9ND2d=xQ>