Date: Sun, 19 Dec 2021 11:40:43 -0800
From: Lee Brown <leeb@ratnaling.org>
To: Fabian Keil <freebsd-listen@fabiankeil.de>
Cc: FreeBSD hackers <freebsd-hackers@freebsd.org>
Subject: Re: Patches for GPT and geli recovery
Message-ID: <CAFPNf59bXZTEdYzSmM7qH5mwYSykRdXrpHUOqn-qiE9ND2d=xQ@mail.gmail.com>
In-Reply-To: <20211219175011.3023a232@fabiankeil.de>
References: <20211219175011.3023a232@fabiankeil.de>

On Sun, Dec 19, 2021 at 8:52 AM Fabian Keil <freebsd-listen@fabiankeil.de> wrote:

> [cut]
> BTW, I would also be interested to know if others have
> experienced similar data corruption and could figure
> out how it happened.

Sounds like bitrot. Bits flip on disks all the time; it doesn't matter whether they are spinning rust or SSDs. Sometimes the flips are detected and corrected, in which case you won't know. Sometimes they are detected but uncorrectable, and you'll see that error propagated into the driver. And sometimes they are not detected at all and cause no errors that the OS can notice. The higher the density of bits, the higher the probability of corruption. SMART is not reliably predictive. How does it happen? Cosmic rays and entropy. I've had lightly written SSDs fail after a few months.

I don't use ZFS, but have GELI authentication under a GMIRROR, so whenever a bad checksum is read, it breaks the mirror, which gets attention (last I looked, there wasn't a simple userland hook for bad GELI reads, but there was for GMIRROR add/remove events).
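
To illustrate why an authenticated layer catches the "not detected at all" case, here is a minimal Python sketch of a keyed per-sector checksum. It is only a conceptual model and not GELI's actual on-disk format or code path; the names seal and verified_read, the sector index 7, and the 4096-byte sector size are all made up for the example.

```python
import hashlib
import hmac
import os

SECTOR_SIZE = 4096  # illustrative value, not GELI's real layout


def seal(key: bytes, index: int, data: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for one sector, bound to its index."""
    return hmac.new(key, index.to_bytes(8, "little") + data, hashlib.sha256).digest()


def verified_read(key: bytes, index: int, data: bytes, tag: bytes) -> bytes:
    """Return the sector data, or raise if the stored tag does not match."""
    if not hmac.compare_digest(seal(key, index, data), tag):
        raise IOError(f"sector {index}: checksum mismatch")
    return data


key = os.urandom(32)
sector = os.urandom(SECTOR_SIZE)
tag = seal(key, 7, sector)

# Simulate silent bitrot: flip a single bit that the drive never reports.
rotted = bytearray(sector)
rotted[100] ^= 0x01
rotted = bytes(rotted)
```

Calling verified_read(key, 7, sector, tag) returns the data, while verified_read(key, 7, rotted, tag) raises an error. In the setup described above, that failed read is what ends up breaking the mirror and getting noticed.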

HTH - lee