Date: Sun, 19 Dec 2021 11:40:43 -0800
From: Lee Brown <leeb@ratnaling.org>
To: Fabian Keil <freebsd-listen@fabiankeil.de>
Cc: FreeBSD hackers <freebsd-hackers@freebsd.org>
Subject: Re: Patches for GPT and geli recovery
Message-ID: <CAFPNf59bXZTEdYzSmM7qH5mwYSykRdXrpHUOqn-qiE9ND2d=xQ@mail.gmail.com>
In-Reply-To: <20211219175011.3023a232@fabiankeil.de>
References: <20211219175011.3023a232@fabiankeil.de>
On Sun, Dec 19, 2021 at 8:52 AM Fabian Keil <freebsd-listen@fabiankeil.de> wrote:

> [cut]
> BTW, I would also be interested to know if others have
> experienced similar data corruption and could figure
> out how it happened.

Sounds like bitrot. Bits flip on disks all the time; it doesn't matter
whether they are spinning rust or SSD. Sometimes the flips are detected
and corrected, in which case you won't know. Sometimes they are detected
but uncorrectable, and you'll see the error propagated into the driver.
And sometimes they are not detected at all and cause no errors that the
OS can see. The higher the bit density, the higher the probability of
corruption. SMART is not reliably predictive. How does it happen? Cosmic
rays and entropy. I've had lightly written SSDs fail after a few months.

I don't use ZFS, but have GELI authentication under a GMIRROR, so
whenever a bad checksum is read, it breaks the mirror, which gets
attention (last I looked, there wasn't a simple userland hook for bad
GELI reads, but there was for GMIRROR add/remove events).

HTH - lee
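
As an illustration of why per-sector authentication catches flips that the drive itself misses, here is a minimal Python sketch. It is not GELI's on-disk format or GMIRROR's actual logic; the sector size, the helper names (`seal`, `verify`, `flip_bit`, `read_mirror`) and the idea of returning the list of failed components are all assumptions made up for this example. It only shows the general technique: a keyed checksum stored alongside each sector makes a silent bit flip show up as a read error, and a mirror layer can then drop the bad component.

```python
import hashlib
import hmac

SECTOR_SIZE = 4096  # assumed sector size, for illustration only


def seal(key: bytes, index: int, data: bytes) -> bytes:
    """Return an HMAC-SHA256 tag binding the data to its sector index."""
    msg = index.to_bytes(8, "little") + data
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key: bytes, index: int, data: bytes, tag: bytes) -> bool:
    """True if the sector contents still match the stored tag."""
    return hmac.compare_digest(seal(key, index, data), tag)


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate bitrot: return a copy of data with one bit inverted."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def read_mirror(key: bytes, index: int, copies: list, tag: bytes):
    """Hypothetical mirror read.

    Tries each component in order and returns (good_data, failed), where
    failed lists the component numbers whose copy did not verify. Raises
    OSError if no component holds a valid copy.
    """
    failed = []
    for component, data in enumerate(copies):
        if verify(key, index, data, tag):
            return data, failed
        failed.append(component)
    raise OSError(f"sector {index}: no component passed verification")


if __name__ == "__main__":
    key = b"k" * 32
    sector = bytes(SECTOR_SIZE)
    tag = seal(key, 0, sector)

    rotten = flip_bit(sector, 12345)  # one silently flipped bit
    data, failed = read_mirror(key, 0, [rotten, sector], tag)
    print("recovered:", data == sector, "failed components:", failed)
```

In this sketch the component holding the flipped bit is reported as failed while the read still succeeds from the other copy, which is the kind of event that draws attention in the setup described above.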
