Date:      Fri, 29 Apr 2016 09:03:04 -0500
From:      Andrew Berg <aberg010@my.hennepintech.edu>
To:        <freebsd-fs@freebsd.org>
Subject:   Re: How to speed up slow zpool scrub?
Message-ID:  <57236998.5090908@my.hennepintech.edu>
In-Reply-To: <08d59afe-c835-fa8d-0e52-78afcb1cc030@denninger.net>
References:  <381846248.2672053.1461695277122.JavaMail.yahoo.ref@mail.yahoo.com> <381846248.2672053.1461695277122.JavaMail.yahoo@mail.yahoo.com> <1461736217.1121.17.camel@michaeleichorn.com> <alpine.GSO.2.20.1604290821210.23612@freddy.simplesystems.org> <08d59afe-c835-fa8d-0e52-78afcb1cc030@denninger.net>

On 2016.04.29 08:47, Karl Denninger wrote:
> On 4/29/2016 08:31, Bob Friesenhahn wrote:
>> On Wed, 27 Apr 2016, Michael B. Eichorn wrote:
>>>
>>> It does not *need* to be ECC ram, ECC is just *highly recommended*. As
>>> one of the key features of zfs is bitrot prevention, it makes sense to
>>> protect against bitrot everywhere. Zfs (and thus freenas) is just fine
>>> with non-ecc ram. Just, like for any filesystem if the bit is flipped
>>> in ram it will be recorded as such on disk.
>>
>> This is not necessarily the case.  Zfs does not offer additional
>> protections for data in RAM.  It assumes that data in RAM is protected
>> in other ways.  The on-disk checksum only verifies that the data was
>> not modified since it was checksummed, but it may already be corrupt.
>> The risk factor is pretty high if RAM becomes corrupted since zfs uses
>> so much RAM.
>>
>> It is possible to lose data and even the whole pool due to memory
>> corruption.
>>
>> There are well known cases where users encountered continual/periodic
>> pool corruptions due to flaky RAM.
>>
>> Bob
>
> To amplify what Bob said using ZFS on a system without ECC RAM is just
> begging to lose the entire pool at some point due to a random bit-error
> in system memory and the fact that it happened may be completely
> concealed from you for quite a while until at a random later point in
> time you discover the pool is hopelessly corrupt.
>
> ZFS makes the *assumption*, fair or not, that everything in its
> RAM-based caches is correct.  If that assumption is violated you will
> eventually be a very sad Panda.  Use ECC memory or don't use ZFS.
>
ZFS assumes a lot less than other filesystems. ZFS will complain if things 
aren't right. It is *less* likely to fall apart than another filesystem under 
this condition since it keeps redundant checksummed copies of metadata. There 
is zero reason to think that any other filesystem will protect you from bad 
RAM. If a bit flip messes up the metadata, you are going to have a bad day no 
matter what. At least with ZFS, you will know about it right away. Your 
argument really boils down to "Use ECC memory or don't use computers".

To put it another way: if you are experiencing corruption from bad RAM, what 
filesystem would you go with to protect your data?

On a side note, ZFS uses lots of RAM for *caching*. Any corruption there might 
affect your applications and/or system, but not your on-disk data. It's also 
highly unlikely that you will have bad data written, but then never have any 
bit flips on reads later. You *will* know something is wrong.
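
To make the two cases concrete, here is a minimal Python sketch. It is a toy model, not ZFS code: `checksum`, `flip_bit`, and `verify` are hypothetical helpers, and SHA-256 from `hashlib` merely stands in for whatever checksum the pool actually uses. A flip that happens before the checksum is computed gets stored with a matching checksum; a flip that happens afterwards is caught on verification.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Hypothetical stand-in for a block checksum (SHA-256 here)."""
    return hashlib.sha256(block).hexdigest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Return a copy of block with exactly one bit inverted."""
    buf = bytearray(block)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def verify(block: bytes, stored: str) -> bool:
    """True if the block still matches the checksum recorded for it."""
    return checksum(block) == stored


original = b"important data" * 8

# Case 1: the bit flips AFTER the checksum was recorded
# (for example in RAM on a later read). Verification fails.
stored = checksum(original)
damaged = flip_bit(original, 5)
detected_after = not verify(damaged, stored)

# Case 2: the bit flips BEFORE the checksum is computed.
# The bad data gets a checksum that matches it, so this block looks fine.
corrupted_in_ram = flip_bit(original, 5)
stored_bad = checksum(corrupted_in_ram)
detected_before = not verify(corrupted_in_ram, stored_bad)

print("flip after checksum detected: ", detected_after)
print("flip before checksum detected:", detected_before)
```

Only the second case slips past the checksum for that one block, and flaky RAM rarely limits itself to flips of that kind, which is why it tends to show up as checksum errors sooner or later.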


I have personally had bad RAM, and ZFS was the only reason I was able to figure 
out why I had issues. If your bit flips aren't in kernel memory, then you won't 
have crashes and other obvious problems, but you will have weird data in your 
applications, and then you'll see some checksum errors in a 'zpool status'.


