From: Borja Marcos <borjam@sarenet.es>
Subject: Re: ZFS on Hardware RAID
Date: Mon, 21 Jan 2019 15:11:09 +0100
To: jdelisle
Cc: Maciej Jan Broniarz, freebsd-fs@freebsd.org

> On 19 Jan 2019, at 20:29, jdelisle wrote:
>
> You'll find a lot of strong opinions on this topic over at the FreeNAS
> forums. I too wish an authoritative, knowledgeable SME would answer
> and thoroughly explain the inner workings and the risks involved. Most of
> the FreeNAS forum posts on this topic devolve into hand-waving and blurry
> incomplete explanations that end in statements like "trust me, don't do
> it". I'd love to understand why not. I'm curious and eager to learn more.

Alright, let me try with one of the many reasons to avoid "hardware RAIDs".

Disk redundancy in a RAID system offers several benefits. Not only can it
detect data corruption (and not all systems are equally good at that), it
can also help repair it. And of course, with adequate redundancy you can
survive the loss of one or several drives.

But as I said, not all corruption detection schemes are born equal. ZFS has
a very sophisticated and effective one, developed as an answer to the
ever-growing size of storage systems and files. Everything has become so
large that the probability of an unnoticed corrupted block is quite high;
some storage systems have indeed suffered from silent data corruption.

Common "hardware RAID" systems, which really means "software running on a
small embedded processor", usually have quite limited checksum schemes. ZFS
has a much more robust one: it keeps a checksum for every block and verifies
it on every read.
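Just to make the idea concrete, here is a toy sketch in Python. It is not
how ZFS is implemented (write_block and read_block are names I just made up
for the example); it only shows the principle: the filesystem keeps its own
per-block checksum in its metadata and verifies it on every read, instead of
trusting whatever the disk or controller hands back.

import hashlib

# Toy model of end-to-end block checksums. Not the real ZFS code paths,
# only the idea: the filesystem checks every block it reads against a
# checksum it stored itself at write time.

disk = {}        # block number -> data as the disk returns it
checksums = {}   # block number -> checksum kept in filesystem metadata

def write_block(blkno, data):
    disk[blkno] = data
    checksums[blkno] = hashlib.sha256(data).hexdigest()

def read_block(blkno):
    data = disk[blkno]
    if hashlib.sha256(data).hexdigest() != checksums[blkno]:
        raise IOError("block %d: checksum mismatch, corruption detected" % blkno)
    return data

write_block(0, b"important data")
disk[0] = b"important dara"   # bit rot: the disk silently returns bad data

try:
    read_block(0)
except IOError as err:
    print(err)   # detected and reported, not handed to the application

On a typical hardware RAID the normal read path simply trusts what the drive
returns, so that kind of corruption sails through unnoticed.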
So, now let's assume we are setting up a server and we have two choices: use
the HBA in "hardware RAID" mode, or use it as a plain HBA and rely on ZFS
for redundancy.

Option 1: Hardware RAID. The preferred option for many people because, well,
"hardware" sounds more reliable.

Option 2: ZFS using the disks directly, period. I refuse to use the term
JBOD because it adds a layer of confusion to what should be a simple
subject.

Now let's imagine that there is some data corruption on one of the disks.
The corruption is not detected by the hardware RAID, but it is promptly
detected by the more elaborate ZFS checksum scheme.

If we chose option 1, ZFS will let us know that there is a corrupted file.
But because redundancy was provided only by the underlying "hardware RAID",
ZFS won't have the ability to heal anything.

Had we chosen option 2, however, and assuming there was some redundancy, ZFS
would not only report the data corruption incident, it would also return
correct data, unless the blocks were corrupted on several of the disks at
once.

Given that ZFS has much better error detection and correction than most
"hardware RAID" options (high-end storage subsystems excepted), running ZFS
on a logical volume built on a "hardware RAID" is roughly equivalent to
running it on a single disk with no redundancy. At the very least you won't
get the real benefit of the better recovery mechanisms offered by ZFS.

Do you want another reason? If you use a "hardware RAID" solution you are
stuck with it. If you suffer a controller failure you will need the same
hardware to recover. With ZFS you can move the disks to a different system
with a different HBA; as long as ZFS can access the disks without anything
odd in between, it will work regardless of the hardware manufacturer.

There are other important performance reasons related to the handling of
data and metadata, but I think the first reason I mentioned (ZFS error
recovery and healing capabilities) is a strong enough motivation to avoid
"hardware RAIDs".

Borja.
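PS: And the self-healing part of the argument, as the same kind of toy
sketch (again, mirror_read is a made-up name for illustration, not the
actual ZFS read path):

import hashlib

def checksum(data):
    return hashlib.sha256(data).hexdigest()

good = b"precious block"
expected = checksum(good)          # checksum kept in filesystem metadata

# Option 2: the filesystem sees both copies of a mirrored block itself.
copies = [bytearray(b"precious blocj"),   # this copy rotted on one disk
          bytearray(good)]                # the copy on the other disk is fine

def mirror_read(copies, expected):
    for data in copies:
        if checksum(bytes(data)) == expected:
            for other in copies:          # heal bad copies from the good one
                other[:] = data
            return bytes(data)
    raise IOError("all copies corrupted")

print(mirror_read(copies, expected))      # correct data back, copy 0 repaired

# Option 1: the controller presents one opaque logical volume. If it hands
# back the rotten copy, the filesystem can detect the damage but has no
# second copy of its own to repair from.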