Date:      Sat, 18 May 2013 09:02:15 -0400
From:      Paul Kraus <paul@kraus-haus.org>
To:        Ivailo Tanusheff <Ivailo.Tanusheff@skrill.com>
Cc:        Liste FreeBSD <freebsd-questions@freebsd.org>
Subject:   Re: ZFS install on a partition
Message-ID:  <A9599DD7-1A32-4607-BC83-2E6E4D03C560@kraus-haus.org>
In-Reply-To: <372082cab2064846809615a8073e022c@DB3PR07MB059.eurprd07.prod.outlook.com>
References:  <F744BBF1-D98C-47BF-9546-14D1A9CB3733@todoo.biz> <372082cab2064846809615a8073e022c@DB3PR07MB059.eurprd07.prod.outlook.com>

On May 18, 2013, at 3:21 AM, Ivailo Tanusheff <Ivailo.Tanusheff@skrill.com> wrote:

> If you use HBA/JBOD then you will rely on the software RAID of the ZFS
> system. Yes, this RAID is good, but unless you use SSD disks to boost
> performance and a lot of RAM the hardware raid should be more reliable
> and much faster.

	Why would the hardware RAID be more reliable? Hardware RAID is
susceptible to uncorrectable errors from the physical drives (hardware
RAID controllers rely on the drives to report bad reads and writes),
and the uncorrectable error rate for modern drives is such that with
high capacity drives (1TB and over) you are almost certain to run into
a couple over the operational life of the drive. The rate is 10^-14 for
cheap drives and 10^-15 for better drives; very occasionally I see a
drive rated for 10^-16. Run the math and see how many TB worth of data
you have to write and read (remember these failures are generally read
failures with NO indication that a failure occurred, bad data is just
returned to the system).
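
	To put rough numbers on that (assuming the vendor's quoted rate
of one uncorrectable error per 10^14 bits read): 10^14 bits is about
12.5 TB, so writing and re-reading a 1TB drive a dozen times over its
life works out to an expected count of roughly one silently bad read.
At 10^-15 the same workload is about a one in ten chance; either way it
stops being rare once you multiply by the number of drives in an array.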

	In terms of performance HW RAID is faster, generally due to the
cache RAM built into the HW RAID controller. ZFS makes good use of
system RAM for the same function. An SSD can help with performance if
the majority of writes are sync (NFS is a good example of this) or if
you can benefit from a much larger read cache. SSDs are deployed with
ZFS either as write LOG devices (in which case they should be
mirrored), which only come into play for SYNC writes, or as an
extension of the ARC, the L2ARC, which does not have to be mirrored as
it is only a cache of existing data for speeding up reads.
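
	For reference, both kinds of device are added with "zpool add"
(the pool and device names below are only placeholders):

		zpool add tank log mirror ada1 ada2
		zpool add tank cache ada3

The first adds a mirrored SLOG used only for sync writes; the second
adds an L2ARC device, which needs no redundancy because it only caches
data that already exists in the pool.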

> I didn't get if you want to use the system to dual boot Linux/FreeBSD
> or just to share FreeBSD space with linux.
> But I would advise you to go with option 1 - you will get most of the
> system and obviously you don't need zpool with raid, as your LSI
> controller will do all the redundancy for you. Making software RAID
> over the hardware one will only decrease performance and will NOT
> increase the reliability, as you will not be sure which information is
> stored on which physical disk.
>
> If stability is a MUST, then I will also advise you to go with bunch
> of pools and a disk designated as hot spare - in case some disk dies
> you will rely on the automation recovery. Also you should run
> monitoring tool on your raid controller.

	I think you misunderstand the difference between stability and
reliability. Any ZFS configuration I have tried on FreeBSD is STABLE;
having redundant vdevs (mirrors or RAIDz<n>) along with hot spares can
increase RELIABILITY. The only advantage to having a hot spare is that
when a drive fails (and they all fail eventually), the REPLACE
operation can start immediately, without waiting for you to notice the
failure and manually replace the failed drive.

	Reliability is a function of MTBF (mean time between failures)
and MTTR (mean time to repair): you want the first as long as possible
and the second as short as possible. Having a hot spare reduces the
MTTR. The other way to improve MTTR is to go with smaller drives to
reduce the time it takes the system to resilver a failed drive. This is
NOT applicable in the OP's situation. I try very hard not to use drives
larger than 1TB because resilver times can be days. Resilver time also
depends on the total size of the data in a zpool, as a resilver
operation walks the filesystem in time order, replaying all the writes
and confirming that all the data on disk is good (it does not actually
rewrite the data unless it finds bad data). This means a couple of
things, the first of which is that the resilver time will be dependent
on the amount of data you have written, not the capacity. A zpool with
a capacity of multiple TB will resilver in seconds if there are only a
few hundred MB written to it. Since the resilver operation is not just
a block by block copy, but a replay, it is I/Ops limited, not bandwidth
limited. You might be able to stream sequential data from a drive at
hundreds of MB/sec., but most SATA drives will not sustain more than
one to two hundred RANDOM I/Ops (sequentially they can do much more).
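
	As a rough illustration (the record size and I/Ops figures here
are assumptions, not measurements): 1TB of written data in 128KB
records is roughly 8 million blocks, and at 150 random I/Ops the replay
takes about 8,000,000 / 150 = 53,000 seconds, or around 15 hours, no
matter how large the drives themselves are.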

> You can also set copies=2/3 just in case some errors occur, so ZFS
> can auto-repair the data. If you run ZFS over several LUNs this will
> make even more sense.
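
	For what it is worth, copies is a per-dataset property (the pool
and dataset names here are just examples):

		zfs set copies=2 tank/data

Keep in mind the extra copies protect against bad blocks, not against
the loss of an entire disk.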

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company



