Date:      Sat, 18 May 2013 13:29:58 +0000
From:      Ivailo Tanusheff <Ivailo.Tanusheff@skrill.com>
To:        Paul Kraus <paul@kraus-haus.org>
Cc:        Liste FreeBSD <freebsd-questions@freebsd.org>
Subject:   RE: ZFS install on a partition
Message-ID:  <b179c20ebde742358e2cc52a1f04133e@DB3PR07MB059.eurprd07.prod.outlook.com>
In-Reply-To: <A9599DD7-1A32-4607-BC83-2E6E4D03C560@kraus-haus.org>
References:  <F744BBF1-D98C-47BF-9546-14D1A9CB3733@todoo.biz> <372082cab2064846809615a8073e022c@DB3PR07MB059.eurprd07.prod.outlook.com> <A9599DD7-1A32-4607-BC83-2E6E4D03C560@kraus-haus.org>

The software RAID depends not only on the disks, but also on changes to the OS, which will occur more frequently than an update of the firmware of the RAID controller. That makes the hardware RAID more stable and reliable.
Also, the resources of the hardware RAID are used exclusively by the RAID controller, which is not true for a software RAID.
So I do not see your point in claiming that a software RAID is the same as, or better than, the hardware one.

About the second part - I am referring to both stability and reliability. Having a spare disk reduces the risk, as the recovery operation will start as soon as a disk fails. It may sound paranoid, but the possibility of a failing disk being detected only after 8, 12 or even 24 hours is pretty big.
Not sure about your calculations, hope you trust them, but at my previous company we had a 3-4 month period when a disk failed almost every day on 2-year-old servers, so trust me - I do NOT trust those calculations, as I've seen the opposite. Maybe it was a bad batch of disks shipped into the country, but no one is insured against that. Yes, you can use several hot spares with a software RAID, but:
1. You still depend on problems related to the OS.
2. If you read what the original poster has written, you will see that this is not possible for him.

I agree with the point about recovering big chunks of data; that's why I suggested that he use several smaller LUNs for the zpool.
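
Purely as an illustration (the pool and device names below are made up), a pool striped across several smaller hardware-RAID LUNs would be created along these lines:

    # Each daN is a LUN already protected by the RAID controller;
    # ZFS stripes across them and adds end-to-end checksumming on top.
    zpool create tank da1 da2 da3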

Best regards,
Ivailo Tanusheff

-----Original Message-----
From: owner-freebsd-questions@freebsd.org [mailto:owner-freebsd-questions@freebsd.org] On Behalf Of Paul Kraus
Sent: Saturday, May 18, 2013 4:02 PM
To: Ivailo Tanusheff
Cc: Liste FreeBSD
Subject: Re: ZFS install on a partition

On May 18, 2013, at 3:21 AM, Ivailo Tanusheff <Ivailo.Tanusheff@skrill.com> wrote:

> If you use HBA/JBOD then you will rely on the software RAID of the ZFS system. Yes, this RAID is good, but unless you use SSD disks to boost performance and a lot of RAM, the hardware RAID should be more reliable and much faster.

	Why will the hardware RAID be more reliable? Hardware RAID is susceptible to uncorrectable errors from the physical drives (hardware RAID controllers rely on the drives to report bad reads and writes), and the uncorrectable error rate for modern drives is such that with high-capacity drives (1TB and over) you are almost certain to run into a couple over the operational life of the drive: 10^-14 for cheap drives and 10^-15 for better drives; very occasionally I see a drive rated for 10^-16. ZFS, by contrast, checksums every block end to end, so it can detect (and, given redundant vdevs, repair) exactly these silent errors. Run the math and see how many TB worth of data you have to write and read (remember these failures are generally read failures with NO indication that a failure occurred; bad data is just returned to the system).
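
As a rough back-of-the-envelope example (the 2TB drive size below is only illustrative), the expected number of unrecoverable errors is simply the bits read multiplied by the error rate:

    # Reading a 2TB drive end to end is 2 * 8 * 10^12 = 1.6 * 10^13 bits.
    # At a rated URE of 1 in 10^14 bits that is 0.16 expected errors per
    # full pass, so a handful of full passes makes an error quite likely.
    echo "scale=2; (2 * 8 * 10^12) / 10^14" | bc    # prints .16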

	In terms of performance, HW RAID is faster, generally due to the cache RAM built into the HW RAID controller. ZFS makes good use of system RAM for the same function. An SSD can help with performance if the majority of writes are sync (NFS is a good example of this) or if you can benefit from a much larger read cache. SSDs are deployed with ZFS either as write LOG devices (in which case they should be mirrored, and they only come into play for SYNC writes), or as an extension of the ARC, the L2ARC, which does not have to be mirrored as it is only a cache of existing data for speeding up reads.
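
For reference, attaching such devices is a one-liner each (the pool and device names here are placeholders):

    # Mirrored SLOG for synchronous writes (NFS, databases):
    zpool add tank log mirror ada4 ada5
    # Single L2ARC read-cache device; no mirror needed, it is only a cache:
    zpool add tank cache ada6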

> I didn't get if you want to use the system to dual boot Linux/FreeBSD or just to share FreeBSD space with Linux.
> But I would advise you to go with option 1 - you will get the most out of the system, and obviously you don't need a zpool with RAID, as your LSI controller will do all the redundancy for you. Making a software RAID over the hardware one will only decrease performance and will NOT increase reliability, as you will not be sure which information is stored on which physical disk.
>
> If stability is a MUST, then I would also advise you to go with a bunch of pools and a disk designated as hot spare - in case some disk dies you will rely on the automatic recovery. Also you should run a monitoring tool on your RAID controller.

	I think you misunderstand the difference between stability and reliability. Any ZFS configuration I have tried on FreeBSD is STABLE; having redundant vdevs (mirrors or RAIDz<n>) along with hot spares can increase RELIABILITY. The only advantage to having a hot spare is that when a drive fails (and they all fail eventually), the REPLACE operation can start immediately, without you having to notice the failure and manually replace the failed drive.
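
For completeness, a hot spare is just another vdev added to the pool (the device name below is a placeholder):

    # Add a standby disk that ZFS can pull in when a pool member fails:
    zpool add tank spare ada7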

	Reliability is a combination of MTBF (mean time between failures) and MTTR (mean time to repair). Having a hot spare reduces the MTTR. The other way to improve MTTR is to go with smaller drives to reduce the time it takes the system to resilver a failed drive. This is NOT applicable in the OP's situation. I try very hard not to use drives larger than 1TB because resilver times can be days. Resilver time also depends on the total size of the data in a zpool, as a resilver operation walks the filesystem in time order, replaying all the writes and confirming that all the data on disk is good (it does not actually rewrite the data unless it finds bad data). This means a couple of things, the first of which is that the resilver time will be dependent on the amount of data you have written, not the capacity. A zpool with a capacity of multiple TB will resilver in seconds if there is only a few hundred MB written to it. Since the resilver operation is not just a block by block copy, but a replay, it is I/Ops limited, not bandwidth limited. You might be able to stream sequential data from a drive at hundreds of MB/sec., but most SATA drives will not sustain more than one to two hundred RANDOM I/Ops (sequentially they can do much more).
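
To put an assumed number on that (the record size, data size and IOPS figures below are all made-up round numbers), an IOPS-bound resilver looks roughly like this:

    # 500 GB of 128 KB records resilvered at ~150 random IOPS:
    # records = 500*2^30 / (128*2^10) = 4,096,000
    # 4,096,000 records / 150 IOPS / 3600 s/h  ->  roughly 7.6 hours
    echo "scale=2; (500 * 2^30 / (128 * 2^10)) / 150 / 3600" | bc    # prints 7.58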

> You can also set copies=2/3 just in case some errors occur, so ZFS can auto-repair the data. If you run ZFS over several LUNs this will make even more sense.
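
(As a side note, and with a hypothetical dataset name, the copies property is set per dataset, for example:

    # Keep two copies of every block in this dataset, on top of any
    # pool-level redundancy, so ZFS can self-heal single bad blocks:
    zfs set copies=2 tank/data

It only applies to data written after the property is set.)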

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company





