Date: Sat, 18 May 2013 09:02:15 -0400
From: Paul Kraus <paul@kraus-haus.org>
To: Ivailo Tanusheff <Ivailo.Tanusheff@skrill.com>
Cc: Liste FreeBSD <freebsd-questions@freebsd.org>
Subject: Re: ZFS install on a partition
Message-ID: <A9599DD7-1A32-4607-BC83-2E6E4D03C560@kraus-haus.org>
In-Reply-To: <372082cab2064846809615a8073e022c@DB3PR07MB059.eurprd07.prod.outlook.com>
References: <F744BBF1-D98C-47BF-9546-14D1A9CB3733@todoo.biz> <372082cab2064846809615a8073e022c@DB3PR07MB059.eurprd07.prod.outlook.com>
On May 18, 2013, at 3:21 AM, Ivailo Tanusheff <Ivailo.Tanusheff@skrill.com> wrote:

> If you use HBA/JBOD then you will rely on the software RAID of the ZFS system. Yes, this RAID is good, but unless you use SSD disks to boost performance and a lot of RAM the hardware raid should be more reliable and much faster.

Why would the hardware raid be more reliable? Hardware raid is susceptible to uncorrectable errors from the physical drives (hardware raid controllers rely on the drives to report bad reads and writes), and the uncorrectable error rate for modern drives is such that with high capacity drives (1TB and over) you are almost certain to run into a couple over the operational life of the drive. The rate is 10^-14 for cheap drives and 10^-15 for better drives; very occasionally I see a drive rated for 10^-16. Run the math and see how many TB worth of data you have to write and read; a rough worked example is at the end of this message. (Remember, these failures are generally read failures with NO indication that a failure occurred; bad data is just returned to the system.)

In terms of performance, HW raid is faster, generally due to the cache RAM built into the HW raid controller. ZFS makes good use of system RAM for the same function. An SSD can help with performance if the majority of writes are sync (NFS is a good example of this) or if you can benefit from a much larger read cache. SSDs are deployed with ZFS either as write LOG devices (in which case they should be mirrored), which only come into play for SYNC writes, or as an extension of the ARC, the L2ARC, which does not have to be mirrored as it is only a cache of existing data for speeding up reads.

> I didn't get if you want to use the system to dual boot Linux/FreeBSD or just to share FreeBSD space with linux.
> But I would advise you to go with option 1 - you will get most of the system and obviously you don't need zpool with raid, as your LSI controller will do all the redundancy for you. Making software RAID over the hardware one will only decrease performance and will NOT increase the reliability, as you will not be sure which information is stored on which physical disk.
>
> If stability is a MUST, then I will also advise you to go with bunch of pools and a disk designated as hot spare - in case some disk dies you will rely on the automation recovery. Also you should run monitoring tool on your raid controller.

I think you misunderstand the difference between stability and reliability. Any ZFS configuration I have tried on FreeBSD is STABLE; having redundant vdevs (mirrors or RAIDz<n>) along with hot spares can increase RELIABILITY. The only advantage to having a hot spare is that when a drive fails (and they all fail eventually), the REPLACE operation can start immediately without you noticing and manually replacing the failed drive.

Reliability is a combination of MTBF (mean time between failures) and MTTR (mean time to repair). Having a hot spare reduces the MTTR. The other way to improve MTTR is to go with smaller drives to reduce the time it takes the system to resilver a failed drive. This is NOT applicable in the OP's situation. I try very hard not to use drives larger than 1TB because resilver times can be days. Resilver time also depends on the total size of the data in a zpool, as a resilver operation walks the FS in time order, replaying all the writes and confirming that all the data on disk is good (it does not actually rewrite the data unless it finds bad data).

This means a couple of things, the first of which is that the resilver time will depend on the amount of data you have written, not the capacity. A zpool with a capacity of multiple TB will resilver in seconds if there is only a few hundred MB written to it. Since the resilver operation is not just a block by block copy, but a replay, it is I/Ops limited, not bandwidth limited. You might be able to stream sequential data from a drive at hundreds of MB/sec., but most SATA drives will not sustain more than one to two hundred RANDOM I/Ops (sequentially they can do much more). A back-of-the-envelope estimate along these lines is sketched below.
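As a rough illustration of why a resilver is I/Ops bound rather than bandwidth bound, here is a back-of-the-envelope sketch (my own illustration, not anything ZFS itself computes; the 128 KB average block size and 150 random I/Ops are assumptions you would replace with numbers measured on your own pool and drives):

    #!/usr/bin/env python
    # Rough resilver-time estimate: a resilver replays the pool's block tree,
    # so the work scales with the number of blocks written, not with raw capacity.

    def resilver_hours(data_written_tb, avg_block_kb=128, random_iops=150):
        """Estimate resilver time for a drive sustaining `random_iops` random
        operations per second, given `data_written_tb` of live data and an
        assumed average block size of `avg_block_kb`."""
        blocks = (data_written_tb * 1024**3) / avg_block_kb   # TB -> KB -> block count
        seconds = blocks / random_iops
        return seconds / 3600

    # 1 TB of live data in 128 KB blocks at 150 random I/Ops comes to roughly 15+ hours.
    # The same drive streaming sequentially at 150 MB/sec would copy 1 TB in about 2 hours,
    # which is why resilvering a fragmented pool takes so much longer than a raw copy.
    print(f"{resilver_hours(1.0):.1f} hours")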
> You can also set copies=2/3 just in case some errors occur, so ZFS can auto-repair the data. If you run ZFS over several LUNs this will make even more sense.
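To actually run the math on the error rates mentioned above: the following is only a rough sketch (it assumes the vendor unrecoverable-error spec of 10^-14 / 10^-15 / 10^-16 is a per-bit-read probability, which is how the datasheets state it), but it shows why silent read errors are a near certainty on busy multi-TB pools, and why ZFS checksums plus copies=2 or redundant vdevs are worth having:

    #!/usr/bin/env python
    # Expected unrecoverable read errors (UREs) for a given amount of data read,
    # assuming the vendor spec is a per-bit probability (1e-14 for consumer
    # drives, 1e-15 or 1e-16 for better ones).

    def expected_ures(tb_read, ure_rate=1e-14):
        """Expected number of unrecoverable read errors after reading tb_read TB."""
        bits_read = tb_read * 1e12 * 8      # decimal TB -> bytes -> bits
        return bits_read * ure_rate

    for rate in (1e-14, 1e-15, 1e-16):
        # Reading 100 TB over the life of a drive (e.g. a 1 TB drive read ~100 times over):
        print(f"URE rate {rate:.0e}: expect {expected_ures(100, rate):.2f} bad reads per 100 TB read")

Without ZFS-level redundancy those errors come back as silently corrupted data; with checksums plus copies=2 or a redundant vdev they are detected and repaired on read or during a scrub.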
--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company