From owner-freebsd-questions@FreeBSD.ORG Sat May 18 14:00:25 2013
From: Ivailo Tanusheff <Ivailo.Tanusheff@skrill.com>
To: Paul Kraus
Cc: Liste FreeBSD <freebsd-questions@freebsd.org>
Subject: RE: ZFS install on a partition
Date: Sat, 18 May 2013 13:29:58 +0000

Software RAID depends not only on the disks but also on changes to the OS, which occur more frequently than updates to a RAID controller's firmware. That makes hardware RAID more stable and reliable.

Also, the resources of a hardware RAID controller are used exclusively for RAID, which is not true of software RAID.

So I do not see the basis for your claim that software RAID is as good as or better than hardware RAID.

About the second part: I mean both stability and reliability. Having a spare disk reduces the risk, as the recovery operation starts as soon as a disk fails. It may sound paranoid, but the chance of a failed disk going undetected for 8, 12 or even 24 hours is considerable.

I am not sure about your calculations; I hope you trust them, but at my previous company we had a 3-4 month period when a disk failed almost every day on 2-year-old servers, so trust me - I do NOT trust those calculations, as I have seen the opposite. Maybe it was a bad batch of disks shipped into the country, but no one is insured against that. Yes, you can use several hot spares with software RAID, but:

1. You still depend on problems related to the OS.
2. If you read what the original poster wrote, you will see that is not possible for him.

I agree with the point about recovering big chunks of data; that is why I suggested he use several smaller LUNs for the zpool.

Best regards,
Ivailo Tanusheff

-----Original Message-----
From: owner-freebsd-questions@freebsd.org [mailto:owner-freebsd-questions@freebsd.org] On Behalf Of Paul Kraus
Sent: Saturday, May 18, 2013 4:02 PM
To: Ivailo Tanusheff
Cc: Liste FreeBSD
Subject: Re: ZFS install on a partition

On May 18, 2013, at 3:21 AM, Ivailo Tanusheff wrote:

> If you use HBA/JBOD then you will rely on the software RAID of the ZFS system. Yes, this RAID is good, but unless you use SSD disks to boost performance and a lot of RAM, the hardware RAID should be more reliable and much faster.

Why would the hardware RAID be more reliable? Hardware RAID is susceptible to uncorrectable errors from the physical drives (hardware RAID controllers rely on the drives to report bad reads and writes), and the uncorrectable error rate for modern drives is such that with high-capacity drives (1 TB and over) you are almost certain to run into a couple over the operational life of the drive: about 10^-14 errors per bit read for cheap drives and 10^-15 for better drives; very occasionally I see a drive rated for 10^-16. Run the math and see how many TB worth of data you have to write and read (at 10^-14 that works out to roughly one bad bit per 12.5 TB read), and remember these failures are generally read failures with NO indication that a failure occurred; bad data is just returned to the system.

In terms of performance, HW RAID is faster, generally due to the cache RAM built into the HW RAID controller. ZFS makes good use of system RAM for the same function. An SSD can help with performance if the majority of writes are sync (NFS is a good example of this) or if you can benefit from a much larger read cache. SSDs are deployed with ZFS either as write LOG devices, in which case they should be mirrored (they only come into play for SYNC writes), or as an extension of the ARC, the L2ARC, which does not have to be mirrored as it is only a cache of existing data for speeding up reads.
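A minimal sketch of both SSD roles, assuming a pool named "tank" and hypothetical device names:

    # mirrored log (SLOG) devices; used only for sync writes, so mirror them
    zpool add tank log mirror da4 da5
    # L2ARC read cache; only a cache of existing data, so no mirror needed
    zpool add tank cache da6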
> I didn't get if you want to use the system to dual-boot Linux/FreeBSD or just to share FreeBSD space with Linux.
> But I would advise you to go with option 1 - you will get the most out of the system, and obviously you don't need a zpool with RAID, as your LSI controller will do all the redundancy for you. Making a software RAID over the hardware one will only decrease performance and will NOT increase reliability, as you will not be sure which information is stored on which physical disk.
>
> If stability is a MUST, then I would also advise you to go with a bunch of pools and a disk designated as hot spare - in case some disk dies you can rely on the automatic recovery. Also, you should run a monitoring tool on your RAID controller.

I think you misunderstand the difference between stability and reliability. Any ZFS configuration I have tried on FreeBSD is STABLE; having redundant vdevs (mirrors or RAIDZ) along with hot spares can increase RELIABILITY. The only advantage to having a hot spare is that when a drive fails (and they all fail eventually), the REPLACE operation can start immediately, without you noticing and manually replacing the failed drive.

Reliability is a combination of reduction in MTBF (mean time between failures) and MTTR (mean time to repair). Having a hot spare reduces the MTTR. The other way to improve MTTR is to go with smaller drives to reduce the time it takes the system to resilver a failed drive. This is NOT applicable in the OP's situation. I try very hard not to use drives larger than 1 TB because resilver times can be days. Resilver time also depends on the total size of the data in a zpool, as a resilver operation walks the FS in time order, replaying all the writes and confirming that all the data on disk is good (it does not actually rewrite the data unless it finds bad data). This means a couple of things, the first of which is that the resilver time will depend on the amount of data you have written, not on the capacity: a zpool with a capacity of multiple TB will resilver in seconds if only a few hundred MB have been written to it. Since the resilver operation is not just a block-by-block copy but a replay, it is I/Ops limited, not bandwidth limited. You might be able to stream sequential data from a drive at hundreds of MB/sec, but most SATA drives will not sustain more than one to two hundred RANDOM I/Ops (sequentially they can do much more).

> You can also set copies=2/3 just in case some errors occur, so ZFS can auto-repair the data. If you run ZFS over several LUNs this will make even more sense.

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company
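The copies= setting quoted above is a per-dataset ZFS property; a minimal sketch, with hypothetical pool and dataset names:

    # keep two copies of every block so ZFS can self-heal
    # corrupted data even without redundant vdevs
    zfs set copies=2 tank/data
    zfs get copies tank/data

Note that copies= only affects data written after the property is set, and each extra copy consumes the corresponding capacity.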