Subject: Re: ZFS install on a partition
From: Paul Kraus <paul@kraus-haus.org>
Date: Sat, 18 May 2013 09:02:15 -0400
To: Ivailo Tanusheff
Cc: Liste FreeBSD <freebsd-questions@freebsd.org>

On May 18, 2013, at 3:21 AM, Ivailo Tanusheff wrote:

> If you use HBA/JBOD then you will rely on the software RAID of the
> ZFS system. Yes, this RAID is good, but unless you use SSD disks to
> boost performance and a lot of RAM the hardware raid should be more
> reliable and much faster.

Why would the hardware RAID be more reliable? Hardware RAID is susceptible to uncorrectable errors from the physical drives (hardware RAID controllers rely on the drives to report bad reads and writes), and the uncorrectable error rate for modern drives is such that with high capacity drives (1TB and over) you are almost certain to run into a couple over the operational life of the drive. The rating is 10^-14 for cheap drives and 10^-15 for better drives; very occasionally I see a drive rated for 10^-16. Run the math and see how many TB worth of data you have to write and read (and remember, these failures are generally read failures with NO indication that a failure occurred; bad data is just returned to the system). ZFS checksums every block end to end, so it catches exactly these silent errors and, given a redundant vdev, repairs them; a hardware RAID controller just passes the bad data along.
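For anyone who does not want to reach for a calculator, a back of the envelope version of that math:

    1 uncorrectable error per 10^14 bits read
    10^14 bits / 8 bits per byte = 1.25 x 10^13 bytes =~ 12.5 TB

So at the cheap-drive rating you expect one bad (and unreported) read for roughly every 12.5 TB you pull off the platters. Fill a 2 TB drive and scrub it a handful of times and you are already there; even at 10^-15 (about 125 TB) a busy server gets there well within the drive's service life.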
In terms of performance HW RAID is faster, generally due to the cache RAM built into the HW RAID controller. ZFS makes good use of system RAM for the same function. An SSD can help with performance if the majority of writes are sync (NFS is a good example of this) or if you can benefit from a much larger read cache. SSDs are deployed with ZFS either as write LOG devices (in which case they should be mirrored), which only come into play for SYNC writes, or as an extension of the ARC, the L2ARC, which does not have to be mirrored as it is only a cache of existing data for speeding up reads.

> I didn't get if you want to use the system to dual boot Linux/FreeBSD
> or just to share FreeBSD space with linux.
> But I would advise you to go with option 1 - you will get the most
> out of the system and obviously you don't need zpool with raid, as
> your LSI controller will do all the redundancy for you. Making
> software RAID over the hardware one will only decrease performance
> and will NOT increase the reliability, as you will not be sure which
> information is stored on which physical disk.
>
> If stability is a MUST, then I will also advise you to go with a
> bunch of pools and a disk designated as hot spare - in case some disk
> dies you will rely on the automated recovery. Also you should run a
> monitoring tool on your raid controller.

I think you misunderstand the difference between stability and reliability. Any ZFS configuration I have tried on FreeBSD is STABLE; having redundant vdevs (mirrors or RAIDz) along with hot spares can increase RELIABILITY. The only advantage to having a hot spare is that when a drive fails (and they all fail eventually), the REPLACE operation can start immediately, without you having to notice and manually replace the failed drive. (There is a sketch of such a layout at the end of this message.)

Reliability is a combination of MTBF (mean time between failures) and MTTR (mean time to repair); you improve it by raising the former and lowering the latter. Having a hot spare reduces the MTTR. The other way to improve MTTR is to go with smaller drives, to reduce the time it takes the system to resilver a failed drive; this is NOT applicable in the OP's situation. I try very hard not to use drives larger than 1TB because resilver times can be days. Resilver time also depends on the total size of the data in a zpool, as a resilver operation walks the FS in time order, replaying all the writes and confirming that all the data on disk is good (it does not actually rewrite the data unless it finds bad data). This means a couple of things, the first of which is that the resilver time will depend on the amount of data you have written, not the capacity. A zpool with a capacity of multiple TB will resilver in seconds if only a few hundred MB have been written to it. And since the resilver operation is not just a block by block copy, but a replay, it is I/Ops limited, not bandwidth limited. You might be able to stream sequential data from a drive at hundreds of MB/sec., but most SATA drives will not sustain more than one to two hundred RANDOM I/Ops (sequentially they can do much more).

> You can also set copies=2/3 just in case some errors occur, so ZFS
> can auto-repair the data. If you run ZFS over several LUNs this will
> make even more sense.

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company
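P.S. A rough sketch of the kind of layout I am describing above, with a hot spare, a mirrored SSD log, an L2ARC, and Ivailo's copies=2 suggestion. The device names (da0-da4, ada0p1/p2, ada1p1/p2) are made up for the example; substitute your own:

    # two mirrored pairs plus a hot spare
    zpool create tank mirror da0 da1 mirror da2 da3 spare da4
    # mirrored SSD partitions as the LOG device (sync writes only)
    zpool add tank log mirror ada0p1 ada1p1
    # SSD partitions as L2ARC; a cache device needs no redundancy
    zpool add tank cache ada0p2 ada1p2
    # per Ivailo's note, keep two copies of every block in one dataset
    zfs create -o copies=2 tank/important
    # when a drive dies, the spare takes over via a REPLACE, e.g.
    zpool replace tank da1 da4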