Date: Thu, 7 Aug 2014 10:35:36 +0200 (CEST)
From: Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
To: Scott Bennett <bennett@sdf.org>
Cc: freebsd@qeng-ho.org, freebsd-questions@freebsd.org
Subject: Re: gvinum raid5 vs. ZFS raidz
Message-ID: <alpine.BSF.2.11.1408071034510.64214@mail.fig.ol.no>
In-Reply-To: <201408070831.s778VhJc015365@sdf.org>
References: <201408020621.s726LsiA024208@sdf.org> <alpine.BSF.2.11.1408020356250.1128@wonkity.com> <53DCDBE8.8060704@qeng-ho.org> <201408060556.s765uKJA026937@sdf.org> <53E1FF5F.1050500@qeng-ho.org> <201408070831.s778VhJc015365@sdf.org>
On Thu, 7 Aug 2014 03:31-0500, Scott Bennett wrote:

> Arthur Chance <freebsd@qeng-ho.org> wrote:
>
> > On 06/08/2014 06:56, Scott Bennett wrote:
> > > Arthur Chance <freebsd@qeng-ho.org> wrote:
> > >>
> > >> [stuff deleted --SB]
> > > I wonder if what varies is the amount of space taken up by the
> > > checksums. If there's a checksum for each block, then the block size
> > > would change the fraction of the space lost to checksums, and the parity
> > > for the checksums would thus also change. Enough to matter? Maybe.
> >
> > I'm not a file system guru, but my (high level) understanding is as
> > follows. Corrections from anyone more knowledgeable welcome.
> >
> > 1. UFS and ZFS both use tree structures to represent files, with the
> > data stored at the leaves and bookkeeping stored in the higher nodes.
> > Therefore the overhead scales as the log of the data size, which is a
> > negligible fraction for any sufficiently large amount of data.
> >
> > 2. UFS doesn't have data checksums, it relies purely on the hardware
> > checksums. (This is the area I'm least certain of.)
>
> What hardware checksums are there? I wasn't aware that this sort of
> hardware kept any.

To quote http://en.wikipedia.org/wiki/Disk_sector:

  In disk drives, each physical sector is made up of three basic parts,
  the sector header, the data area and the error-correcting code (ECC).

> > 3. ZFS keeps its checksums in a Merkle tree
> > (http://en.wikipedia.org/wiki/Merkle_tree) so the checksums are held in
> > the bookkeeping blocks, not in the data blocks. This simply changes the
> > constant multiplier in front of the logarithm for the overhead. Also, I
> > believe ZFS doesn't use fixed size data blocks, but aggregates writes
> > into blocks of up to 128K.
> >
> > Personally, I don't worry about the overheads of checksumming as the
> > cost of the parity stripe(s) in raidz is dominant. It's a cost well
> > worth paying though - I have a 3 disk raidz1 pool and a disk went bad
> > within 3 months of building it (the manufacturer turned out to be having
> > a few problems at the time) but I didn't lose a byte.
> >
> Good testimonial. I'm not worried about the checksum space either.
> I figure the benefits make it cheap at the price. Of more concern to me
> now is how I'm going to come up with at least two more 2 TB drives to set
> up a raidz2 with a tolerably small fraction of the total space being tied
> up in combined ZFS overhead (i.e., bookkeeping, parity, checksums, etc.)
>
>
> Scott Bennett, Comm. ASMELG, CFIAG
> **********************************************************************
> * Internet: bennett at sdf.org *xor* bennett at freeshell.org *
> *--------------------------------------------------------------------*
> * "A well regulated and disciplined militia, is at all times a good *
> * objection to the introduction of that bane of all free governments *
> * -- a standing army." *
> * -- Gov. John Hancock, New York Journal, 28 January 1790 *
> **********************************************************************

-- 
+-------------------------------+------------------------------------+
| Vennlig hilsen,               | Best regards,                      |
| Trond Endrestøl,              | Trond Endrestøl,                   |
| IT-ansvarlig,                 | System administrator,              |
| Fagskolen Innlandet,          | Gjøvik Technical College, Norway,  |
| tlf. mob. 952 62 567,         | Cellular...: +47 952 62 567,       |
| sentralbord 61 14 54 00.      | Switchboard: +47 61 14 54 00.      |
+-------------------------------+------------------------------------+

Date: Thu, 07 Aug 2014 04:36:46 -0500
From: Scott Bennett <bennett@sdf.org>
To: Trond.Endrestol@fagskolen.gjovik.no
Cc: freebsd@qeng-ho.org, freebsd-questions@freebsd.org
Subject: Re: gvinum raid5 vs. ZFS raidz
Message-Id: <201408070936.s779akMv017524@sdf.org>
In-Reply-To: <alpine.BSF.2.11.1408071034510.64214@mail.fig.ol.no>
References: <201408020621.s726LsiA024208@sdf.org> <alpine.BSF.2.11.1408020356250.1128@wonkity.com> <53DCDBE8.8060704@qeng-ho.org> <201408060556.s765uKJA026937@sdf.org> <53E1FF5F.1050500@qeng-ho.org> <201408070831.s778VhJc015365@sdf.org> <alpine.BSF.2.11.1408071034510.64214@mail.fig.ol.no>

Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no> wrote:

> On Thu, 7 Aug 2014 03:31-0500, Scott Bennett wrote:
> > Arthur Chance <freebsd@qeng-ho.org> wrote:
> > > On 06/08/2014 06:56, Scott Bennett wrote:
> > > > Arthur Chance <freebsd@qeng-ho.org> wrote:
> > > >>
> > > >> [stuff deleted --SB]
> > > > I wonder if what varies is the amount of space taken up by the
> > > > checksums. If there's a checksum for each block, then the block size
> > > > would change the fraction of the space lost to checksums, and the parity
> > > > for the checksums would thus also change. Enough to matter? Maybe.
> > >
> > > I'm not a file system guru, but my (high level) understanding is as
> > > follows. Corrections from anyone more knowledgeable welcome.
> > >
> > > 1. UFS and ZFS both use tree structures to represent files, with the
> > > data stored at the leaves and bookkeeping stored in the higher nodes.
> > > Therefore the overhead scales as the log of the data size, which is a
> > > negligible fraction for any sufficiently large amount of data.
> > >
> > > 2. UFS doesn't have data checksums, it relies purely on the hardware
> > > checksums. (This is the area I'm least certain of.)
> >
> > What hardware checksums are there? I wasn't aware that this sort of
> > hardware kept any.
>
> To quote http://en.wikipedia.org/wiki/Disk_sector:
>
>   In disk drives, each physical sector is made up of three basic parts,
>   the sector header, the data area and the error-correcting code (ECC).

That's interesting, and I know it was true in the days of minicomputers.
However, it appears to be out of date, based upon 1) the observed fact that
corrupted data *do* get recorded onto today's PC-style disk drives with no
indication that an error has occurred, and 2) the absence of parity bits in
the processor chips, memory cards, motherboards, and PATA/SATA/SCSI/etc.
controllers, as well as in the disk drives themselves. The technical support
guy I spoke with about it at Seagate/Samsung recently confirmed the latter:
he said that there is *no parity-checking* of data written to or read from
the disks and that some silent errors are now considered "normal" on disks
whose capacities exceed 1 TB.
>
> [remainder deleted --SB]

Scott Bennett, Comm. ASMELG, CFIAG
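Arthur's point 3, quoted earlier in the thread, says ZFS keeps its checksums in a Merkle tree, so each checksum is held by the bookkeeping block that points to the data rather than in the data block itself. The following Python sketch illustrates only that general idea; it is not ZFS's on-disk format. The 4-byte block size, the choice of SHA-256 (ZFS offers several checksum algorithms), and the helper names `split_blocks`, `build_tree`, `verify_block` and `root_checksum` are all hypothetical, picked for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny so the example stays readable


def checksum(data: bytes) -> bytes:
    # SHA-256 is an arbitrary stand-in for whatever checksum a pool is configured to use.
    return hashlib.sha256(data).digest()


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_tree(blocks: list[bytes]) -> list[list[bytes]]:
    """Return the checksum levels of a binary Merkle tree.

    levels[0] holds one checksum per data block (what a parent "bookkeeping"
    node would store); each higher level holds checksums of the concatenated
    child checksums; levels[-1] holds the single root checksum.
    """
    if not blocks:
        raise ValueError("need at least one block")
    level = [checksum(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Pair up neighbours; an odd node at the end is hashed on its own.
        level = [checksum(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_block(blocks: list[bytes], index: int, levels: list[list[bytes]]) -> bool:
    # A read is trusted only if the data matches the checksum its parent recorded.
    return checksum(blocks[index]) == levels[0][index]


def root_checksum(levels: list[list[bytes]]) -> bytes:
    return levels[-1][0]


if __name__ == "__main__":
    good = split_blocks(b"hello, merkle tree!")
    tree = build_tree(good)

    bad = list(good)
    bad[2] = b"XXXX"  # simulate silent corruption of one block on disk

    print(verify_block(good, 2, tree))  # True
    print(verify_block(bad, 2, tree))   # False
    print(root_checksum(build_tree(bad)) == root_checksum(tree))  # False
```

Because the checksums sit in the parents and the parents are themselves covered by the level above, a corrupted data block is caught on read even when the drive reports no error, which is the kind of silent corruption described in the last message.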
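Scott's concern about how much of a raidz2 built from 2 TB drives would be tied up in parity can be estimated with simple arithmetic. This Python sketch assumes a single raidz vdev in which every stripe spans all disks, so `parity` out of every `disks` sectors hold parity. It deliberately ignores metadata, checksums, padding of small blocks and reserved space, all of which make real usable capacity lower. The function name is hypothetical.

```python
def raidz_usable_fraction(disks: int, parity: int) -> float:
    """Rough fraction of raw capacity left for data in one raidz vdev.

    Assumes full-width stripes: `parity` out of every `disks` sectors hold
    parity. Metadata, checksums, padding and reserved space are ignored.
    """
    if parity not in (1, 2, 3):  # raidz1, raidz2, raidz3
        raise ValueError("parity must be 1, 2 or 3")
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks - parity) / disks


if __name__ == "__main__":
    DRIVE_TB = 2  # drive size mentioned in the thread
    for disks in range(3, 9):
        for parity in (1, 2):
            frac = raidz_usable_fraction(disks, parity)
            print(f"raidz{parity}, {disks} x {DRIVE_TB} TB: "
                  f"~{frac * disks * DRIVE_TB:.1f} TB for data ({frac:.0%})")
```

Under this simplification a 3-disk raidz1 and a 6-disk raidz2 both leave about two thirds of the raw space for data, while a 4-disk raidz2 leaves only half, which is why adding drives makes the raidz2 parity cost more tolerable.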