From: Dan Naumov <dan.naumov@gmail.com>
Date: Sun, 28 Jun 2009 14:02:03 +0300
To: Andrew Snow
Cc: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID

"Now we come to the crucial decision ZFS has made for raidz and raidz2: in raidz and raidz2, the data block is striped across all of the disks. Instead of a model where a parity stripe is a bunch of data blocks, each with an independent checksum, ZFS stripes a single data block (and its parity), with a single checksum, across all the disks (or as many of them as necessary). This is a rational implementation decision, but when combined with the need to verify checksums, it has an important consequence: in ZFS, reads always involve all disks, because ZFS always must verify the data block's checksum, which requires reading all of the data block, which is spread across all of the drives. This is unlike normal RAID-5 or RAID-6, in which a small enough read will only touch one drive, and means that adding more disks to a ZFS raidz pool does not increase how many random reads you can do per second.

(A normal RAID-5 or RAID-6 array has a (theoretical) random read IO capacity equal to the sum of the random IO operations rate of each of the disks in the array, and so adding another disk adds its IOPs per second to your read capacity. A ZFS raidz or raidz2 pool instead has a capacity equal to the slowest disk's IOPs per second, and adding another disk does nothing to help.
Effectively a raidz ZFS gives you a single disk's read IOPs per second rate.)"

This was from the blog of a Sun engineer (though the post is a few years old). Unfortunately I don't have the link; I actually had to go through my posting history on the Ars Technica forum just to find this quote. If the situation has changed and the above quote no longer holds true, it would be nice if someone more knowledgeable about the performance implications could elaborate on what kind of performance is to be expected from a raidz system :)

- Sincerely,
Dan Naumov

On Sun, Jun 28, 2009 at 1:36 PM, Andrew Snow wrote:
>> What's confusing is that your results are actually out of place with
>> how ZFS numbers are supposed to look, not mine :) When using ZFS
>> RAIDZ, due to the way parity checking works in ZFS, your pool is
>> SUPPOSED to have throughput of the average single disk from that pool
>> and not some numbers growing skyhigh in a linear fashion.
>
> Could you please elaborate on this and explain it?
>
> - Andrew
>
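PS: A quick back-of-the-envelope sketch of the read-IOPS model the quote describes, assuming idealized, identical disks. The disk counts and the 100-IOPS-per-disk figure below are made-up numbers, purely for illustration:

# Illustrative only: theoretical random-read IOPS under the model quoted above.

def raid5_read_iops(disks, iops_per_disk):
    # Classic RAID-5/6: a small read touches a single drive, so random-read
    # capacity scales roughly with the number of spindles.
    return disks * iops_per_disk

def raidz_read_iops(disks, iops_per_disk):
    # raidz/raidz2 per the quote: every read touches every disk to verify the
    # block checksum, so the vdev reads like a single disk regardless of width.
    return iops_per_disk

for n in (3, 6, 12):
    print(f"{n} disks @ 100 IOPS each: "
          f"RAID-5 ~{raid5_read_iops(n, 100)} read IOPS, "
          f"raidz ~{raidz_read_iops(n, 100)} read IOPS")

If that model still holds, it would explain why raidz random-read numbers stay flat as you add disks, while throughput for large sequential reads can still scale.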