From: Dan Naumov <dan.naumov@gmail.com>
Date: Sun, 28 Jun 2009 14:02:03 +0300
To: Andrew Snow
Cc: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID

"Now we come to the crucial decision ZFS has made for raidz and raidz2: in raidz and raidz2, the data block is striped across all of the disks. Instead of a model where a parity stripe is a bunch of data blocks, each with an independent checksum, ZFS stripes a single data block (and its parity), with a single checksum, across all the disks (or as many of them as necessary). This is a rational implementation decision, but when combined with the need to verify checksums, it has an important consequence: in ZFS, reads always involve all disks, because ZFS always must verify the data block's checksum, which requires reading all of the data block, which is spread across all of the drives. This is unlike normal RAID-5 or RAID-6, in which a small enough read will only touch one drive, and means that adding more disks to a ZFS raidz pool does not increase how many random reads you can do per second.

(A normal RAID-5 or RAID-6 array has a (theoretical) random read IO capacity equal to the sum of the random IO operations rate of each of the disks in the array, and so adding another disk adds its IOPs per second to your read capacity. A ZFS raidz or raidz2 pool instead has a capacity equal to the slowest disk's IOPs per second, and adding another disk does nothing to help.
Effectively a raidz ZFS gives you a single disk's read IOPs per second rate.)"

This was from the blog of a Sun engineer (though the post is a few years old). Unfortunately I don't have the link; I actually had to go through my posting history on the Ars Technica forum just to find this quote. If the situation has changed and the above quote no longer holds true, it would be nice if someone more knowledgeable about the performance implications could elaborate on what kind of performance is to be expected from a raidz system :)

- Sincerely,
Dan Naumov

On Sun, Jun 28, 2009 at 1:36 PM, Andrew Snow wrote:
>> What's confusing is that your results are actually out of place with
>> how ZFS numbers are supposed to look, not mine :) When using ZFS
>> RAIDZ, due to the way parity checking works in ZFS, your pool is
>> SUPPOSED to have throughput of the average single disk from that pool
>> and not some numbers growing skyhigh in a linear fashion.
>
> Could you please elaborate on this and explain it?
>
> - Andrew
>
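PS: A quick back-of-the-envelope sketch of the read-IOPS model the quote describes, assuming idealized, identical disks. The disk counts and the 100-IOPS-per-disk figure below are made-up numbers, purely for illustration:

# Illustrative only: theoretical random-read IOPS under the model quoted above.

def raid5_read_iops(disks, iops_per_disk):
    # Classic RAID-5/6: a small read touches a single drive, so random-read
    # capacity scales roughly with the number of spindles.
    return disks * iops_per_disk

def raidz_read_iops(disks, iops_per_disk):
    # raidz/raidz2 per the quote: every read touches every disk to verify the
    # block checksum, so the vdev reads like a single disk regardless of width.
    return iops_per_disk

for n in (3, 6, 12):
    print(f"{n} disks @ 100 IOPS each: "
          f"RAID-5 ~{raid5_read_iops(n, 100)} read IOPS, "
          f"raidz ~{raidz_read_iops(n, 100)} read IOPS")

If that model still holds, it would explain why raidz random-read numbers stay flat as you add disks, while throughput for large sequential reads can still scale.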