From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 15 08:45:18 2010
Date: Wed, 15 Sep 2010 10:45:01 +0200
From: Bernd Walter <ticso@cicely7.cicely.de>
Reply-To: ticso@cicely.de
To: Chris Watson
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS I/O Throughput question..
Message-ID: <20100915084501.GF17282@cicely7.cicely.de>
In-Reply-To: <82EA2358-F5E5-4CEE-91AC-4211C04F22FD@gmail.com>
References: <82EA2358-F5E5-4CEE-91AC-4211C04F22FD@gmail.com>

On Wed, Sep 15, 2010 at 03:05:46AM -0500, Chris Watson wrote:
> I have been testing ZFS on a home box now for a few days and I have a
> question that is perplexing me. Everything I have read on ZFS says in
> almost every case mirroring is faster than raidz. So I initially set up
> a 2x2 RAID 10 striped mirror, like so:
>
> priyanka# zpool status
>   pool: tank
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           mirror    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>           mirror    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    ONLINE       0     0     0
>
> errors: No known data errors
> priyanka#
>
> With this configuration I am getting the following throughput for writes:
>
> priyanka# dd if=/dev/zero of=/tank/Aperture/test01 bs=1m count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes transferred in 98.533820 secs (106417878 bytes/sec)
> priyanka#
>
> And for reads:
>
> priyanka# dd if=/tank/Aperture/test01 of=/dev/null bs=1m
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes transferred in 50.309988 secs (208423027 bytes/sec)
> priyanka#
>
> So basically 100MB/s writes, 200MB/s reads.

Not surprising - only two disks in parallel are used to write the data.
The data could have been laid out over the stripe set so that twice the
number of disks were actually used, but that optimization for single
linear file access is bad for random performance, since every access
would then have to seek all drives.
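One way to see which drives actually take part in such a run is to watch
the pool from a second terminal while the dd is going, for example (the
pool name is the one from your output above):

  priyanka# zpool iostat -v tank 1

That prints the read/write bandwidth per vdev and per disk once a second,
so you can see directly whether two or all four drives are busy; gstat(8)
gives a similar per-device view at the GEOM layer.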
> I thought the disks I have would do a little better than that, given
> how much of the ZFS literature proclaims mirroring to be fastest, with
> more I/O and more ops/sec. Well, I decided to blow away the mirror and
> instead do a 4-disk raidz to see just how much faster mirroring was
> with ZFS vs raidz. This is where I was blown away and more than a
> little confused.
>
> priyanka# zpool status
>   pool: tank
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    ONLINE       0     0     0
>
> errors: No known data errors
> priyanka#
>
> Write performance:
>
> priyanka# dd if=/dev/zero of=/tank/test.001 bs=1m count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes transferred in 34.310930 secs (305609903 bytes/sec)
> priyanka#

You basically have three drives to write to - the parity data is
redundant, so it doesn't add to the bandwidth.

> Read performance:
>
> priyanka# dd if=/tank/test.001 of=/dev/null bs=1m count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes transferred in 31.463025 secs (333272467 bytes/sec)
> priyanka#

Now you have four drives to read from.  The problem, however, is that
every access has to seek all four drives, so you get the same penalty
for random access as if your mirror had spread the data over all disks.
The only difference is that with a single raidz you no longer have a
choice.

> Say whaaaaaat?! Perhaps I am completely misunderstanding every ZFS
> admin guide, FAQ and paper on ZFS. But everything I have read says
> mirroring should be much faster than a raidz and should almost always
> be preferred, which clearly from the above is not the case. The only
> thing I can think of is that the dd "benchmark" is not accurate because
> it is writing data sequentially? Which is where raidz has an edge over
> mirroring, again from what I have read. But the above is not so much an
> 'edge' in performance as a complete rout. So my question is: is
> everything I've read about ZFS and mirroring vs raidz wrong? Is the
> benchmark horribly flawed? Is raidz actually faster than mirroring?
> Does FreeBSD perform some kind of voodoo h0h0magic that makes raidz
> perform much better relative to mirroring than on other platforms? Or
> am I just having a really weird dream and none of this is real.

That's exactly the point - your dd benchmark only tests one very
specific case, which in fact might match your application, but in
almost every real use case you access multiple files at the same time,
and then it is good to be able to seek the drives independently.  Just
repeat the same test with two files written/read at the same time and
you should easily see a major difference.
You should also note that linear reads faster than a single drive only
work because of very aggressive prefetching.  The faster your drives
are and the more drives you have, the more aggressive the prefetching
must be to still get a win - in the 4-disk raidz read case you already
seem to have hit some kind of limit.

-- 
B.Walter                   http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and much more.
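A minimal sketch of the two-streams-at-once test suggested above, reusing
the pool name and block size from this thread (test.002 is just an
illustrative second file name).

Two writers in parallel:

  priyanka# dd if=/dev/zero of=/tank/test.001 bs=1m count=10000 &
  priyanka# dd if=/dev/zero of=/tank/test.002 bs=1m count=10000 &
  priyanka# wait

Then two readers in parallel:

  priyanka# dd if=/tank/test.001 of=/dev/null bs=1m &
  priyanka# dd if=/tank/test.002 of=/dev/null bs=1m &
  priyanka# wait

Summing the per-stream throughput and comparing it with the single-stream
numbers for each layout should show the difference Bernd describes once
the drives can seek independently.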