From owner-freebsd-stable@FreeBSD.ORG Sat May 2 00:05:51 2009
Date: Fri, 1 May 2009 19:46:20 -0400
From: Adam McDougall
To: Mike Tancsa
Cc: freebsd-stable@freebsd.org
Subject: Re: current zfs tuning in RELENG_7 (AMD64) suggestions ?
Message-ID: <20090501234619.GZ574@egr.msu.edu>
In-Reply-To: <200905012041.n41Kf47B045440@lava.sentex.ca>
References: <200905012041.n41Kf47B045440@lava.sentex.ca>

On Fri, May 01, 2009 at 04:42:09PM -0400, Mike Tancsa wrote:
> I gave the AMD64 version of 7.2 RC2 a spin and all installed as expected
> off the dvd.  INTEL S3200SHV MB, Core2Duo, 4G of RAM.
>
> The writes are all within the normal variance of the tests except for b).
> Is there anything else that should be tuned?  Not that I am looking for
> any "magic bullets", but I just want to run this backup server as best as
> possible.
>
>              -------Sequential Output-------- ---Sequential Input-- --Random--
>              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
>           MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> a      5000  98772 54.7 153111 31.5 100015 21.1 178730 85.3 368782 32.5 161.6  0.6
> b      5000 101271 57.9 154765 31.5  61325 13.9 176741 84.6 372477 32.8 149.3  0.6
> c      5000 102331 57.1 159559 29.5 105767 17.4 144410 63.8 299317 19.9 167.9  0.6
> d      5000 107308 58.6 175004 32.4 117926 18.8 143657 63.4 305126 20.0 167.6  0.6

You might want to try running gstat -I 100000 during the test to see how fast
each drive flushes the cache from RAM and whether any disks are slower than
the others.  I've found that some cards or slots cause drives to perform
slower than other drives in the same system, dragging the performance of the
raid down to the slowest drive(s).  Testing the drives individually, outside
of the raid, might reveal something too, even if only to learn the maximum
sequential speed of one drive so you know that 4x(speed) is the best you can
hope for in the raid tests.
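For example, something along these lines would give you both the raw speed of
each member and a live view of the disks during a run (just a sketch; the
ad1-ad4 names are taken from the drive list further down in your mail, and the
count is an arbitrary ~4 GB per drive):

  # raw sequential read of each member, one drive at a time, while the pool is idle
  dd if=/dev/ad1 of=/dev/null bs=1m count=4000
  dd if=/dev/ad2 of=/dev/null bs=1m count=4000
  dd if=/dev/ad3 of=/dev/null bs=1m count=4000
  dd if=/dev/ad4 of=/dev/null bs=1m count=4000

  # in another terminal while the zfs test runs: per-disk throughput,
  # refreshed every 100000 microseconds (10 times a second)
  gstat -I 100000

If one of the four reads noticeably slower on its own, or one disk lags the
others in gstat during the raid test, that would explain a slow run like b).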
ZFS tends to cache heavily at the start of each write, and you will probably
see it bounce between no I/O and furious writes until the RAM cache fills up
and it has no choice but to write almost constantly.  This can affect the
results between runs, so I would recommend a larger count= that gives a test
run of at least 30-60 seconds.

Additionally, try other zfs raid types such as mirror and stripe to see
whether raidz is an unexpectedly large bottleneck; I've found its serial
write speed usually leaves something to be desired.  Even if the other raid
levels won't work realistically in the long run, it's useful to raise the bar
and find out what extra performance your IO setup can push.  It could also be
useful to compare with gstripe and graid3 for further hardware performance
evaluation.  On the other hand, if you can already read/write data faster
than your network connection can push, you're probably at a workable level.

Also, I believe zfs uses a record size of up to 128k (queueing multiple
writes if it can, depending on the disk subsystem), so I think the computer
has to do extra work if you give it bs=2048k, since zfs will have to cut that
into 16 pieces before spreading them across the drives.  You might try
bs=512k or bs=128k, for example, to see if that has a positive effect.
(Rough examples of the pool-type and bs= experiments are appended below the
quoted output.)  In a traditional raid5 setup, I've found I get by far the
best performance when my bs= matches the raid stripe size multiplied by the
number of drives, and this gets weird with an odd number of drives because
your optimum write size might be something like 768k, which probably no
application is going to produce :)  It also makes it hard to optimize UFS for
a larger stripe size when the cluster sizes are generally limited to 16k, as
in Solaris.

> Results tend to fluctuate a bit.
>
> offsitetmp# dd if=/dev/zero of=/tank1/test bs=2048k count=1000
> 1000+0 records in
> 1000+0 records out
> 2097152000 bytes transferred in 10.016818 secs (209363092 bytes/sec)
> offsitetmp#
> offsitetmp# dd if=/dev/zero of=/tank1/test bs=2048k count=1000
> 1000+0 records in
> 1000+0 records out
> 2097152000 bytes transferred in 10.733547 secs (195382943 bytes/sec)
> offsitetmp#
>
> Drives are raidz:
>
> ad1: 1430799MB at ata3-master SATA300
> ad2: 1430799MB at ata4-master SATA300
> ad3: 1430799MB at ata5-master SATA300
> ad4: 1430799MB at ata6-master SATA300 on ich9
>
>         ---Mike
>
> --------------------------------------------------------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike@sentex.net
> Providing Internet since 1994 www.sentex.net
> Cambridge, Ontario Canada www.sentex.net/mike
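Here is roughly what I mean by the pool-type and bs= experiments, purely as a
sketch: it destroys and recreates the pool, so only do this while those drives
hold nothing you need.  tank1 and ad1-ad4 are just the names from your mail,
and the counts are arbitrary values chosen to give a 30-60+ second run:

  # two 2-way mirrors instead of raidz
  zpool destroy tank1
  zpool create tank1 mirror ad1 ad2 mirror ad3 ad4
  dd if=/dev/zero of=/tank1/test bs=128k count=80000   # ~10 GB in 128k writes
  dd if=/dev/zero of=/tank1/test bs=512k count=20000   # same amount, larger bs

  # plain stripe across all four drives, as an upper bound for the hardware
  zpool destroy tank1
  zpool create tank1 ad1 ad2 ad3 ad4
  dd if=/dev/zero of=/tank1/test bs=128k count=80000

  # put the original layout back when done
  zpool destroy tank1
  zpool create tank1 raidz ad1 ad2 ad3 ad4

If the mirror or stripe numbers come out much higher than raidz at the same
bs=, that points at the raid level rather than the drives or the controller.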