Date: Tue, 08 Jun 2010 08:20:06 -0700
From: "Bradley W. Dutton" <brad@duttonbros.com>
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS performance of various vdevs (long post)

Quoting Jeremy Chadwick:

> On Mon, Jun 07, 2010 at 05:32:18PM -0700, Bradley W. Dutton wrote:
>> I know it's pretty simple but for checking throughput I thought it
>> would be ok. I don't have compression on and based on the drive
>> lights and gstat, the drives definitely aren't idle.
>
> Try disabling prefetch (you have it enabled) and try setting
> vfs.zfs.txg.timeout="5". Some people have reported a "sweet spot" with
> regards to the last parameter (needing to be adjusted if your disks are
> extremely fast, etc.), as otherwise ZFS would be extremely "bursty" in
> its I/O (stalling/deadlocking the system at set intervals). By
> decreasing the value you essentially do disk writes more regularly
> (with less data), and depending upon the load and controller, this may
> even out performance.

I tested some of these settings.
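For reference, the two knobs went into /boot/loader.conf roughly as below.
This is a sketch from memory rather than a copy of the file, and the
prefetch tunable name is the one I take Jeremy to mean, so double-check it
against your own system before relying on it:

# /boot/loader.conf -- settings for the "txg=5 no prefetch" runs
vfs.zfs.prefetch_disable="1"    # turn off ZFS file-level prefetch
vfs.zfs.txg.timeout="5"         # commit a transaction group every 5 seconds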
With the timeout set to 5, not much changed write-wise (keep in mind these
results are from the Nvidia/WD RE2 combo). With txg=5 and prefetch disabled
I saw read speeds go down considerably:

# normal/jbod txg=5 no prefetch
zpool create bench /dev/ad4 /dev/ad6 /dev/ad10 /dev/ad12 /dev/ad14
dd if=/bench/test.file of=/dev/null bs=1m
12582912000 bytes transferred in 59.330330 secs (212082286 bytes/sec)
compared to
12582912000 bytes transferred in 34.668165 secs (362952928 bytes/sec)

zpool create bench raidz /dev/ad4 /dev/ad6 /dev/ad10 /dev/ad12 /dev/ad14
dd if=/bench/test.file of=/dev/null bs=1m
12582912000 bytes transferred in 71.135696 secs (176886046 bytes/sec)
compared to
12582912000 bytes transferred in 45.825533 secs (274582993 bytes/sec)

Running the same tests on the raidz2 Supermicro/Hitachi setup didn't yield
any difference in writes, but the reads were slower:

zpool create tank raidz2 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5 /dev/da6 /dev/da7
dd if=/tank/test.file of=/dev/null bs=1m
12582912000 bytes transferred in 44.118409 secs (285207745 bytes/sec)
compared to
12582912000 bytes transferred in 32.911291 secs (382328118 bytes/sec)

I rebooted and reran these numbers just to make sure they were consistent.

>> >The higher CPU usage might be due to the device driver or the
>> >interface card being used.
>>
>> Definitely a plausible explanation. If this was the case would the 8
>> parallel dd processes exhibit the same behavior? or is the type of
>> IO affecting how much CPU the driver is using?
>
> It would be the latter.
>
> Also, I believe this Supermicro controller has been discussed in the
> past. I can't remember if people had outright failures/issues with it
> or if people were complaining about sub-par performance. I could also
> be remembering a different Supermicro controller.
>
> If I had to make a recommendation, it would be to reproduce the same
> setup on a system using an Intel ICH9/ICH9R or ICH10/ICH10R controller
> in AHCI mode (with ahci.ko loaded, not ataahci.ko) and see if things
> improve. But start with the loader.conf tunables I mentioned above --
> segregate each test.
>
> I would also recommend you re-run your tests with a different blocksize
> for dd. I don't know why people keep using 1m (Linux websites?). Test
> the following increments: 4k, 8k, 16k, 32k, 64k, 128k, 256k. That's
> about where you should stop.

I tested with bs=8k, 16k, 32k, 64k, 128k, and 1m, and the results all
looked similar. As such I stuck with bs=1m because it makes the dd count
easier to work out.

> Otherwise, consider installing ports/benchmarks/bonnie++ and try that.
> That will also get you concurrent I/O tests, I believe.

I may give this a shot, but I'm most interested in lower concurrency since
I have larger files with only a couple of readers/writers. As Bob noted, a
bunch of mirrors in the pool would definitely be faster for concurrent IO.

Thanks for the help,
Brad
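P.S. In case anyone wants to repeat the blocksize comparison, the sweep
amounted to a loop along these lines. It's a rough sketch rather than my
exact commands; the /bench/test.file path and the ~12000MB total are taken
from the runs above, and the count table just scales each blocksize to that
same total:

#!/bin/sh
# Rough sketch of the blocksize sweep (not the exact commands I ran).
# Every pass moves the same ~12000MB; count is scaled to the blocksize
# so the totals stay comparable.
for bs in 8k 16k 32k 64k 128k 1m; do
    case ${bs} in
        8k)   count=1536000 ;;
        16k)  count=768000  ;;
        32k)  count=384000  ;;
        64k)  count=192000  ;;
        128k) count=96000   ;;
        1m)   count=12000   ;;
    esac
    echo "== bs=${bs} =="
    # write test: compression is off, so /dev/zero is a fair source
    dd if=/dev/zero of=/bench/test.file bs=${bs} count=${count}
    # read test: pull the whole file back
    dd if=/bench/test.file of=/dev/null bs=${bs}
done

One caveat with a sketch like this: reading the file straight back after
writing it can be served largely from the ARC, so for numbers comparable to
the ones above you'd want to recreate the pool (or reboot) between the
write and read passes, and adjust the /bench path to whichever pool you're
testing.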