From: Alexander Leidinger <alexander@leidinger.net>
Date: Mon, 19 Sep 2011 22:36:53 +0200
To: Jason Usher
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS obn FreeBSD hardware model for 48 or 96 sata3 paths...

On Mon, 19 Sep 2011 12:00:11 -0700 (PDT) Jason Usher wrote:

> --- On Sat, 9/17/11, Daniel Kalchev wrote:
>
> > There is not a single magnetic drive on the market that can
> > saturate SATA2 (300 MBps) yet. Most can't even match SATA1
> > (150 MBps). You don't need that much dedicated bandwidth for
> > drives. If you intend to have 48/96 SSDs, then that is another
> > story, but then I am doubtful a "PC" architecture can handle
> > that much data either.
>
> Hmmm... I understand this, but is there not any data that might
> transfer from multiple magnetic disks simultaneously, at 6 Gbps,
> that could periodically max out the card bandwidth? As in, all
> drives in a 12-drive array performing an operation on their
> built-in cache simultaneously?

A pragmatic piece of advice: do not put all drives into the same
vdev. Have a look at the ZFS best practices guide for guidance on how
many drives should go into a single vdev. Concatenate several RAIDZx
vdevs instead, and play around a bit on paper to see what works best
for you.

An example: with 8 controllers (assuming 6 ports each) you could
build 6 raidz1 vdevs, each taking one drive from every controller,
and concatenate them into one pool. That gives you
ports * (num_controllers - 1) * drivesize of usable storage. Each of
those 6 vdevs can lose one drive (6 in total, one per raidz1 vdev),
or one whole controller can fail (the number of controllers that may
fail equals the X in raidzX, since each controller contributes
exactly one drive to each vdev). A sketch of the corresponding zpool
command is below.
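A minimal sketch of that layout, assuming the 48 drives show up as
da0-da47 with da0-da5 on controller 0, da6-da11 on controller 1, and
so on (the pool name "tank" and the device numbering are only
placeholders; check your real controller-to-device mapping before
doing anything like this):

  # one raidz1 vdev per controller port; each vdev gets exactly one
  # disk from each of the 8 controllers
  zpool create tank \
      raidz1 da0 da6  da12 da18 da24 da30 da36 da42 \
      raidz1 da1 da7  da13 da19 da25 da31 da37 da43 \
      raidz1 da2 da8  da14 da20 da26 da32 da38 da44 \
      raidz1 da3 da9  da15 da21 da27 da33 da39 da45 \
      raidz1 da4 da10 da16 da22 da28 da34 da40 da46 \
      raidz1 da5 da11 da17 da23 da29 da35 da41 da47

With 1 TB drives, for example, this gives 6 * 7 * 1 TB = 42 TB of
usable space, and the pool survives the loss of any single controller
because that only removes one drive from each raidz1 vdev.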
If you need speed, rely on RAM or an L2ARC (assuming the data is read
often enough to be cached). If you need more speed, go with SSDs
instead of hard disks for the pool drives (an L2ARC does not make
much sense then, unless you invest in something significantly faster,
like the Fusion-io board already mentioned in the thread). Optimizing
for the theoretical case that all drives deliver everything from
their built-in caches at once is a waste of money: either that case
is so unlikely that it practically never happens (go play the lottery
instead, you may have more luck there), or, if your access pattern
really is strange enough that it happens often enough to make such an
optimization worthwhile, invest the money in more RAM and keep the
data in the ARC instead.

> > Memory is much more expensive than SSDs for L2ARC, and if
> > your workload permits it (lots of repeated small reads), a
> > larger L2ARC will help a lot. It will also help if you have
> > a huge pool or if you enable dedup etc. Just populate as much
> > RAM as the server can handle and then add L2ARC
> > (read-optimized).

> That's interesting (the part about dedup being assisted by L2ARC)...
> what about snapshots? If we run 14 or 21 snapshots, what component
> is that stressing, and what structures would speed that up?

Taking a snapshot is just a short write to the disks (I do not know
off-hand whether it is a sync or an async write). As long as you do
not take a lot of snapshots per second or per minute (I hope the
14/21 values mean something like one snapshot per (working) hour), I
would not worry about this; see the P.S. below for a small scheduling
sketch.

Bye,
Alexander.

--
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org      netchild @ FreeBSD.org    : PGP ID = 72077137
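P.S.: If the one-snapshot-per-hour guess is right, a minimal cron
sketch could look like this ("tank/data" and the snapshot naming
scheme are just placeholders for your own dataset):

  # /etc/crontab entry: snapshot tank/data at the top of every hour
  # (% must be escaped as \% inside a crontab)
  0  *  *  *  *  root  /sbin/zfs snapshot tank/data@`date +\%Y-\%m-\%d_\%Hh`

Old snapshots then have to be removed by hand or by another script
(zfs destroy tank/data@...), otherwise they accumulate forever.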