From: Alexander Leidinger <alexander@leidinger.net>
Date: Mon, 19 Sep 2011 22:36:53 +0200
To: Jason Usher
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS obn FreeBSD hardware model for 48 or 96 sata3 paths...

On Mon, 19 Sep 2011 12:00:11 -0700 (PDT) Jason Usher wrote:

> --- On Sat, 9/17/11, Daniel Kalchev wrote:
>
> > There is not a single magnetic drive on the market that can
> > saturate SATA2 (300 MBps) yet. Most can't even match SATA1
> > (150 MBps). You don't need that much dedicated bandwidth for
> > drives. If you intend to have 48/96 SSDs, then that is another
> > story, but then I am doubtful a "PC" architecture can handle
> > that much data either.
>
> Hmmm... I understand this, but is there not any data that might
> transfer from multiple magnetic disks simultaneously, at 6 Gbps,
> that could periodically max out the card bandwidth? As in, all
> drives in a 12-drive array performing an operation on their
> built-in cache simultaneously?

A pragmatic piece of advice: do not put all drives into the same
vdev. Have a look at the ZFS best practices guide for guidance on how
many drives should go into a single vdev. Concatenate several RAIDZx
vdevs instead, and play around a bit on paper to see what works best
for you.

An example: with 8 controllers (assuming 6 ports each) you could
build 6 raidz1 vdevs, each taking one drive from every controller,
and concatenate them into one pool. That gives you
ports * (num_controllers - 1) * drivesize of usable storage. Each of
those 6 vdevs can lose one drive (6 in total, one per raidz1 vdev),
or one whole controller can fail (the number of controllers that may
fail equals the X in raidzX, since each controller contributes
exactly one drive to each vdev). A sketch of the corresponding zpool
command is below.
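A minimal sketch of that layout, assuming the 48 drives show up as
da0-da47 with da0-da5 on controller 0, da6-da11 on controller 1, and
so on (the pool name "tank" and the device numbering are only
placeholders; check your real controller-to-device mapping before
doing anything like this):

  # one raidz1 vdev per controller port; each vdev gets exactly one
  # disk from each of the 8 controllers
  zpool create tank \
      raidz1 da0 da6  da12 da18 da24 da30 da36 da42 \
      raidz1 da1 da7  da13 da19 da25 da31 da37 da43 \
      raidz1 da2 da8  da14 da20 da26 da32 da38 da44 \
      raidz1 da3 da9  da15 da21 da27 da33 da39 da45 \
      raidz1 da4 da10 da16 da22 da28 da34 da40 da46 \
      raidz1 da5 da11 da17 da23 da29 da35 da41 da47

With 1 TB drives, for example, this gives 6 * 7 * 1 TB = 42 TB of
usable space, and the pool survives the loss of any single controller
because that only removes one drive from each raidz1 vdev.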
If you need speed, rely on RAM or an L2ARC (assuming the data is read
often enough to be cached). If you need more speed, go with SSDs
instead of hard disks for the pool drives (an L2ARC does not make
much sense then, unless you invest in something significantly faster,
like the Fusion-io board already mentioned in the thread). Optimizing
for the theoretical case that all drives deliver everything from
their built-in caches at once is a waste of money: either that case
is so unlikely that it practically never happens (go play the lottery
instead, you may have more luck there), or, if your access pattern
really is strange enough that it happens often enough to make such an
optimization worthwhile, invest the money in more RAM and keep the
data in the ARC instead.

> > Memory is much more expensive than SSDs for L2ARC, and if
> > your workload permits it (lots of repeated small reads), a
> > larger L2ARC will help a lot. It will also help if you have
> > a huge pool or if you enable dedup etc. Just populate as much
> > RAM as the server can handle and then add L2ARC
> > (read-optimized).

> That's interesting (the part about dedup being assisted by L2ARC)...
> what about snapshots? If we run 14 or 21 snapshots, what component
> is that stressing, and what structures would speed that up?

Taking a snapshot is just a short write to the disks (I do not know
off-hand whether it is a sync or an async write). As long as you do
not take a lot of snapshots per second or per minute (I hope the
14/21 values mean something like one snapshot per (working) hour), I
would not worry about this; see the P.S. below for a small scheduling
sketch.

Bye,
Alexander.

--
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org      netchild @ FreeBSD.org    : PGP ID = 72077137
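P.S.: If the one-snapshot-per-hour guess is right, a minimal cron
sketch could look like this ("tank/data" and the snapshot naming
scheme are just placeholders for your own dataset):

  # /etc/crontab entry: snapshot tank/data at the top of every hour
  # (% must be escaped as \% inside a crontab)
  0  *  *  *  *  root  /sbin/zfs snapshot tank/data@`date +\%Y-\%m-\%d_\%Hh`

Old snapshots then have to be removed by hand or by another script
(zfs destroy tank/data@...), otherwise they accumulate forever.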