From owner-freebsd-stable@FreeBSD.ORG Wed Apr 6 12:55:52 2011
Date: Wed, 06 Apr 2011 13:55:50 +0100
From: Pete French <petefrench@ingresso.co.uk>
To: daniel@digsys.bg, freebsd-stable@freebsd.org
Subject: Re: ZFS HAST config preference
In-Reply-To: <4D9B2EA2.9020700@digsys.bg>

> My original idea was to set up blades so that they run HAST on pairs of
> disks, and run ZFS in a number of mirror vdevs on top of HAST. The ZFS
> pool will exist only on the master HAST node. Let's call this setup1.

This is exactly how I run things. Personally I think it is the best
solution, as ZFS then knows about the mirroring of the drives, which is
always a good thing, and you get a ZFS filesystem on top. I also enable
compression to reduce the bandwidth to the drives, as that reduces the
data flowing across the network. For what I am doing (running mysql on
top) this is actually faster for both selects and inserts - test with
your own application first though.

> Or, I could use ZFS volumes and run HAST on top of these. This means,
> that on each blade, I will have a local ZFS pool. Let's call this setup2.

...you would need to put a filesystem on top of the HAST device though -
what would that be?

> While setup1 is most straightforward, it has some drawbacks:
> - disks handled by HAST need to be either identical or have matching
> partitions created;

This is true. I run identical machines as primary and secondary.

> - the 'spare' blade would do nothing, as its disk subsystem will be
> gone as long as it is HAST slave. As the blades are quite powerful (4x8
> core AMD) that would be wasteful, at least in the beginning.

If you are keeping a machine as a hot spare then that's something you
just have to live with, in my opinion. I've run this way for several
years - before HAST and ZFS we used gmirror and UFS to do the same
thing. It does work very nicely, but you do end up with a machine idle.

> HAST replication speed should not be an issue, there is 10Gbit network
> between the blade servers.

I actually put separate ethernet interfaces in for each of the HAST
drives. So for two drives there are two spare ethernet ports on each
machine, with a cable between them, dedicated to just that drive. Those
are gigabit cards though, not 10 gig.

> Has anyone already setup something similar? What was the experience?

Very good actually.
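
To give a rough idea of what setup1 looks like, here is a minimal
sketch of the sort of hast.conf I mean - the node names, devices and
addresses here are invented for illustration, not taken from my real
config. Each resource gets its own 'remote' address, which is how each
drive's traffic goes down its own dedicated cable:

  # /etc/hast.conf - sketch only, names/devices/addresses are made up
  resource disk0 {
      on nodeA {
          local /dev/da0
          remote 172.16.0.2
      }
      on nodeB {
          local /dev/da0
          remote 172.16.0.1
      }
  }

  resource disk1 {
      on nodeA {
          local /dev/da1
          remote 172.16.1.2
      }
      on nodeB {
          local /dev/da1
          remote 172.16.1.1
      }
  }

On whichever node is currently primary, the pool is then just an
ordinary ZFS mirror built from the /dev/hast providers, with
compression turned on (pool name made up):

  hastctl role primary disk0
  hastctl role primary disk1
  zpool create tank mirror /dev/hast/disk0 /dev/hast/disk1
  zfs set compression=on tank

That way ZFS handles the mirroring across the two drives locally, and
HAST handles getting each drive replicated over to the other blade.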

One thing I would say is to write and test a set of scripts to do the
failover - it avoids shooting yourself in the foot when trying to do
the commands by hand (which is rather easy to do). I have a script to
make the primary into a secondary, and one to do the reverse. The first
script waits until the HAST data is all flushed before changing role,
and makes sure the services are stopped and the pool exported before
ripping the discs out from underneath. The script also handles removing
a shared IP address from the interface.

-pete.
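
P.S. In case it is useful, the "primary to secondary" script is roughly
the shape below. This is a sketch rather than my actual script - the
pool name, resource names, service name, interface and shared address
are all made up, and the way the dirty count is pulled out of the
hastctl output may need adjusting for your version - but it shows the
order of the steps:

  #!/bin/sh
  # Sketch of demoting this node to HAST secondary.
  # All names below are examples, not a real config.
  POOL=tank
  RESOURCES="disk0 disk1"
  SHARED_IP=192.168.0.10
  IFACE=bge0

  # Stop whatever is using the pool first (mysql in my case).
  service mysql-server stop

  # Take the shared service address off the interface.
  ifconfig ${IFACE} inet ${SHARED_IP} -alias

  # Export the pool so nothing is touching the hast devices.
  zpool export ${POOL}

  # Wait for HAST to flush all dirty data to the peer before
  # switching role (hastctl output format may differ on your version).
  for res in ${RESOURCES}; do
      while [ "$(hastctl status ${res} | awk '/dirty:/ {print $2}')" != "0" ]; do
          sleep 1
      done
  done

  # Now it is safe to hand the discs over.
  for res in ${RESOURCES}; do
      hastctl role secondary ${res}
  done

The "secondary to primary" script is essentially the same steps in
reverse: set the roles to primary, import the pool, bring up the shared
address and start the services.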