From owner-freebsd-stable@FreeBSD.ORG Tue Jul 15 14:54:26 2008
Date: Tue, 15 Jul 2008 07:54:26 -0700
From: Jeremy Chadwick <jdc@parodius.com>
To: Sven Willenberger
Cc: freebsd-stable@freebsd.org
Subject: Re: Multi-machine mirroring choices
Message-ID: <20080715145426.GA31340@eos.sc1.parodius.com>
In-Reply-To: <1216130834.27608.27.camel@lanshark.dmv.com>
User-Agent: Mutt/1.5.18 (2008-05-17)

On Tue, Jul 15, 2008 at 10:07:14AM -0400, Sven Willenberger wrote:
> 3) The send/recv feature of zfs was something I had not even considered
> until very recently. My understanding is that this would work by a)
> taking a snapshot of master_data1, b) zfs sending that snapshot to
> slave_data1, c) via ssh on a pipe, receiving that snapshot on
> slave_data1, and then d) doing incremental snapshots, sending, and
> receiving as in (a)(b)(c). How time/cpu intensive is the snapshot
> generation, and just how granular could this be done? I would imagine
> this could be practical for systems with little traffic/changes, but
> what about systems that may see a lot of files added, modified, or
> deleted on the filesystem(s)?

I can speak a bit about ZFS snapshots, because I've used them in the
past with good results. Compared to UFS2 snapshots (e.g. dump -L or
mksnap_ffs), ZFS snapshots are fantastic. The two main positives for me
were:

1) ZFS snapshots take significantly less time to create; I'm talking
seconds or minutes vs. 30-45 minutes. I also remember receiving mail
from someone (on -hackers? I can't remember -- let me know and I can
dig through my mail archives for the specific mail/details) stating
something along the lines of "over time, yes, UFS2 snapshots take
longer and longer; it's a known design problem".

2) Creating a ZFS snapshot does not cause the system to more or less
deadlock while the snapshot is being generated; you can continue to use
the system the whole time. UFS2's dump -L and mksnap_ffs, on the other
hand, will surely disappoint you.

We moved all of our production systems off of dump/restore solely
because of these aspects. We didn't move to ZFS, though; we went with
rsync, which is great, except for the fact that it modifies file atimes
(hope you use Maildir and not classic mbox/mail spools...).

ZFS's send/recv capability (over a network) is something I didn't have
time to experiment with, but it looked *very* promising. The method is
documented in the manpage as "Example 12", and is very simple -- as it
should be. You don't have to use SSH either, by the way[1].
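Roughly, the workflow looks like the following sketch (loosely based on
what the manpage describes; the dataset "tank/data", the snapshot names,
and the host "slave" are made up here purely for illustration):

  # initial full copy of a snapshot to the slave
  zfs snapshot tank/data@snap1
  zfs send tank/data@snap1 | ssh slave zfs recv tank/data

  # later: take another snapshot and send only the changes since snap1
  # (you may need 'zfs recv -F' if the slave copy has been touched)
  zfs snapshot tank/data@snap2
  zfs send -i tank/data@snap1 tank/data@snap2 | ssh slave zfs recv tank/data

How often you could run the incremental step on a busy filesystem is
something you'd have to measure yourself; the incremental stream only
carries the blocks that changed between the two snapshots, so the cost
scales with the amount of change rather than the size of the filesystem.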
One of the "annoyances" of ZFS snapshots, however, was that I had to
write my own script to do snapshot rotations (think incremental dump(8),
but using ZFS snapshots).

> I would be interested to hear anyone's experience with any (or all) of
> these methods and caveats of each. I am leaning towards ggate(dc) +
> zpool at the moment, assuming that zfs can "smartly" rebuild the mirror
> after the slave's ggated processes bug out.

I don't have any experience with GEOM gate, so I can't comment on it.
But I would highly recommend you discuss its shortcomings with pjd@,
because he definitely listens.

However, I must ask you this: why are you doing things the way you are?
Why are you using the equivalent of RAID 1, but spanning entire
computers? Is there some reason you aren't using a filer (e.g. NetApp)
for your data, thus keeping it centralised? There has been recent
discussion of using FreeBSD with ZFS as such a filer over on freebsd-fs.
If you want a link to the thread, I can point you to it.

I'd like to know why you're doing things the way you are. By knowing
why, possibly myself or others could recommend solving the problem in a
different way -- one that doesn't involve real-time duplication of
filesystems over the network.

[1]: If you're transferring huge sums of data over a secure link (read:
a dedicated gigE LAN or a separate VLAN), you'll be disappointed to find
that there is no Cipher=none with stock SSH; the closest you'll get is
blowfish-cbc. You might be further saddened by the fact that the only
way you'll get Cipher=none is via the HPN patches, which means you'll be
forced to install ports/security/openssh-portable. (I am not a fan of
the "overwrite the base system" concept; it's a hack, and I'd rather get
rid of the whole "base system" concept in general -- but that's for
another discussion.) My point is: your overall network I/O will be
limited by SSH, so if you're pushing lots of data across a LAN, consider
something without encryption.

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |
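P.S. Regarding [1]: if you decide encryption really isn't needed on a
dedicated/trusted segment, one approach is to pipe the send stream
through nc(1) instead of ssh. A rough sketch (the host name and port
number are made up, the stream is completely unauthenticated and
unencrypted, and nc's listen syntax varies a bit between versions):

  # on the slave: listen and receive the stream
  nc -l 8023 | zfs recv tank/data

  # on the master: send the snapshot stream to the slave
  zfs send tank/data@snap1 | nc slave 8023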