Date: Wed, 3 Jul 2013 18:08:15 -0700
From: Jeremy Chadwick <jdc@koitsu.org>
To: Berend de Boer
Cc: freebsd-fs
Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Message-ID: <20130704010815.GB75529@icarus.home.lan>
In-Reply-To: <87d2qz42q4.wl%berend@pobox.com>

On Thu, Jul 04, 2013 at 11:56:35AM +1200, Berend de Boer wrote:
> >>>>> "Jeremy" == Jeremy Chadwick writes:
>
> Jeremy> As politely as I can: It sounds like you may have spent
> Jeremy> too much time with these types of setups, or believe them
> Jeremy> to be "magical" in some way, in turn forgetting the
> Jeremy> realities of bare metal and instead thinking "everything
> Jeremy> is software".  Bzzt.
>
> Heh.  The solution with Amazon is even worse: if things go wrong,
> you're screwed.  Can't get your disks back.  You can't call anyone.
> There's no bare metal to touch, and no, they won't let you into
> their data centres.
>
> So I'm actually trying to avoid the magic.
>
> The only guarantee I basically have is that if I have made an EBS
> snapshot of my disk, I can, one day, restore that, and that this
> snapshot is stored in some multi-redundancy (magic!) cloud.
>
> (And obviously you can try to run a mirror in another data centre
> using zfs send/recv, yes, will run that too).
>
> If you go with AWS, there are no phone calls to make.  Disk gone is
> disk gone.  So you need to have working backup strategies in place.

How is being reliant on EBS (for readers: Amazon Elastic Block Store,
which is advertised as, and I quote, "a virtualised storage service")
"avoiding the magic"?  You're still reliant on black-box voodoo.
:-)

I think the limiting factor here is more related to your need to use
AWS and its services than to using bare metal.  I respect and
understand that, and won't get into a debate about it.  So, that
said...

As I see them, your choices are these:

- Keep using EBS, doing all of this at a "higher level" (meaning the
  Amazon level) by making snapshots of the actual "storage disks"
  that are referenced/used by the underlying OS.  FreeBSD, as we have
  stated, does not have a way (AFAIK) to do this from within the OS
  (meaning "induce an EBS snapshot").  Linux may have that, but no
  matter what, it's an Amazon proprietary thing.  Are we clear?  You
  should still be able to use whatever user interface Amazon's EBS or
  AWS provides to make "snapshots" of those disks -- at least that's
  what I'd assume; I have no familiarity with it.

- Within the OS: raw disk dump.  It doesn't matter what the "backing
  store" is (e.g. EBS, something across iSCSI, etc.).  Example
  command:

    dd if=/dev/da0 of=/some/other/place bs=64k

  (Or you can send it to stdout and pipe that across ssh, netcat,
  etc.; a sketch of that variant appears after this list.)  This will
  read every LBA on the device -- including unused/untouched space,
  the partitioning scheme/layout (i.e. MBR/GPT), and the boot
  blocks/bootstrap mechanisms -- and the result will be the size of
  the disk itself (e.g. if the disk is 1TB, the resulting file will
  be 1TB).  From what you've said, this does not work for you: the
  result is of immense size (even if piped through gzip), it does not
  allow for incremental snapshots of changes to the disk, and it
  takes a long time.  There is no way on FreeBSD or Linux, to my
  knowledge, to accomplish incrementals at the disk level -- at the
  filesystem level yes, disk level no.  Most people prefer to do this
  at the filesystem level (which, if done right, is also very fast --
  you know this already though).

- Within the OS: UFS+SU filesystem snapshots (do not use
  journalling/SUJ with this feature; it's known to be busted/throws a
  nastygram).  Commonly accomplished using dump(8), with restores via
  restore(8).  dump(8) accomplishes snapshot generation by calling
  mksnap_ffs(8) (also a utility).  Snapshot generation is usually
  very fast (commonly a few seconds), but that depends on lots of
  things which I will not get into.  dump(8) and restore(8) both
  support incremental snapshots, and, conveniently, restore(8) has an
  interactive mode where you can navigate a snapshot and extract
  individual files.  (A sketch of this workflow also follows the
  list.)

  These are filesystem snapshots, not disk snapshots, and thus do not
  include things like the partition table (MBR/GPT) or the
  bootstraps.  This matters more if you're trying to do a "bare metal
  restore" of a box (i.e. box #0 broke badly, need to turn box #1
  into the role of box #0 in every way/shape/form); an admin in that
  case has to recreate the partition table and reinstall bootstraps
  manually.  (There are ways to back these up as well via dd, but I
  am not going to go into that.)

  And now some real-world experience: what isn't mentioned/discussed
  outside of mailing lists and the like is that this methodology is
  unreliable (I have avoided it and been a critic of it for several
  years).  There are problems with the UFS-specific snapshot "stuff"
  that have existed for years: sometimes snapshot generation never
  ends, sometimes it locks the system up, and there are lots of other
  problems.  I will not provide all the details -- just go looking
  through the -stable and -fs mailing lists over the past several
  years and you'll see what I'm talking about.
  Likewise, real-world experience: these bugs are what drove me away
  from using UFS snapshots, and I boycott them for this reason.

- Within the OS: ZFS, using "zfs snapshot".  These are, of course,
  ZFS filesystem snapshots.  Incrementals are supported, and these
  are also usually very fast (a few seconds).  You can also use "zfs
  {send,recv}" to send the snapshots to another system and have them
  restored there -- many admins really REALLY like this feature.  (A
  sketch follows the list.)  Likewise, because this is
  filesystem-based, it again does not back up the partition tables or
  the bootstraps.

  There are some "gotchas" with ZFS snapshots, but those really
  depend on 1) how you're using them, and 2) your type of data.  I
  won't go into #2, but others here have already mentioned it.  For
  example, one bug that's been around for 3 years now: if you
  navigate the snapshots as a filesystem while using the default
  attribute snapdir=hidden, "pwd" will return "No such file or
  directory" while you're within a snapshot.  There are workarounds
  for this.  Occasionally I see problems reported by people using
  "zfs {send,recv}", and on (rarer) occasion issues with snapshot
  generation entirely.  Most of the problems with the latter,
  however, have been worked out in stable/9 (so if you go the ZFS
  route, PLEASE PLEASE PLEASE run stable/9, not 9.1-RELEASE or
  earlier).

  There are also scripts in ports/sysutils to make management of ZFS
  snapshots much easier.  Some write their own; others use those
  scripts.

  Also, because nobody seems to warn others of this: if you go the
  ZFS route on FreeBSD, please do not use features like dedup or
  compression.  I can expand more on this if asked, as they have
  separate (and in one case identical/similar) caveats.  (I'm always
  willing to bend on compression as long as the user knows of the one
  problem that still exists today and feels it's okay/acceptable.)

- Within the OS: rsync and/or rsnapshot (which uses rsync).  These
  work at the file level (not filesystem, but file) -- think "copying
  all the files".  They are known to be reliable, and can be used in
  conjunction with systems over a network (to back up from system X
  to system Y; the default transport is SSH).  Naturally, this
  doesn't back up partition tables or bootstraps either.

  rsnapshot provides its "snapshot-like" behaviour using hard links,
  which is what allows for incrementals (read about it on the web for
  further details -- it's not rocket science; a sketch of the
  underlying mechanism follows the list).  But be aware that
  "incremental" here means "files that have been changed, added, or
  deleted"; it does not mean "store/back up only the portions of a
  file that changed".  I.e. if your 2GB MySQL table had a write done
  to it between the last snapshot and now, the incremental is going
  to back up the entire 2GB.  That may be a drawback depending on
  what you're doing -- this is for your sysadmin to figure out.

  I have read of some problems relating to rsync when used with ZFS,
  but those seem to stem more from the amount of I/O being done and
  the type of data on the ZFS pool/filesystem; rsync just happens to
  tickle something odd in those cases.  I have never personally
  encountered this, however (that's just me, though) -- see the
  real-world experience after the sketches below.
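Since most of the in-OS options above are command-driven, minimal
sketches of each follow.  All device names, host names, pool/dataset
names, and paths in these sketches are made-up examples, not
recommendations.

First, the piped variant of the raw disk dump:

    # Raw dump of the whole disk, compressed and streamed across ssh
    # instead of written to a local file.  "backuphost" is made up.
    dd if=/dev/da0 bs=64k | gzip | ssh backuphost 'cat > /backups/da0.img.gz'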
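Next, the UFS+SU dump/restore workflow -- keeping my warnings about
its reliability firmly in mind:

    # Level 0 (full) dump of /usr; -L snapshots the live filesystem
    # via mksnap_ffs(8), -u records the dump in /etc/dumpdates.
    dump -0uaL -f /backups/usr.0.dump /usr

    # Later, a level 1 dump captures only what changed since level 0.
    dump -1uaL -f /backups/usr.1.dump /usr

    # Interactive restore: navigate the dump, then "add" and
    # "extract" individual files.
    restore -i -f /backups/usr.0.dump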
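For the ZFS route, a minimal snapshot/send/recv cycle:

    # Create a named snapshot (nearly instantaneous).
    zfs snapshot tank/data@2013-07-03

    # Full send of that snapshot to another system across ssh.
    zfs send tank/data@2013-07-03 | ssh backuphost zfs recv backup/data

    # The next day, send only the delta between the two snapshots.
    zfs snapshot tank/data@2013-07-04
    zfs send -i tank/data@2013-07-03 tank/data@2013-07-04 | \
        ssh backuphost zfs recv backup/data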
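And finally, the hard-link mechanism that rsnapshot automates for
you, shown here with bare rsync:

    # Files unchanged since the previous snapshot (daily.1) become
    # hard links in daily.0, so only changed/new files consume
    # additional space.
    rsync -a --delete --link-dest=/backups/daily.1 \
        root@server:/data/ /backups/daily.0/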
Real-world experience: rsnapshot is what I used at my hosting
organisation of nearly 18 years to back up 8 or 9 servers, nightly,
across a network (gigE LAN).  Those servers also used ZFS as their
filesystems (for everything except root/var/tmp/usr) -- both the
source being copied and the filesystem used to store the backups --
and I only once had an issue, during the early FreeBSD 8.x days
(caused by a ZFS bug that has since been fixed).  I still use
rsync/rsnapshot to this day, even on my local system (which is
ZFS-based barring root/var/tmp/usr -- I choose rsnapshot rather than
ZFS snapshots for reasons I will not go into here, as they're
irrelevant).  However, I would not use this method where "snapshots"
need to be done very regularly (e.g. every hour), particularly on
filesystems where there are either a) lots and lots of files, or b)
files of immense size that change often.  Filesystem snapshots are a
better choice in that case.

There are certainly other options available which I have not touched
on, but in general the filesystem snapshot choice is probably your
best bet.  Filesystem snapshots have one other advantage that you
might not have thought of: they're done within the OS, which means
that if Amazon's EBS stuff changes in such a way that you lose
backwards compatibility, or you encounter bugs with it (during EBS
snapshot generation), you can still get access to your data in some
manner of speaking.

I hope this has given you some details, avenues of choice, or at
least things to ponder.  Choose wisely, and remember: **ALWAYS DO A
RESTORE TEST** when choosing a new backup strategy.  I cannot tell
you how many times I've encountered people "doing backups" who never
test a restore until that horrible day... only to find their backups
were done wrong, or that the restore process (or even the software!)
is utterly broken.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |