Date: Wed, 3 Jul 2013 18:08:15 -0700
From: Jeremy Chadwick <jdc@koitsu.org>
To: Berend de Boer
Cc: freebsd-fs
Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Message-ID: <20130704010815.GB75529@icarus.home.lan>
In-Reply-To: <87d2qz42q4.wl%berend@pobox.com>

On Thu, Jul 04, 2013 at 11:56:35AM +1200, Berend de Boer wrote:
> >>>>> "Jeremy" == Jeremy Chadwick writes:
>
> Jeremy> As politely as I can: It sounds like you may have spent
> Jeremy> too much time with these types of setups, or believe them
> Jeremy> to be "magical" in some way, in turn forgetting the
> Jeremy> realities of bare metal and instead thinking "everything
> Jeremy> is software".  Bzzt.
>
> Heh.  The solution with Amazon is even worse: if things go wrong,
> you're screwed.  Can't get your disks back.  You can't call anyone.
> There's no bare metal to touch, and no, they won't let you into
> their data centres.
>
> So I'm actually trying to avoid the magic.
>
> The only guarantee I basically have is that if I have made an EBS
> snapshot of my disk, I can, one day, restore that, and that this
> snapshot is stored in some multi-redundancy (magic!) cloud.
>
> (And obviously you can try to run a mirror in another data centre
> using zfs send/recv, yes, will run that too).
>
> If you go with AWS, there are no phone calls to make.  Disk gone is
> disk gone.  So you need to have working backup strategies in place.

How is being reliant on EBS (for readers: Amazon Elastic Block Store,
which is advertised as, and I quote, "a virtualised storage service")
"avoiding the magic"?  You're still reliant on black-box voodoo.
:-)

I think the limiting factor here is more related to your need to use
AWS and its services than to using bare metal.  I respect and
understand that, and won't get into a debate about it.  So, that
said...

As I see them, your choices are these:

- Keep using EBS, doing all of this at a "higher level" (meaning the
  Amazon level) by making snapshots of the actual "storage disks"
  that are referenced/used by the underlying OS.  FreeBSD, as we have
  stated, does not have a way (AFAIK) to do this from within the OS
  (meaning "induce an EBS snapshot").  Linux may have that, but no
  matter what, it's an Amazon proprietary thing.  Are we clear?  You
  should still be able to use whatever user interface Amazon's EBS or
  AWS provides to make "snapshots" of those disks -- at least that's
  what I'd assume; I have no familiarity with it.

- Within the OS: raw disk dump.  It doesn't matter what the "backing
  store" is (e.g. EBS, something across iSCSI, etc.).  Example
  command:

    dd if=/dev/da0 of=/some/other/place bs=64k

  (Or you can send it to stdout and pipe that across ssh, netcat,
  etc.; a sketch of that variant appears after this list.)  This will
  read every LBA on the device -- including unused/untouched space,
  the partitioning scheme/layout (i.e. MBR/GPT), and the boot
  blocks/bootstrap mechanisms -- and the result will be the size of
  the disk itself (e.g. if the disk is 1TB, the resulting file will
  be 1TB).  From what you've said, this does not work for you: the
  result is of immense size (even if piped through gzip), it does not
  allow for incremental snapshots of changes to the disk, and it
  takes a long time.  There is no way on FreeBSD or Linux, to my
  knowledge, to accomplish incrementals at the disk level -- at the
  filesystem level yes, disk level no.  Most people prefer to do this
  at the filesystem level (which, if done right, is also very fast --
  you know this already though).

- Within the OS: UFS+SU filesystem snapshots (do not use
  journalling/SUJ with this feature; it's known to be busted/throws a
  nastygram).  Commonly accomplished using dump(8), with restores via
  restore(8).  dump(8) accomplishes snapshot generation by calling
  mksnap_ffs(8) (also a utility).  Snapshot generation is usually
  very fast (commonly a few seconds), but that depends on lots of
  things which I will not get into.  dump(8) and restore(8) both
  support incremental snapshots, and, conveniently, restore(8) has an
  interactive mode where you can navigate a snapshot and extract
  individual files.  (A sketch of this workflow also follows the
  list.)

  These are filesystem snapshots, not disk snapshots, and thus do not
  include things like the partition table (MBR/GPT) or the
  bootstraps.  This matters more if you're trying to do a "bare metal
  restore" of a box (i.e. box #0 broke badly, need to turn box #1
  into the role of box #0 in every way/shape/form); an admin in that
  case has to recreate the partition table and reinstall bootstraps
  manually.  (There are ways to back these up as well via dd, but I
  am not going to go into that.)

  And now some real-world experience: what isn't mentioned/discussed
  outside of mailing lists and the like is that this methodology is
  unreliable (I have avoided it and been a critic of it for several
  years).  There are problems with the UFS-specific snapshot "stuff"
  that have existed for years: sometimes snapshot generation never
  ends, sometimes it locks the system up, and there are lots of other
  problems.  I will not provide all the details -- just go looking
  through the -stable and -fs mailing lists over the past several
  years and you'll see what I'm talking about.
  Likewise, real-world experience: these bugs are what drove me away
  from using UFS snapshots, and I boycott them for this reason.

- Within the OS: ZFS, using "zfs snapshot".  These are, of course,
  ZFS filesystem snapshots.  Incrementals are supported, and these
  are also usually very fast (a few seconds).  You can also use "zfs
  {send,recv}" to send the snapshots to another system and have them
  restored there -- many admins really REALLY like this feature.  (A
  sketch follows the list.)  Likewise, because this is
  filesystem-based, it again does not back up the partition tables or
  the bootstraps.

  There are some "gotchas" with ZFS snapshots, but those really
  depend on 1) how you're using them, and 2) your type of data.  I
  won't go into #2, but others here have already mentioned it.  For
  example, one bug that's been around for 3 years now: if you
  navigate the snapshots as a filesystem while using the default
  attribute snapdir=hidden, "pwd" will return "No such file or
  directory" while you're within a snapshot.  There are workarounds
  for this.  Occasionally I see problems reported by people using
  "zfs {send,recv}", and on (rarer) occasion issues with snapshot
  generation entirely.  Most of the problems with the latter,
  however, have been worked out in stable/9 (so if you go the ZFS
  route, PLEASE PLEASE PLEASE run stable/9, not 9.1-RELEASE or
  earlier).

  There are also scripts in ports/sysutils to make management of ZFS
  snapshots much easier.  Some write their own; others use those
  scripts.

  Also, because nobody seems to warn others of this: if you go the
  ZFS route on FreeBSD, please do not use features like dedup or
  compression.  I can expand more on this if asked, as they have
  separate (and in one case identical/similar) caveats.  (I'm always
  willing to bend on compression as long as the user knows of the one
  problem that still exists today and feels it's okay/acceptable.)

- Within the OS: rsync and/or rsnapshot (which uses rsync).  These
  work at the file level (not filesystem, but file) -- think "copying
  all the files".  They are known to be reliable, and can be used in
  conjunction with systems over a network (to back up from system X
  to system Y; the default transport is SSH).  Naturally, this
  doesn't back up partition tables or bootstraps either.

  rsnapshot provides its "snapshot-like" behaviour using hard links,
  which is what allows for incrementals (read about it on the web for
  further details -- it's not rocket science; a sketch of the
  underlying mechanism follows the list).  But be aware that
  "incremental" here means "files that have been changed, added, or
  deleted"; it does not mean "store/back up only the portions of a
  file that changed".  I.e. if your 2GB MySQL table had a write done
  to it between the last snapshot and now, the incremental is going
  to back up the entire 2GB.  That may be a drawback depending on
  what you're doing -- this is for your sysadmin to figure out.

  I have read of some problems relating to rsync when used with ZFS,
  but those seem to stem more from the amount of I/O being done and
  the type of data on the ZFS pool/filesystem; rsync just happens to
  tickle something odd in those cases.  I have never personally
  encountered this, however (that's just me, though) -- see the
  real-world experience after the sketches below.
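Since most of the in-OS options above are command-driven, minimal
sketches of each follow.  All device names, host names, pool/dataset
names, and paths in these sketches are made-up examples, not
recommendations.

First, the piped variant of the raw disk dump:

    # Raw dump of the whole disk, compressed and streamed across ssh
    # instead of written to a local file.  "backuphost" is made up.
    dd if=/dev/da0 bs=64k | gzip | ssh backuphost 'cat > /backups/da0.img.gz'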
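Next, the UFS+SU dump/restore workflow -- keeping my warnings about
its reliability firmly in mind:

    # Level 0 (full) dump of /usr; -L snapshots the live filesystem
    # via mksnap_ffs(8), -u records the dump in /etc/dumpdates.
    dump -0uaL -f /backups/usr.0.dump /usr

    # Later, a level 1 dump captures only what changed since level 0.
    dump -1uaL -f /backups/usr.1.dump /usr

    # Interactive restore: navigate the dump, then "add" and
    # "extract" individual files.
    restore -i -f /backups/usr.0.dump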
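For the ZFS route, a minimal snapshot/send/recv cycle:

    # Create a named snapshot (nearly instantaneous).
    zfs snapshot tank/data@2013-07-03

    # Full send of that snapshot to another system across ssh.
    zfs send tank/data@2013-07-03 | ssh backuphost zfs recv backup/data

    # The next day, send only the delta between the two snapshots.
    zfs snapshot tank/data@2013-07-04
    zfs send -i tank/data@2013-07-03 tank/data@2013-07-04 | \
        ssh backuphost zfs recv backup/data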
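And finally, the hard-link mechanism that rsnapshot automates for
you, shown here with bare rsync:

    # Files unchanged since the previous snapshot (daily.1) become
    # hard links in daily.0, so only changed/new files consume
    # additional space.
    rsync -a --delete --link-dest=/backups/daily.1 \
        root@server:/data/ /backups/daily.0/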
Real-world experience: rsnapshot is what I used at my hosting
organisation of nearly 18 years to back up 8 or 9 servers, nightly,
across a network (gigE LAN).  Those servers also used ZFS as their
filesystems (for everything except root/var/tmp/usr) -- both the
source being copied and the filesystem used to store the backups --
and I only once had an issue, during the early FreeBSD 8.x days
(caused by a ZFS bug that has since been fixed).  I still use
rsync/rsnapshot to this day, even on my local system (which is
ZFS-based barring root/var/tmp/usr -- I choose rsnapshot rather than
ZFS snapshots for reasons I will not go into here, as they're
irrelevant).  However, I would not use this method where "snapshots"
need to be done very regularly (e.g. every hour), particularly on
filesystems where there are either a) lots and lots of files, or b)
files of immense size that change often.  Filesystem snapshots are a
better choice in that case.

There are certainly other options available which I have not touched
on, but in general the filesystem snapshot choice is probably your
best bet.  Filesystem snapshots have one other advantage that you
might not have thought of: they're done within the OS, which means
that if Amazon's EBS stuff changes in such a way that you lose
backwards compatibility, or you encounter bugs with it (during EBS
snapshot generation), you can still get access to your data in some
manner of speaking.

I hope this has given you some details, avenues of choice, or at
least things to ponder.  Choose wisely, and remember: **ALWAYS DO A
RESTORE TEST** when choosing a new backup strategy.  I cannot tell
you how many times I've encountered people "doing backups" who never
test a restore until that horrible day... only to find their backups
were done wrong, or that the restore process (or even the software!)
is utterly broken.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |