From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 08:59:52 2013
From: Markus Gebert <markus.gebert@hostpoint.ch>
Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Date: Wed, 3 Jul 2013 10:59:07 +0200
Message-Id: <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch>
References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com>
To: Kevin Day
Cc: freebsd-fs

On 03.07.2013, at 09:02, Kevin Day wrote:

>
> On Jul 3, 2013, at 1:53 AM, Will Andrews wrote:
>
>> On Wednesday, July 3, 2013, Kevin Day wrote:
>> The closest thing we can do in FreeBSD is to unmount the filesystem, take the snapshot, and remount. This has the side effect of closing all open files, so it's not really an alternative.
>>
>> The other option is to not freeze the filesystem before taking the snapshot, but again you risk leaving things in an inconsistent state, and/or the last few writes you think you made didn't actually get committed to disk yet. For automated systems that create then clone filesystems for new VMs, this can be a big problem. At best, you're going to get a warning that the filesystem wasn't cleanly unmounted.
>>
>> Actually, sync(2)/sync(8) will do the job on ZFS. It won't stop/pause I/O running in other contexts, but it does guarantee that any commands you ran and completed prior to calling sync will make it to disk in ZFS.
>>
>> This is because sync in ZFS is implemented as a ZIL commit, so transactions that haven't yet made it to disk via the normal syncing context will at least be committed via their ZIL blocks. Which can then be replayed when the pool is imported later, in this case from the EBS snapshots.
>>
>> And since the entire tree from the überblock down in ZFS is COW, you can't get an inconsistent pool simply by doing a virtual disk snapshot, regardless of how that is implemented.
>>
>> --Will.
>
> Sorry, yes, this is true. We're not using ZFS to clone and provision new VMs, so I was just thinking about UFS here. And ZFS does have a good advantage here in that it seems to actually respect sync requests. I think it was here that I reported a few months ago that we were seeing UFS+SUJ not actually doing anything when sync(8) was called.
>
> But for some workloads this still isn't sufficient if you have processes running that could be writing at any time. As an example, we have a database server using ZFS-backed storage. Short of shutting down the server, there's no way to guarantee it won't try to write even if we lock all tables, disconnect all clients, etc.
mysql has all sorts of things done on timers that occur lazily in the future, including periodic checkpoint writes even if there is no activity.
>
> I know this is a sort of obscure use case, but Linux and Windows both have this functionality that VMware will use if present (and the guest tools know about it). Linux goes a step further and ensures that it's not in the middle of writing anything to swap during the quiesce period, too. I don't think this would be terribly difficult to implement -- a hook somewhere along the write chain that blocks (or queues up) anything trying to write until the unfreeze comes along -- but I'm guessing there are all sorts of deadlock opportunities here.

Indeed, sync(8) has the disadvantage that you cannot prevent writes between the syscall and the EBS snapshot, so depending on the application, this can make the resulting EBS snapshot useless.

But taking a zfs snapshot is an atomic operation. Why not use that? For example:

1. snapshot the zfs filesystem at the same point in time you'd issue that ioctl on Linux
2. take the EBS snapshot at any time
3. clone the EBS snapshot to the new/other VM
4. zpool import the pool there
5. zfs rollback the filesystem to the snapshot taken in step 1 (or clone it and use that)

Any writes issued between the zfs snapshot and the EBS snapshot are discarded, and that way you get exactly the same filesystem data as you would have gotten with the ioctl. Also, taking the zfs snapshot should take much less time, because you don't have to wait for the EBS snapshot to complete before you can resume I/O on the filesystem. So you don't even depend on EBS snapshots being quick when using the zfs approach -- a big advantage in my opinion.

Markus
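For illustration, the five steps above could look roughly like the sketch below. The pool/dataset names, the snapshot label, the EBS volume id, and the use of the AWS CLI are all assumptions for the example, not part of the original recipe:

```shell
# Sketch of the zfs-snapshot-first backup flow; all names are hypothetical.
FS=tank/data                              # dataset to protect (assumption)
SNAP="$FS@pre-ebs-$(date +%Y%m%d%H%M%S)"  # e.g. tank/data@pre-ebs-20130703105907

# 1. Atomic ZFS snapshot -- this is the point in time you actually care about.
zfs snapshot "$SNAP"

# 2. EBS snapshot at any later time (AWS CLI assumed; volume id made up).
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "backup containing $SNAP"

# --- later, on the new/other VM, after attaching a volume cloned from the
# --- EBS snapshot (steps 3-5):
zpool import -f tank        # -f because the pool was never cleanly exported
zfs rollback -r "$SNAP"     # discard anything written after step 1
# ...or leave the live data alone and work on a clone instead:
# zfs clone "$SNAP" tank/restore
```

The rollback (or clone) is what makes the in-between writes harmless: everything newer than the ZFS snapshot is simply thrown away on the restore side.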