From owner-freebsd-fs@FreeBSD.ORG Wed Jul 3 08:59:52 2013
From: Markus Gebert <markus.gebert@hostpoint.ch>
Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Date: Wed, 3 Jul 2013 10:59:07 +0200
Message-Id: <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch>
References: <87li5o5tz2.wl%berend@pobox.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com>
To: Kevin Day
Cc: freebsd-fs

On 03.07.2013, at 09:02, Kevin Day wrote:

>
> On Jul 3, 2013, at 1:53 AM, Will Andrews wrote:
>
>> On Wednesday, July 3, 2013, Kevin Day wrote:
>> The closest thing we can do in FreeBSD is to unmount the filesystem, take the snapshot, and remount. This has the side effect of closing all open files, so it's not really an alternative.
>>
>> The other option is to not freeze the filesystem before taking the snapshot, but again you risk leaving things in an inconsistent state, and/or the last few writes you think you made didn't actually get committed to disk yet. For automated systems that create then clone filesystems for new VMs, this can be a big problem. At best, you're going to get a warning that the filesystem wasn't cleanly unmounted.
>>
>> Actually, sync(2)/sync(8) will do the job on ZFS. It won't stop/pause I/O running in other contexts, but it does guarantee that any commands you ran and completed prior to calling sync will make it to disk in ZFS.
>>
>> This is because sync in ZFS is implemented as a ZIL commit, so transactions that haven't yet made it to disk via the normal syncing context will at least be committed via their ZIL blocks. Which can then be replayed when the pool is imported later, in this case from the EBS snapshots.
>>
>> And since the entire tree from the überblock down in ZFS is COW, you can't get an inconsistent pool simply by doing a virtual disk snapshot, regardless of how that is implemented.
>>
>> --Will.
>
> Sorry, yes, this is true. We're not using ZFS to clone and provision new VMs, so I was just thinking about UFS here. And ZFS does have a good advantage here in that it seems to actually respect sync requests. I think it was here that I reported a few months ago that we were seeing UFS+SUJ not actually doing anything when sync(8) was called.
>
> But for some workloads this still isn't sufficient if you have processes running that could be writing at any time. As an example, we have a database server using ZFS-backed storage. Short of shutting down the server, there's no way to guarantee it won't try to write even if we lock all tables, disconnect all clients, etc.
mysql has all sorts of things done on timers that occur lazily in the future, including periodic checkpoint writes even if there is no activity.
>
> I know this is a sort of obscure use case, but Linux and Windows both have this functionality that VMware will use if present (and the guest tools know about it). Linux goes a step further and ensures that it's not in the middle of writing anything to swap during the quiesce period, too. I don't think this would be terribly difficult to implement -- a hook somewhere along the write chain that blocks (or queues up) anything trying to write until the unfreeze comes along -- but I'm guessing there are all sorts of deadlock opportunities here.

Indeed, sync(8) has the disadvantage that you cannot prevent writes between the syscall and the EBS snapshot, so depending on the application, this can make the resulting EBS snapshot useless.

But taking a zfs snapshot is an atomic operation. Why not use that? For example:

1. snapshot the zfs filesystem at the same point in time you'd issue that ioctl on Linux
2. take the EBS snapshot at any time
3. clone the EBS snapshot to the new/other VM
4. zpool import the pool there
5. zfs rollback the filesystem to the snapshot taken in step 1 (or clone it and use that)

Any writes issued between the zfs snapshot and the EBS snapshot are discarded, and that way you get exactly the same filesystem data as you would have gotten with the ioctl. Also, taking the zfs snapshot should take much less time, because you don't have to wait for the EBS snapshot to complete before you can resume I/O on the filesystem. So you don't even depend on EBS snapshots being quick when using the zfs approach -- a big advantage in my opinion.

Markus
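For illustration, the five steps above could look roughly like the sketch below. The pool/dataset names, the snapshot label, the EBS volume id, and the use of the AWS CLI are all assumptions for the example, not part of the original recipe:

```shell
# Sketch of the zfs-snapshot-first backup flow; all names are hypothetical.
FS=tank/data                              # dataset to protect (assumption)
SNAP="$FS@pre-ebs-$(date +%Y%m%d%H%M%S)"  # e.g. tank/data@pre-ebs-20130703105907

# 1. Atomic ZFS snapshot -- this is the point in time you actually care about.
zfs snapshot "$SNAP"

# 2. EBS snapshot at any later time (AWS CLI assumed; volume id made up).
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "backup containing $SNAP"

# --- later, on the new/other VM, after attaching a volume cloned from the
# --- EBS snapshot (steps 3-5):
zpool import -f tank        # -f because the pool was never cleanly exported
zfs rollback -r "$SNAP"     # discard anything written after step 1
# ...or leave the live data alone and work on a clone instead:
# zfs clone "$SNAP" tank/restore
```

The rollback (or clone) is what makes the in-between writes harmless: everything newer than the ZFS snapshot is simply thrown away on the restore side.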