Date:      Wed, 3 Jul 2013 10:59:07 +0200
From:      Markus Gebert <markus.gebert@hostpoint.ch>
To:        Kevin Day <toasty@dragondata.com>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Message-ID:  <14A2336A-969C-4A13-9EFA-C0C42A12039F@hostpoint.ch>
In-Reply-To: <A5A66641-5EF9-454E-A767-009480EE404E@dragondata.com>
References:  <87li5o5tz2.wl%berend@pobox.com> <CA%2BtpaK1jQuKneQsxkVfxJGzXdPdLZfqBM1QWQ0e19nK5t71t1Q@mail.gmail.com> <87ehbg5raq.wl%berend@pobox.com> <20130703055047.GA54853@icarus.home.lan> <6488DECC-2455-4E92-B432-C39490D18484@dragondata.com> <CADBaqmihCB5JP01hLwXTWHoZiJJ5-jkT-Ro=oDwOcKZT_zvEKA@mail.gmail.com> <A5A66641-5EF9-454E-A767-009480EE404E@dragondata.com>

On 03.07.2013, at 09:02, Kevin Day <toasty@dragondata.com> wrote:

>
> On Jul 3, 2013, at 1:53 AM, Will Andrews <will@firepipe.net> wrote:
>
>> On Wednesday, July 3, 2013, Kevin Day wrote:
>> The closest thing we can do in FreeBSD is to unmount the filesystem,
>> take the snapshot, and remount. This has the side effect of closing all
>> open files, so it's not really an alternative.
>>
>> The other option is to not freeze the filesystem before taking the
>> snapshot, but again you risk leaving things in an inconsistent state,
>> and/or the last few writes you think you made didn't actually get
>> committed to disk yet. For automated systems that create then clone
>> filesystems for new VMs, this can be a big problem. At best, you're
>> going to get a warning that the filesystem wasn't cleanly unmounted.
>>
>> Actually, sync(2)/sync(8) will do the job on ZFS. It won't stop/pause
>> I/O running in other contexts, but it does guarantee that any commands
>> you ran and completed prior to calling sync will make it to disk in ZFS.
>>
>> This is because sync in ZFS is implemented as a ZIL commit, so
>> transactions that haven't yet made it to disk via the normal syncing
>> context will at least be committed via their ZIL blocks. Which can then
>> be replayed when the pool is imported later, in this case from the EBS
>> snapshots.
>>
>> And since the entire tree from the überblock down in ZFS is COW,
>> you can't get an inconsistent pool simply by doing a virtual disk
>> snapshot, regardless of how that is implemented.
>>
>> --Will.
>
> Sorry, yes, this is true. We're not using ZFS to clone and provision
> new VMs, so I was just thinking about UFS here. And ZFS does have a good
> advantage here that it seems to actually respect sync requests. I think
> it was here I reported a few months ago that we were seeing UFS+SUJ not
> actually doing anything when sync(8) was called.
>
> But for some workloads this still isn't sufficient if you have
> processes running that could be writing at any time. As an example, we
> have a database server using ZFS backed storage. Short of shutting down
> the server, there's no way to guarantee it won't try to write even if we
> lock all tables, disconnect all clients, etc. mysql has all sorts of
> things done on timers that occur lazily in the future, including
> periodic checkpoint writes even if there is no activity.
>
> I know this is a sort of obscure use case, but Linux and Windows both
> have this functionality that VMWare will use if present (and the guest
> tools know about it). Linux goes a step further and ensures that it's
> not in the middle of writing anything to swap during the quiesce period,
> too. I don't think this would be terribly difficult to implement, a hook
> somewhere along the write chain that blocks (or queues up) anything
> trying to write until the unfreeze comes along, but I'm guessing there
> are all sorts of deadlock opportunities here.

Indeed, sync(8) has the disadvantage that you cannot prevent writes
between the syscall and the EBS snapshot, so depending on the
application, this can make the resulting EBS snapshot useless.

But taking a zfs snapshot is an atomic operation. Why not use that? For
example:

1. take a zfs snapshot at the same point in time you'd issue that ioctl
   on Linux
2. take the EBS snapshot at any time after that
3. clone the EBS snapshot to the new/other VM
4. zpool import the pool there
5. zfs rollback the filesystem to the snapshot taken in step 1 (or clone
   it and use that)

Any writes issued between the zfs snapshot and the EBS snapshot are
discarded by the rollback, so you get exactly the same filesystem data
as you would have gotten with the ioctl. Also, taking the zfs snapshot
should take much less time, because you don't have to wait for the EBS
snapshot to complete before you can resume I/O on the filesystem. So you
don't even depend on EBS snapshots being quick when using the zfs
approach, which is a big advantage in my opinion.
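
The steps above could be scripted roughly as follows. This is only a
sketch, not something from the thread: the helper names, the pool and
dataset names, the snapshot name and the volume ID are all made up for
illustration, and the use of the aws CLI for the EBS side is an
assumption. Step 3 (creating a volume from the EBS snapshot and
attaching it to the other VM) is left out. The script only prints the
commands unless dry_run is turned off.

```python
import subprocess


def source_side_commands(dataset, snap_name, volume_id):
    """Steps 1 and 2, run on the VM that owns the pool (hypothetical helper)."""
    return [
        # Step 1: atomic zfs snapshot; this is the consistency point.
        ["zfs", "snapshot", f"{dataset}@{snap_name}"],
        # Step 2: EBS snapshot, taken at any later time while I/O continues.
        # Assumes the aws CLI is installed and configured.
        ["aws", "ec2", "create-snapshot",
         "--volume-id", volume_id,
         "--description", f"contains {dataset}@{snap_name}"],
    ]


def target_side_commands(pool, dataset, snap_name):
    """Steps 4 and 5, run on the VM that got a volume made from the EBS snapshot."""
    return [
        # Step 4: the pool was never exported on the source VM, so this
        # sketch assumes a forced import (-f) is needed.
        ["zpool", "import", "-f", pool],
        # Step 5: drop everything written after the zfs snapshot.
        # If newer zfs snapshots exist, rollback needs -r to destroy them.
        ["zfs", "rollback", f"{dataset}@{snap_name}"],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it and stop on the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder names; replace with real ones before using dry_run=False.
    run_all(source_side_commands("tank/db", "pre-ebs", "vol-EXAMPLE"))
    run_all(target_side_commands("tank", "tank/db", "pre-ebs"))
```

To use the clone variant of step 5 instead, the rollback command would
be replaced by a zfs clone of the same snapshot into a new dataset.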


Markus



