Date:      Thu, 24 Sep 2015 15:05:49 -0400
From:      Paul Kraus <paul@kraus-haus.org>
To:        Quartz <quartz@sneakertech.com>
Cc:        FreeBSD questions <freebsd-questions@freebsd.org>
Subject:   Re: sync vs async vs zfs
Message-ID:  <98BFE313-523F-4A2C-82BB-8683466068FB@kraus-haus.org>
In-Reply-To: <56042774.6070404@sneakertech.com>
References:  <56042774.6070404@sneakertech.com>

On Sep 24, 2015, at 12:40, Quartz <quartz@sneakertech.com> wrote:

> I'm trying to spec out a new system that looks like it might be very
> sensitive to sync vs async writes. However, after some research and
> investigation I've come to realize that I don't think I understand
> a/sync as well as I thought I did and might be confused about some of
> the fundamentals.

Very short answer...

Both terms refer to writes only; in the sense discussed here there is
no such thing as a sync or async read.

In the case of an async write, the application code (App) asks the
Filesystem (FS) to write some data. The FS is free to do whatever it
wants with the data and respond immediately that it has the data and
_will_ write it to non-volatile (NV) storage (disk).

In the case of a sync write (at least as defined by POSIX), the App asks
the FS to write some data and not return until it is committed to NV
storage. The FS is required (by POSIX) to _not_ acknowledge the write
until the data _has_ been committed to NV storage.

So in the first case, the FS can accept the data, put it in its
"write cache", typically RAM, and respond to the App that the write
is complete. When the FS has the time it then commits the data to NV
storage. If the system crashes after the App has "written" the data
but before the FS has committed it to NV storage, that data is lost.

In the second case, the FS _must_not_ respond to the App until the data
is committed to NV storage. The App can be certain that the data is
safe. This is critical for, among other things, databases processing
transactions in a specific order or time.
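
As a minimal sketch of the two cases from the App's side (Python on a
POSIX system; the function names are made up for illustration, and
os.O_SYNC is assumed to be available on the platform):

```python
import os

def _write_all(fd, data):
    """Keep calling write() until every byte has been handed to the FS."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]

def write_async(path, data):
    """Async write: returns once the FS has accepted the data.
    The data may still be only in RAM at that point."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)

def write_sync(path, data):
    """Sync write: with O_SYNC each write() returns only after the FS
    reports the data as committed to stable storage."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)

def write_then_fsync(path, data):
    """Alternative: write asynchronously, then ask for one explicit
    flush with fsync() before treating the data as safe."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)
        os.fsync(fd)  # blocks until the FS reports the data as stable
    finally:
        os.close(fd)
```

All three leave the same bytes in the file; the difference is only in
what the App is entitled to assume at the moment the call returns.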

> Can someone point me to a good "newbie's guide" that explains sync vs
> async from the ground up? one that makes no assumptions about prior
> knowledge of filesystems and IO. And likewise, another guide
> specifically for how they relate to zfs pool/vdev configuration?

I don't know of a basic guide to this; I just learned it from various
places over 20 years in the business.

In terms of ZFS, the ARC acts as both write buffer and read cache. You
can see this easily when running benchmarks such as iozone with files
smaller than the amount of RAM. When making an async write call, the FS
responds almost immediately, and you are measuring the efficiency of the
ZFS code and memory bandwidth :-) I have seen write performance in the
tens of GB/sec on drives that I know do not have that kind of
bandwidth. Make the ARC too small to hold the entire file, or make the
file too big to fit, and you start seeing the performance of the drives.
This is due (in part) to the transaction group (TXG) design of ZFS. You
can watch the drives (via iostat -x) and see ZFS committing data in
bursts (originally up to 30 seconds apart, now up to 5 seconds apart).

Now when you issue a sync write to ZFS, in order to adhere to POSIX
requirements, ZFS _must_ commit the data to NV storage before returning
an acknowledgement to the App. So ZFS has the ZIL (ZFS Intent Log). All
sync writes are committed to the ZIL immediately and then incorporated
into the dataset itself as TXGs commit. The ZIL is just space stolen
from the zpool _unless_ you have a Separate Log Device (SLOG), which is
just a special type of vdev (like spare) and is listed as "log" in a
zpool status. Having a SLOG does two things: 1) ZFS no longer needs to
steal space from the pool for the ZIL, so the pool will be much less
fragmented, and 2) you can use a device which is much faster than the
main zpool devices (like a ZeusRAM or fast SSD) and greatly speed up
sync writes.
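
The interplay between TXGs and the ZIL can be sketched with a toy
model. This is an illustration of the behaviour described above, not
real ZFS code: the class and method names are invented, and the
"replay the log after a crash" step is how the ZIL is used on recovery.

```python
class ToyPool:
    """Toy model of async vs sync writes with an intent log (not ZFS)."""

    def __init__(self):
        self.on_disk = {}     # data committed by a TXG; survives a crash
        self.intent_log = []  # ZIL records on NV storage; survive a crash
        self.in_ram = {}      # data waiting for the next TXG; lost on crash

    def write(self, key, value, sync=False):
        self.in_ram[key] = value
        if sync:
            # Sync write: the record goes to the log before we acknowledge.
            self.intent_log.append((key, value))
        return "ack"

    def txg_commit(self):
        """Periodic burst: everything pending goes to the main pool."""
        self.on_disk.update(self.in_ram)
        self.in_ram.clear()
        self.intent_log.clear()  # logged records are now in the pool proper

    def crash_and_recover(self):
        """Lose RAM, then replay whatever the log still holds."""
        self.in_ram.clear()
        for key, value in self.intent_log:
            self.on_disk[key] = value
        self.intent_log.clear()


pool = ToyPool()
pool.write("a", 1)             # async: acknowledged, only in RAM
pool.write("b", 2, sync=True)  # sync: acknowledged after being logged
pool.crash_and_recover()       # crash before any TXG commit
print(pool.on_disk)            # {'b': 2}
```

In this model a SLOG would simply mean that intent_log lives on its own
(faster) device rather than sharing the main pool devices.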

You can see the performance difference between async and sync using
iozone with the -o option. From the iozone man page: "Writes are
synchronously written to disk. (O_SYNC). Iozone will open the files
with the O_SYNC flag. This forces all writes to the file to go
completely to disk before returning to the benchmark."
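
If iozone is not at hand, a rough stand-in for the same comparison can
be sketched in Python. This is only an assumption-laden illustration,
not a replacement for iozone: the function names, block size, and
block count are arbitrary, and os.O_SYNC is assumed to exist on the
platform.

```python
import os
import time

def timed_write(path, block, count, sync):
    """Write `count` copies of `block`; return (seconds, bytes written)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        flags |= os.O_SYNC  # same flag iozone -o uses, per its man page
    fd = os.open(path, flags, 0o644)
    written = 0
    start = time.perf_counter()
    try:
        for _ in range(count):
            written += os.write(fd, block)
    finally:
        os.close(fd)
    return time.perf_counter() - start, written

def compare(directory, block_size=4096, count=256):
    """Time the same workload with and without O_SYNC in `directory`."""
    # Random data, so that filesystem compression cannot shrink the writes.
    block = os.urandom(block_size)
    results = {}
    for label, sync in (("async", False), ("sync", True)):
        path = os.path.join(directory, label + ".dat")
        results[label] = timed_write(path, block, count, sync)
        os.remove(path)
    return results
```

Calling compare() on a directory in the pool under test returns a dict
with an (elapsed seconds, bytes written) pair for "async" and "sync".
With a file this small the async figure mostly reflects RAM, as
described above, so the numbers only become interesting next to the
sync ones.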

I hope this gets you started ...

--
Paul Kraus
paul@kraus-haus.org



