Date: Thu, 24 Sep 2015 15:05:49 -0400
From: Paul Kraus <paul@kraus-haus.org>
To: Quartz <quartz@sneakertech.com>
Cc: FreeBSD questions <freebsd-questions@freebsd.org>
Subject: Re: sync vs async vs zfs
Message-ID: <98BFE313-523F-4A2C-82BB-8683466068FB@kraus-haus.org>
In-Reply-To: <56042774.6070404@sneakertech.com>
References: <56042774.6070404@sneakertech.com>
On Sep 24, 2015, at 12:40, Quartz <quartz@sneakertech.com> wrote:

> I'm trying to spec out a new system that looks like it might be very sensitive to sync vs async writes. However, after some research and investigation I've come to realize that I don't think I understand a/sync as well as I thought I did and might be confused about some of the fundamentals.

Very short answer…

Both terms refer to writes only; there is no such thing as a sync or async read.

In the case of an async write, the application code (App) asks the Filesystem (FS) to write some data. The FS is free to do whatever it wants with the data and respond immediately that it has the data and that it _will_ write it to non-volatile (NV) storage (disk).

In the case of a sync write (at least as defined by POSIX), the App asks the FS to write some data and not to return until it is committed to NV storage. The FS is required (by POSIX) to _not_ acknowledge the write until the data _has_ been committed to NV storage.

So in the first case, the FS can accept the data, put it in its "write cache" (typically RAM), and respond to the App that the write is complete. When the FS has time, it then commits the data to NV storage. If the system crashes after the App has "written" the data but before the FS has committed it to NV storage, that data is lost.

In the second case, the FS _must_not_ respond to the App until the data is committed to NV storage. The App can be certain that the data is safe. This is critical for, among other things, databases that process transactions in a specific order or at a specific time.

> Can someone point me to a good "newbie's guide" that explains sync vs async from the ground up? one that makes no assumptions about prior knowledge of filesystems and IO. And likewise, another guide specifically for how they relate to zfs pool/vdev configuration?
I don't know of a basic guide to this; I just learned it from various places over 20 years in the business.

In terms of ZFS, the ARC acts as both write buffer and read cache. You can see this easily when running benchmarks such as iozone with files smaller than the amount of RAM. When making an async write call the FS responds almost immediately, and you are measuring the efficiency of the ZFS code and memory bandwidth :-) I have seen write performance in the tens of GB/sec on drives that I know do not have that kind of bandwidth. Make the ARC too small to hold the entire file, or make the file too big to fit, and you start seeing the performance of the drives. This is due (in part) to the TXG (transaction group) design of ZFS. You can watch the drives (via iostat -x) and see ZFS committing data in bursts (originally up to 30 seconds apart, now up to 5 seconds apart).

Now when you issue a sync write to ZFS, in order to adhere to POSIX requirements, ZFS _must_ commit the data to NV storage before returning an acknowledgement to the App. So ZFS has the ZIL (ZFS Intent Log). All sync writes are committed to the ZIL immediately and then incorporated into the dataset itself as TXGs commit. The ZIL is just space stolen from the zpool _unless_ you have a Separate Log Device (SLOG), which is just a special type of vdev (like a spare) and is listed as "log" in zpool status. By having a SLOG you can do two things: 1) ZFS no longer needs to steal space from the dataset for the ZIL, so the dataset will be much less fragmented, and 2) you can use a device that is much faster than the main zpool devices (like a ZeusRAM or a fast SSD) and greatly speed up sync writes.

You can see the performance difference between async and sync using iozone with the -o option. From the iozone man page: "Writes are synchronously written to disk. (O_SYNC). Iozone will open the files with the O_SYNC flag. This forces all writes to the file to go completely to disk before returning to the benchmark."

I hope this gets you started…

--
Paul Kraus
paul@kraus-haus.org
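The async/sync distinction in this reply can also be probed from a few lines of code, in the same spirit as `iozone -o`. What follows is a minimal sketch, not part of the original message: it assumes a POSIX system where Python exposes `os.O_SYNC`, and the helper names `timed_writes` and `compare`, the `STRATEGIES` table and the 4 KiB block size are hypothetical choices for illustration. It times three cases: plain (async) writes, async writes followed by a single `fsync()`, and writes on a file opened with `O_SYNC`.

```python
import os
import tempfile
import time

# Hypothetical payload size: 4 KiB per write call, chosen only for illustration.
BLOCK = b"x" * 4096


def _write_all(fd: int, data: bytes) -> None:
    """Call os.write() until every byte has been accepted (writes may be short)."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def timed_writes(path: str, count: int, *, o_sync: bool = False,
                 fsync_at_end: bool = False) -> float:
    """Write `count` blocks to `path` and return the elapsed seconds.

    o_sync=False, fsync_at_end=False: async writes; each call may return as
        soon as the data is buffered in RAM, so a crash can still lose it.
    fsync_at_end=True: async writes, then one fsync(); nothing is guaranteed
        to be on stable storage until fsync() returns.
    o_sync=True: every write() must reach stable storage before returning
        (on ZFS, the sync path that goes through the ZIL).
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if o_sync:
        flags |= os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(count):
            _write_all(fd, BLOCK)
        if fsync_at_end:
            os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


# Hypothetical labels for the three strategies compared below.
STRATEGIES = {
    "async": {},
    "write+fsync": {"fsync_at_end": True},
    "O_SYNC": {"o_sync": True},
}


def compare(directory: str = ".", count: int = 256) -> dict:
    """Time each strategy on a scratch file created inside `directory`."""
    results = {}
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        path = os.path.join(scratch, "probe.bin")
        for name, opts in STRATEGIES.items():
            results[name] = timed_writes(path, count, **opts)
    return results
```

To use it, call `print(compare("."))` from a directory on the pool or dataset being evaluated. The numbers depend entirely on the hardware, so none are given here. Avoid running it under `/tmp` if that is RAM-backed (tmpfs), because there a sync write costs almost nothing and the difference disappears. Per the reply above, the O_SYNC case is the one a SLOG is meant to speed up.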