From owner-freebsd-questions@freebsd.org Thu Sep 24 19:05:59 2015
From: Paul Kraus <paul@kraus-haus.org>
To: Quartz
Cc: FreeBSD questions <freebsd-questions@freebsd.org>
Subject: Re: sync vs async vs zfs
Date: Thu, 24 Sep 2015 15:05:49 -0400
Message-Id: <98BFE313-523F-4A2C-82BB-8683466068FB@kraus-haus.org>
In-Reply-To: <56042774.6070404@sneakertech.com>

On Sep 24, 2015, at 12:40, Quartz wrote:

> I'm trying to spec out a new system that looks like it might be very sensitive to sync vs async writes. However, after some research and investigation I've come to realize that I don't think I understand a/sync as well as I thought I did and might be confused about some of the fundamentals.

Very short answer…

Both terms refer to writes only; there is no such thing as a sync or async read.

In the case of an async write, the application code (App) asks the filesystem (FS) to write some data. The FS is free to do whatever it wants with the data and to respond immediately that it has the data and _will_ write it to non-volatile (NV) storage (disk).

In the case of a sync write (at least as defined by POSIX), the App asks the FS to write some data and not return until it is committed to NV storage. The FS is required (by POSIX) to _not_ acknowledge the write until the data _has_ been committed to NV storage.
So in the first case, the FS can accept the data, put it in its "write cache" (typically RAM), and respond to the App that the write is complete. When the FS has the time, it then commits the data to NV storage. If the system crashes after the App has "written" the data but before the FS has committed it to NV storage, that data is lost.

In the second case, the FS _must not_ respond to the App until the data is committed to NV storage. The App can be certain that the data is safe. This is critical for, among other things, databases processing transactions in a specific order or at a specific time.

> Can someone point me to a good "newbie's guide" that explains sync vs async from the ground up? one that makes no assumptions about prior knowledge of filesystems and IO. And likewise, another guide specifically for how they relate to zfs pool/vdev configuration?

I don't know of a basic guide to this; I just learned it from various places over 20 years in the business.

In terms of ZFS, the ARC acts as both write buffer and read cache. You can see this easily when running benchmarks such as iozone with files smaller than the amount of RAM. When making an async write call, the FS responds almost immediately, so you are really measuring the efficiency of the ZFS code and memory bandwidth :-) I have seen write performance in the 10s of GB/sec on drives that I know do not have that kind of bandwidth. Make the ARC too small to hold the entire file, or make the file too big to fit, and you start seeing the performance of the drives. This is due (in part) to the TXG design of ZFS. You can watch the drives (via iostat -x) and see ZFS committing data in bursts (originally up to 30 seconds apart, now up to 5 seconds apart).

Now when you issue a sync write to ZFS, in order to adhere to POSIX requirements, ZFS _must_ commit the data to NV storage before returning an acknowledgement to the App. So ZFS has the ZIL (ZFS Intent Log).
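The TXG burst pattern is easy to watch with stock tools. Device and pool names here (ada0, tank) are placeholders for your own:

```shell
# Per-device view: with a steady async write load, writes land on disk
# in bursts as each TXG commits, not as a continuous stream.
iostat -x -w 1 ada0

# The same pattern from ZFS's point of view, per vdev:
zpool iostat -v tank 1
```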
All sync writes are committed to the ZIL immediately and then incorporated into the dataset itself as TXGs commit. The ZIL is just space stolen from the zpool _unless_ you have a Separate Log Device (SLOG), which is just a special type of vdev (like a spare) and is listed as "log" in a zpool status. By having a SLOG you can do two things: 1) ZFS no longer needs to steal space from the dataset for the ZIL, so the dataset will be much less fragmented, and 2) you can use a device which is much faster than the main zpool devices (like a ZeusRAM or a fast SSD) and greatly speed up sync writes.

You can see the performance difference between async and sync using iozone with the -o option. From the iozone man page: "Writes are synchronously written to disk. (O_SYNC). Iozone will open the files with the O_SYNC flag. This forces all writes to the file to go completely to disk before returning to the benchmark."

I hope this gets you started …

--
Paul Kraus
paul@kraus-haus.org
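P.S. A quick sketch of the commands involved above; the pool and device names (tank, gpt/slog0) are made up, and the iozone flags are the standard ones (-i 0 write test, -s file size, -r record size, -o sync writes):

```shell
# Attach a fast SSD partition as a dedicated log (SLOG) vdev:
zpool add tank log gpt/slog0
zpool status tank    # the device now shows up under "logs"

# Compare async vs sync write throughput on the pool:
iozone -i 0 -s 4g -r 128k -f /tank/testfile       # async writes
iozone -i 0 -s 4g -r 128k -o -f /tank/testfile    # sync (O_SYNC) writes
```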