Date: Mon, 4 Jan 2016 15:10:22 -0600 (CST)
From: Bob Friesenhahn
To: "Mikhail T."
cc: freebsd-fs@freebsd.org
Subject: Re: NFS reads vs. writes

On Mon, 4 Jan 2016, Mikhail T. wrote:

> Yes, indeed. Disabling sync got the writing throughput all the way up to
> about 86Mb/s...
> I still don't fully understand, why local writes are
> able to achieve this speed without async and without being considered
> dangerous.

Local writes are buffered in RAM, and the current set of changes (many
writes may have been obviated by overwrites) is written all at once as
part of the next ZFS transaction group, which can take up to 5 seconds
to occur. Each transaction group completes only after all disks have
positively acknowledged a cache flush. With this approach the on-disk
data is always coherent, but it is possible to lose up to 5 seconds of
data (rolling back to the previously committed transaction group).

The ZFS intent log (the ZIL, which may sit on a separate slog device)
records the pending synchronous writes (which are still written into
RAM!) and marks them as committed when the transaction group completes.
If the server loses power or spontaneously reboots, the pending writes
from the intent log are written to the pool disks when the server comes
back up.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
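
For illustration, here is a toy Python model of the behavior described
above. It is not ZFS code, and every name in it (ToyPool, write,
commit_txg, crash, replay) is made up for the example. It only mimics
three things: buffering writes in RAM, committing a transaction group
all at once, and replaying the intent log after a crash.

```python
# Toy model only; all class and method names are hypothetical.

class ToyPool:
    def __init__(self):
        self.disk = {}        # state as of the last committed transaction group
        self.pending = {}     # dirty data buffered in RAM for the next txg
        self.intent_log = []  # stable-storage record of pending sync writes

    def write(self, key, value, sync=False):
        # Every write lands in RAM; a later write to the same key
        # replaces the earlier one before it ever reaches disk.
        self.pending[key] = value
        if sync:
            # A synchronous write is also logged before it is acknowledged.
            self.intent_log.append((key, value))

    def commit_txg(self):
        # The whole set of changes goes to disk at once; afterwards the
        # logged records are no longer needed.
        self.disk.update(self.pending)
        self.pending.clear()
        self.intent_log.clear()

    def crash(self):
        # Power loss: RAM contents vanish, disk and intent log survive.
        self.pending.clear()

    def replay(self):
        # On the next start, logged sync writes are applied to the pool.
        for key, value in self.intent_log:
            self.disk[key] = value
        self.intent_log.clear()


pool = ToyPool()
pool.write("a", 1)
pool.write("a", 2)              # overwrites the buffered value
pool.commit_txg()               # disk now holds a=2
pool.write("b", 3)              # async: RAM only
pool.write("c", 4, sync=True)   # sync: RAM plus intent log
pool.crash()
pool.replay()
print(pool.disk)                # {'a': 2, 'c': 4}
```

The asynchronous write "b" is lost with the uncommitted transaction
group, while the synchronous write "c" comes back from the log. In this
model, disabling sync amounts to calling write() with sync=False every
time: faster, but with nothing to replay after a crash.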