Message-Id: <201311290600.rAT60aff046648@chez.mckusick.com>
To: Konstantin Belousov
Subject: Re: RFC: NFS client patch to reduce sychronous writes
In-reply-to: <20131128071821.GH59496@kib.kiev.ua>
Date: Thu, 28 Nov 2013 22:00:36 -0800
From: Kirk McKusick
Cc: FreeBSD FS

> Date: Thu, 28 Nov 2013 09:18:21 +0200
> From: Konstantin Belousov
> To: Kirk McKusick
> Cc: Rick Macklem, FreeBSD FS
> Subject: Re: RFC: NFS client patch to reduce sychronous writes
>
> On Wed, Nov 27, 2013 at 03:20:14PM -0800, Kirk McKusick wrote:
>> The ``fix'' of bzero'ing every buffer cache page was made to UFS/FFS
>> for this problem and it killed write performance of the filesystem
>> by nearly half. We corrected this by only doing the bzero when the
>> file is mmap'ed which helped things considerably (since most files
>> being written are not also mmap'ed).
>
> I am not sure that I follow.
>
> For UFS, leaving any part of the buffer with undefined garbage would
> cause the garbage to appear on the next mmap(2), since page in is
> implemented as translation of the file offsets into disk offsets and
> then reading disk blocks. The read always fetches a full page. UFS
> cannot know if the file would be mapped sometime in future, or after
> the reboot.
>
> In fact, UFS is quite plentiful WRT zeroing buffers on write. It is easy
> to see almost all places where it is done, by searching for BA_CLRBUF
> flag for UFS_BALLOC(). UFS does perform the optimization of _trying_ to
> not clear newly allocated buffer on write if uio covers the whole buffer
> range. Still, on error it falls back to clearing, which is performed by
> vfs_bio_clrbuf() call in ffs_write().

You are entirely correct in your analysis. The original "fix" was to
always clear every buffer even when it was being completely filled
(which is the most common case). I changed the complete-fill case to
first try the copyin and to zero the buffer only when the copyin
fails. Making that change nearly doubled the speed of bulk writes.

	~Kirk
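
For readers who want to see the shape of that change, here is a rough
user-space sketch of the "copy first, zero only on failure" idea. It is
not the ffs_write() code: sim_copyin(), write_full_block() and BLKSIZE
are hypothetical names made up for illustration, and the fault is
simulated by a fail_at argument rather than by a real bad user address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLKSIZE 4096	/* hypothetical block size, for illustration only */

/*
 * Hypothetical stand-in for a kernel copyin: copies n bytes from src to
 * dst, but "faults" after fail_at bytes when fail_at < n, leaving the
 * destination partially filled. Returns 0 on success, -1 on failure.
 */
static int
sim_copyin(const char *src, char *dst, size_t n, size_t fail_at)
{
	if (fail_at < n) {
		memcpy(dst, src, fail_at);
		return (-1);
	}
	memcpy(dst, src, n);
	return (0);
}

/*
 * Write that covers the whole block. The buffer is assumed to hold
 * stale garbage on entry. Instead of clearing it up front, attempt the
 * copy and clear the whole buffer only if the copy fails.
 */
static int
write_full_block(char *buf, const char *src, size_t fail_at)
{
	int error;

	assert(buf != NULL && src != NULL);
	error = sim_copyin(src, buf, BLKSIZE, fail_at);
	if (error != 0)
		memset(buf, 0, BLKSIZE);	/* fallback: no garbage survives */
	return (error);
}
```

In the common case the copy succeeds and the block is written exactly
once. Only the rare failing write pays for the extra clear, which is
the point of the optimization discussed above.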