From: Bob Friesenhahn
Date: Fri, 13 May 2011 09:13:50 -0500 (CDT)
To: Freddie Cash
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.

On Thu, 12 May 2011, Freddie Cash wrote:
>> Zfs would certainly appreciate 128K since that is its default block size.
>
> Note: the "default block size" is a max block size, not an "every
> block written is this size" setting. A ZFS filesystem will use any
> power-of-2 size under the block size setting for that filesystem.

Except for file tail blocks, or when compression/encryption is used,
zfs will write full blocks as configured for the filesystem being
written to (the setting in effect when the file was originally
created). Even with compression/encryption enabled, the input
(uncompressed) data size is the configured block size. The block
needs to be read, and (possibly) decompressed, and (possibly)
decrypted so that it can be checksummed, and any changes made. The
checksum is based on the decoded block in order to capture as many
potential error cases as possible, and so that the zfs "send" stream
can use the same checksums.

Zfs writes data in large transaction groups ("TXG"), which allows it
to buffer quite a lot of update data (up to 5 seconds worth) before
anything is actually written. Even if the application writes 16KB at
a time, zfs is likely to have buffered many times 128KB by the time
the next TXG is written.

If zfs goes to write a block and the user has supplied less than the
block size, and the existing file data is no longer cached (because
it has not been accessed for a long time, or because the system is
under memory pressure), then zfs needs to read the existing block
content (which includes checksum validation, and possibly
decompression and decryption) so that it can fill in the gaps, since
it always writes full blocks. The blocks are written using a Copy On
Write ("COW") algorithm, so the updated block is written to a new
block location.

If the NFS client conveniently sent the data 128K at a time for
sequential writes, then there is a better chance that zfs will be
able to avoid some heavy lifting.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
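
As a rough illustration of the partial-block case described above, here is a small Python sketch of the arithmetic only. It is not zfs code: the names RECORD_SIZE, merge_writes and classify_blocks are made up for this example, the 128K block size is assumed, and the sketch ignores file tail blocks, caching, and compression. It coalesces the writes buffered in one TXG and reports which blocks end up fully overwritten and which are only partly covered (the ones where the old content would have to be read back if it is not cached).

```python
RECORD_SIZE = 128 * 1024  # assumed block size (the 128K default discussed above)


def merge_writes(writes):
    """Coalesce (offset, length) writes into sorted, non-overlapping byte ranges."""
    ranges = sorted((off, off + ln) for off, ln in writes if ln > 0)
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def classify_blocks(writes, record_size=RECORD_SIZE):
    """Return (full, partial) lists of block indices touched by the writes.

    'full' blocks are completely covered by the buffered data; 'partial'
    blocks are only partly covered, so the gaps would need the old content.
    """
    full, partial = set(), set()
    for start, end in merge_writes(writes):
        first = start // record_size
        last = (end - 1) // record_size
        for blk in range(first, last + 1):
            blk_start = blk * record_size
            if start <= blk_start and end >= blk_start + record_size:
                full.add(blk)
            else:
                partial.add(blk)
    return sorted(full), sorted(partial)
```

With these assumptions, eight sequential 16KB writes starting at offset 0 coalesce into one fully covered block, so classify_blocks([(i * 16384, 16384) for i in range(8)]) returns ([0], []). A lone 16KB write at offset 0 returns ([], [0]), which is the case where the existing block would have to be read first.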