Date: Fri, 13 May 2011 09:13:50 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
To: Freddie Cash <fjwcash@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
Message-ID: <alpine.GSO.2.01.1105130848030.20825@freddy.simplesystems.org>
In-Reply-To: <BANLkTi=b+wA-ADup9SQvykexJBZwjK9WZw@mail.gmail.com>
References: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca>
 <alpine.GSO.2.01.1105121805500.8019@freddy.simplesystems.org>
 <BANLkTi=b+wA-ADup9SQvykexJBZwjK9WZw@mail.gmail.com>
On Thu, 12 May 2011, Freddie Cash wrote:
>>
>> Zfs would certainly appreciate 128K since that is its default block
>> size.
>
> Note: the "default block size" is a max block size, not an "every
> block written is this size" setting.  A ZFS filesystem will use any
> power-of-2 size under the block size setting for that filesystem.

Except for file tail blocks, or when compression/encryption is used,
zfs writes full blocks of the size configured for the filesystem being
written to (the setting that was in effect when the file was
originally created).  Even with compression/encryption enabled, the
input (uncompressed) data size is the configured block size.  An
existing block needs to be read, (possibly) decompressed, and
(possibly) decrypted so that it can be checksummed and any changes
made.  The checksum is computed over the decoded block in order to
catch as many potential error cases as possible, and so that the zfs
"send" stream can use the same checksums.

Zfs writes data in large transaction groups ("TXGs"), which allows it
to buffer quite a lot of update data (up to 5 seconds worth) before
anything is actually written.  Even if the application writes only 16K
at a time, zfs is likely to have buffered many times 128K by the time
the next TXG is written.

If zfs goes to write a block and the application has supplied less
than the block size, and the file data has not been accessed for a
long time, or the system is under memory pressure so the file data is
no longer cached, then zfs needs to read the existing block content
(which includes checksum validation, and possibly decompression and
decryption) so that it can fill in the gaps, since it always writes
full blocks.  Blocks are written with a copy-on-write ("COW")
algorithm, so each block is written to a new location.

If the NFS client conveniently sent the data 128K at a time for
sequential writes then there is a better chance that zfs would be
able to avoid some of this heavy lifting.  A few illustrative
commands are appended at the end of this message.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
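
Some quick illustrations (the pool/filesystem names and sizes below
are only placeholders, not recommendations):

The per-filesystem record size is an ordinary zfs property.  It can be
inspected and changed with the standard commands, and the value in
effect when a file is created is the one that applies to that file:

    # zfs get recordsize tank/export
    # zfs set recordsize=64K tank/export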
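
The TXG flush interval mentioned above is a tunable.  On FreeBSD it
should be visible as a sysctl, although the name and the default have
changed between releases, so verify it on your own system:

    # sysctl vfs.zfs.txg.timeout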
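
The read-modify-write behaviour for partial records can be seen,
roughly, by writing a small piece into the middle of a large file
whose data is no longer cached, while watching the pool from another
terminal.  The dd options are standard; the file path is a
placeholder:

    # dd if=/dev/zero of=/tank/export/big bs=128k count=8192
    (arrange for the file data to drop out of the cache, e.g. zpool export/import)
    # dd if=/dev/zero of=/tank/export/big bs=16k count=1 seek=1000 conv=notrunc
    # zpool iostat tank 1

The first dd lines up with a 128K record size, so whole records are
written with no need to read old data.  The second dd touches only 16K
inside one record, so zfs has to read that record back in (with
checksum verification and any decompression/decryption) before it can
write the full record out again to a new location.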