Date:      Fri, 13 May 2011 09:13:50 -0500 (CDT)
From:      Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
To:        Freddie Cash <fjwcash@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS: How to enable cache and logs.
Message-ID:  <alpine.GSO.2.01.1105130848030.20825@freddy.simplesystems.org>
In-Reply-To: <BANLkTi=b+wA-ADup9SQvykexJBZwjK9WZw@mail.gmail.com>
References:  <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca> <alpine.GSO.2.01.1105121805500.8019@freddy.simplesystems.org> <BANLkTi=b+wA-ADup9SQvykexJBZwjK9WZw@mail.gmail.com>

On Thu, 12 May 2011, Freddie Cash wrote:
>>
>> Zfs would certainly appreciate 128K since that is its default block size.
>
> Note:  the "default block size" is a max block size, not an "every
> block written is this size" setting.  A ZFS filesystem will use any
> power-of-2 size under the block size setting for that filesystem.

Except for file tail blocks, or when compression/encryption is used, 
zfs writes full blocks of the size configured for the filesystem being 
written to (the recordsize in effect when the file was originally 
created).  Even with compression/encryption enabled, the input 
(uncompressed) data size is the configured block size.  The block 
needs to be read, (possibly) decompressed, and (possibly) decrypted so 
that it can be checksummed and any changes made.  The checksum is 
based on the decoded block in order to catch as many potential error 
cases as possible, and so that the zfs "send" stream can use the same 
checksums.
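
To make the ordering of that decode-then-modify step concrete, here is 
a minimal Python sketch (purely an illustration, not zfs source; the 
function name, zlib compression, and SHA-256 checksum are stand-ins):

import hashlib
import zlib

def update_block(stored, compressed, changes, expected_checksum):
    # Decode the stored block first: decompress (and, in the encrypted
    # case, decrypt) before anything else can happen.
    data = bytearray(zlib.decompress(stored) if compressed else stored)
    # Checksum the decoded block, per the description above.
    if hashlib.sha256(bytes(data)).digest() != expected_checksum:
        raise IOError("checksum mismatch on decoded block")
    # Apply the caller's partial updates to the full-size block.
    for offset, buf in changes.items():
        data[offset:offset + len(buf)] = buf
    return bytes(data)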

Zfs writes data in large transaction groups ("TXG"), which allows it 
to buffer quite a lot of update data (up to 5 seconds' worth) before 
anything is actually written.  Even if the application writes only 
16 KB at a time, zfs is likely to have buffered many times 128 KB by 
the time the next TXG is written.
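
A quick back-of-the-envelope in Python (the write rate is an 
assumption picked just for illustration):

WRITE_SIZE = 16 * 1024        # bytes per application write
WRITES_PER_SEC = 100          # assumed write rate, purely hypothetical
TXG_INTERVAL = 5              # seconds between TXG syncs, as above
RECORDSIZE = 128 * 1024

buffered = WRITE_SIZE * WRITES_PER_SEC * TXG_INTERVAL
print(buffered // RECORDSIZE, "full 128K records buffered per TXG")  # -> 62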

If zfs goes to write a block, the user has supplied less than the 
block size, and the file data has not been accessed for a long time 
(or the system is under memory pressure so the file data is no longer 
cached), then zfs needs to read the existing block content (which 
includes checksum validation, and possibly decompression and 
decryption) so that it can fill in the gaps, since it always writes 
full blocks.  Blocks are written using a Copy On Write ("COW") 
algorithm, so each updated block goes to a new location.  If the NFS 
client conveniently sends the data 128K at a time for sequential 
writes, then there is a better chance that zfs will be able to avoid 
some of this heavy lifting, as sketched below.
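
A small Python sketch of that read-modify-write condition (conceptual 
only; it ignores tail blocks and uses a made-up cache set):

RECORDSIZE = 128 * 1024

def records_needing_read(offset, length, cached_records):
    # A write that does not cover a whole record, and whose old record
    # is not already cached in memory, forces a read of the old block.
    first = offset // RECORDSIZE
    last = (offset + length - 1) // RECORDSIZE
    needs_read = []
    for rec in range(first, last + 1):
        covers_whole = (offset <= rec * RECORDSIZE and
                        offset + length >= (rec + 1) * RECORDSIZE)
        if not covers_whole and rec not in cached_records:
            needs_read.append(rec)
    return needs_read

print(records_needing_read(0, 128 * 1024, set()))         # [] -- aligned 128K write
print(records_needing_read(64 * 1024, 16 * 1024, set()))  # [0] -- partial, uncached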

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


