Date:      Tue, 20 Nov 2012 08:57:01 -0000
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Adam Nowacki" <nowakpl@platinum.linux.pl>, <freebsd-fs@freebsd.org>
Subject:   Re: ZFS FAQ (Was: SSD recommendations for ZFS cache/log)
Message-ID:  <230DE7DAE83749DCBD180D5EF85D4CB1@multiplay.co.uk>
References:  <CAFHbX1K-NPuAy5tW0N8=sJD=CU0Q1Pm3ZDkVkE+djpCsD1U8_Q@mail.gmail.com> <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> <50A31D48.3000700@shatow.net> <CAF6rxgkh6C0LKXOZa264yZcA3AvQdw7zVAzWKpytfh0+KnLOJg@mail.gmail.com> <57ac1f$gg70bn@ipmail05.adl6.internode.on.net> <CAF6rxgnjPJ=v24p+kOci2qGQ1weH7r+8vdLmiJ_1DrxLeEzZvg@mail.gmail.com> <CAF6rxg=wjy9KTtifGrF2D6szwWKw8cX-gkJjnZRQBmFTC9BBdg@mail.gmail.com> <48C81451-B9E7-44B5-8B8A-ED4B1D464EC6@bway.net> <50AB3789.1000508@platinum.linux.pl>


----- Original Message ----- 
From: "Adam Nowacki" <nowakpl@platinum.linux.pl>
To: <freebsd-fs@freebsd.org>
Sent: Tuesday, November 20, 2012 7:55 AM
Subject: Re: ZFS FAQ (Was: SSD recommendations for ZFS cache/log)


> On 2012-11-20 05:59, Charles Sprickman wrote:
>> Wonderful to see some work on this.
>>
>> One of the great remaining ZFS mysteries is all the tunables
>> that live under "vfs.zfs.*".  Obviously there are plenty of read-only
>> items there, but only conflicting information gathered from random
>> forum posts and commit messages exists about what exactly one can do
>> regarding tuning beyond ARC sizing.
>>
>> If you have any opportunity to work with the people who have ported
>> and are now maintaining zfs, it would be really wonderful to get
>> some feedback from them on what knobs are safe to twiddle and why.
>> I suspect many of the tunable items don't really have meaningful
>> equivalents in Sun's implementation since the way zfs falls under
>> the vfs layer in FreeBSD is so different.
>>
>> Thanks,
>>
>> Charles
> 
> I'll share my experiences from tuning my home NAS:
> vfs.zfs.write_limit_* is a mess.
> Six sysctls work together to produce a single value - the maximum size
> of a txg commit. If the amount of data not yet written to disk grows to
> this size, a txg commit is forced. There is a catch, though: the size is
> only an estimate, and an absolute worst-case one at that - the real
> amount is multiplied by 24 (there is a reason for this madness below).
> This means that writing a 1MB file results in an estimated txg commit
> size of 24MB (plus metadata). Back to the sysctls (a worked sketch
> follows the list):
> 
> # vfs.zfs.write_limit_override - if not 0, absolutely overrides the
> write limit (all other sysctls are ignored); if 0, an internal,
> dynamically computed value is used, based on:
> # vfs.zfs.txg.synctime_ms - adjusts the write limit based on previous
> txg commits so that the time to write equals this value in milliseconds
> (basically an estimate of the disks' write bandwidth),
> # vfs.zfs.write_limit_shift - sets vfs.zfs.write_limit_max to RAM size /
> 2^write_limit_shift,
> # vfs.zfs.write_limit_max - used to derive vfs.zfs.write_limit_inflated
> (multiplied by 24), but only if vfs.zfs.write_limit_shift is not 0,
> # vfs.zfs.write_limit_inflated - the maximum size of the dynamic write
> limit,
> # vfs.zfs.write_limit_min - the minimum size of the dynamic write limit,
> and, to complete the picture:
> # vfs.zfs.txg.timeout - force a txg commit every this many seconds if
> one wasn't already triggered by the write limit.
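
To make the interplay concrete, here is a minimal sketch of how I read the
above fitting together. It is only an illustration of the description, not
the actual ZFS code, and the function and variable names are invented:

    #include <stdint.h>

    /* Worst-case inflation: 4 (raidz parity + 1) * 3 DVAs * 2, see below. */
    #define WORST_CASE_INFLATION    24

    static uint64_t
    effective_txg_limit(uint64_t physmem,      /* bytes of RAM */
        int write_limit_shift,                 /* vfs.zfs.write_limit_shift */
        uint64_t override,                     /* vfs.zfs.write_limit_override */
        uint64_t limit_min,                    /* vfs.zfs.write_limit_min */
        uint64_t dynamic_limit)                /* adjusted via txg.synctime_ms */
    {
            if (override != 0)
                    return (override);         /* hard override, ignore the rest */

            /* write_limit_max and write_limit_inflated as described above */
            uint64_t limit_max = physmem >> write_limit_shift;
            uint64_t inflated = limit_max * WORST_CASE_INFLATION;

            /* the dynamic limit tracks disk bandwidth but is clamped */
            if (dynamic_limit < limit_min)
                    dynamic_limit = limit_min;
            if (dynamic_limit > inflated)
                    dynamic_limit = inflated;
            return (dynamic_limit);
    }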
> 
> For my home NAS (10x 2TB disks encrypted with geli in raidz2, a CPU with
> hardware AES, 16GB RAM, 2x 1GbE for Samba and iSCSI with MCS) I have
> ended up with:
> 
> /boot/loader.conf:
> vfs.zfs.write_limit_shift="4" # 16GB RAM / 2^4 = 1GB limit
> vfs.zfs.write_limit_min="2400M" # 100MB minimum multiplied by the 24x
> factor; during heavy read-write operations the dynamic write limit would
> otherwise enter a positive feedback loop and shrink the limit too much
> vfs.zfs.txg.synctime_ms="2000" # try to maintain a 2 second commit time
> during large writes
> vfs.zfs.txg.timeout="120" # 2 minutes, to reduce fragmentation and wear
> from small writes; worst case, 2 minutes of asynchronous writes are
> lost - synchronous writes end up in the ZIL anyway
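
For reference, plugging those values into the description above (rough
arithmetic, worth double-checking): write_limit_max comes out to
16GB / 2^4 = 1GB, write_limit_inflated to 24 x 1GB = 24GB, and the 2400M
floor corresponds to roughly 100MB of real data after the 24x inflation;
failing all of that, a txg commit is forced at the latest every 120 seconds.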
> 
> and for completeness:
> 
> vfs.zfs.arc_min="10000M"
> vfs.zfs.arc_max="10000M"
> vfs.zfs.vdev.cache.size="16M" # vdev cache helps a lot during scrubs
> vfs.zfs.vdev.cache.bshift="14" # grow all I/O requests to 16KiB (2^14
> bytes); smaller requests have shown the same latency, so we might as
> well get more "for free"
> vfs.zfs.vdev.cache.max="16384"

This has been disabled by default for a while; are you sure of the benefits?

"Disable vdev cache (readahead) by default.

The vdev cache is very underutilized (hit ratio 30%-70%) and may consume
excessive memory on systems with many vdevs.

Illumos-gate revision:	13346"


> vfs.zfs.vdev.write_gap_limit="0"
> vfs.zfs.vdev.read_gap_limit="131072"
> vfs.zfs.vdev.aggregation_limit="131072" # group smaller reads into one
> larger read (sketched below); benchmarking showed no appreciable latency
> increase while again getting more bytes
> vfs.zfs.vdev.min_pending="1"
> vfs.zfs.vdev.max_pending="1" # seems to help txg commit bandwidth by
> reducing seeking from parallel reads (not fully tested)
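
For what it's worth, my mental model of how read_gap_limit and
aggregation_limit interact is roughly the sketch below. It is an
illustration only, not the actual vdev queue code, and all names in it are
invented:

    #include <stdbool.h>
    #include <stdint.h>

    struct io {
            uint64_t offset;        /* byte offset on the vdev */
            uint64_t size;          /* length in bytes */
    };

    /*
     * Can a queued read be folded into the aggregate being built?  The gap
     * between the end of the aggregate and the start of the next read must
     * not exceed read_gap_limit, and the combined span must stay within
     * aggregation_limit (131072 bytes with the settings above).
     */
    static bool
    can_aggregate_read(const struct io *agg, const struct io *next,
        uint64_t read_gap_limit, uint64_t aggregation_limit)
    {
            uint64_t agg_end = agg->offset + agg->size;

            if (next->offset < agg_end)
                    return (false);         /* overlaps or goes backwards */
            if (next->offset - agg_end > read_gap_limit)
                    return (false);         /* hole too large to read across */
            if (next->offset + next->size - agg->offset > aggregation_limit)
                    return (false);         /* aggregate would grow too large */
            return (true);
    }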
> 
> and the reason for the 24x factor (4 * 3 * 2, from the code):
>     /*
>      * The worst case is single-sector max-parity RAID-Z blocks, in which
>      * case the space requirement is exactly (VDEV_RAIDZ_MAXPARITY + 1)
>      * times the size; so just assume that.  Add to this the fact that
>      * we can have up to 3 DVAs per bp, and one more factor of 2 because
>      * the block may be dittoed with up to 3 DVAs by ddt_sync().
>      */
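
Working that comment through: (parity + 1) = 4, times 3 DVAs, times 2 for
the ditto copies written by ddt_sync(), gives the factor of 24. A minimal
sketch of that arithmetic (the two constants match the comment, but the
function itself is just an illustration, not the ZFS implementation
verbatim):

    #include <stdint.h>

    #define VDEV_RAIDZ_MAXPARITY    3   /* raidz3: up to 3 parity sectors */
    #define SPA_DVAS_PER_BP         3   /* up to 3 copies (DVAs) per block pointer */

    /* Worst-case on-disk size charged against the write limit. */
    static uint64_t
    worst_case_asize(uint64_t lsize)
    {
            /* (3 + 1) * 3 * 2 = 24 times the logical size */
            return (lsize * (VDEV_RAIDZ_MAXPARITY + 1) * SPA_DVAS_PER_BP * 2);
    }

So a 1MB asynchronous write is charged as 1MB * 4 * 3 * 2 = 24MB against
the write limit, which is where the 24MB figure near the top of this mail
comes from.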
> 
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.



