Date: Thu, 30 Jul 2015 16:41:20 +0100
From: Steven Hartland <killing@multiplay.co.uk>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS on 10-STABLE r281159: programs, accessing ZFS pauses for minutes in state [*kmem arena]
Message-ID: <55BA45A0.508@multiplay.co.uk>
In-Reply-To: <26DA7547-3258-44CC-A3EA-338AFA13640E@kraus-haus.org>
References: <164833736.20150730143008@serebryakov.spb.ru> <55BA0F41.6070508@multiplay.co.uk> <26DA7547-3258-44CC-A3EA-338AFA13640E@kraus-haus.org>
On 30/07/2015 15:41, Paul Kraus wrote:
> On Jul 30, 2015, at 7:49, Steven Hartland <killing@multiplay.co.uk> wrote:
>
>> On 30/07/2015 12:30, Lev Serebryakov wrote:
>>> Deduplication IS TURNED OFF. atime is turned off. Record size set to 1M as
>>> I have a lot of big files (movies, RAW photos from a DSLR, etc). Compression
>>> is turned off.
>> You don't need to do that, as recordsize is a min not a max; if you don't
>> force it, large files will still be stored efficiently.
> Can you point to documentation for that?

Ignore my previous comment there, I was clearly having a special moment.

recordsize sets the suggested block size, which is effectively the largest
block size for a given file. It's generally not about efficient storage so
much as efficient access, so that's usually what you want to consider except
in extreme cases.

If you set recordsize to 1MB you get large block support, which is detailed
here:
https://reviews.csiden.org/r/51/

Key info from this:

Recommended uses center around improving performance of random reads of
large blocks (>= 128KB):
 - files that are randomly read in large chunks (e.g. video files when
   streaming many concurrent streams such that prefetch cannot effectively
   cache data); performance will be improved in this case because random 1MB
   reads from rotating disks have higher bandwidth than random 128KB reads.
 - typically, performance of scrub/resilver is improved, especially with
   RAID-Z.

The tradeoffs to consider when using large blocks include:
 - accessing large blocks tends to increase latency of all operations,
   because even small reads will need to get in line behind large
   reads/writes.
 - sub-block writes (i.e. writing 128KB of a 1MB block) will incur an even
   larger read-modify-write penalty.
 - the last, partially-filled block of each file will be larger, wasting
   memory and, if compression is not enabled, disk space (expected waste is
   1/2 the recordsize per file, assuming random file length).

recordsize is documented in the man page:
https://www.freebsd.org/cgi/man.cgi?query=zfs&apropos=0&sektion=8&manpath=FreeBSD+10.2-stable&arch=default&format=html

> I really hope that the 128KB default is not a minimum record size or a 1KB
> file will take up 128 KB of FS space.

Setting recordsize sets the suggested block size used, so if you set 1MB then
the minimum size a file can occupy is 1MB even if it's only a 512b file.

> As far as I know, zfs recordsize has always, since the very beginning of
> ZFS under Solaris, been the MAX recordsize, but it is also a hint and not a
> fixed value. ZFS will write any size records (powers of 2) from 512 bytes
> (4 KB in the case of an ashift=12 pool) up to recordsize. Tuning of
> recordsize has been frowned upon since the beginning unless you _know_ the
> size of your writes and they are fixed (like 8 KB database records).
>
> Also note that ZFS will fit the write to the pool in the case of RAIDz<n>,
> see Matt Ahrens' blog entry here:
> http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/

Another nice article on this can be found here:
https://www.joyent.com/blog/bruning-questions-zfs-record-size

Regards
Steve
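For reference, a minimal sketch of setting and checking the property from the
command line. The dataset name tank/media is a placeholder (not taken from
the thread), and a 1M recordsize assumes the pool's large_blocks feature flag
is enabled (check with: zpool get feature@large_blocks <pool>):

    # create a dataset intended for large, mostly sequentially-read files,
    # with a 1M suggested block size
    zfs create -o recordsize=1M tank/media

    # confirm the property and where it was set from
    zfs get recordsize tank/media

    # recordsize can be changed later, but it only affects newly written
    # blocks; existing files keep the block size they were written with
    zfs set recordsize=128K tank/media

Because recordsize only influences blocks written after it is set, files that
already exist on the dataset must be rewritten (e.g. copied) for a new value
to take effect on them.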