Subject: Re: ZFS on 10-STABLE r281159: programs, accessing ZFS pauses for minutes in state [*kmem arena]
To: freebsd-fs@freebsd.org
References: <164833736.20150730143008@serebryakov.spb.ru> <55BA0F41.6070508@multiplay.co.uk> <26DA7547-3258-44CC-A3EA-338AFA13640E@kraus-haus.org>
From: Steven Hartland
Message-ID: <55BA45A0.508@multiplay.co.uk>
Date: Thu, 30 Jul 2015 16:41:20 +0100
In-Reply-To: <26DA7547-3258-44CC-A3EA-338AFA13640E@kraus-haus.org>

On 30/07/2015 15:41, Paul Kraus wrote:
> On Jul 30, 2015, at 7:49, Steven Hartland wrote:
>
>> On 30/07/2015 12:30, Lev Serebryakov wrote:
>>> Deduplication IS TURNED OFF. atime is turned off. Record size set to 1M as
>>> I have a lot of big files (movies, RAW photo from DSLR, etc). Compression is
>>> turned off.
>> You don't need to do that as record set size is a min not a max, if you don't force it large files will still be stored efficiently.
> Can you point to documentation for that ?

Ignore my previous comment there, I was clearly having a special moment.

recordsize sets the suggested block size, which is effectively the largest block size for a given file. It's generally not about efficient storage, more about efficient access, so that's what you usually want to consider except in extreme cases.

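
To make the access side of that concrete, here is a rough Python sketch of how much data comes off disk for a single read. It assumes the file is big enough that all of its blocks are exactly recordsize, that nothing is cached, and that every record the read touches is fetched whole. The function name and the sample sizes are mine, purely for illustration.

```python
def bytes_fetched(offset, length, recordsize):
    """Bytes read from disk for a read of `length` bytes at `offset`.

    Simplifying assumptions (illustrative only): every block of the file
    is exactly `recordsize`, nothing is cached, and each record touched
    by the read is fetched whole.
    """
    if length <= 0:
        return 0
    first_record = offset // recordsize
    last_record = (offset + length - 1) // recordsize
    return (last_record - first_record + 1) * recordsize


KB = 1024
MB = 1024 * KB

for recordsize in (128 * KB, 1 * MB):
    # A small 4KB read and a large 1MB read, both starting at a record boundary.
    small = bytes_fetched(0, 4 * KB, recordsize)
    large = bytes_fetched(0, 1 * MB, recordsize)
    print(f"recordsize={recordsize // KB}KB: "
          f"4KB read fetches {small // KB}KB, "
          f"1MB read fetches {large // KB}KB "
          f"in {large // recordsize} record(s)")
```

Under those assumptions the small read pays for a whole record either way, while the large read needs fewer, bigger I/Os as recordsize grows.
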
If you set recordsize to 1MB you get large block support, which is detailed here:
https://reviews.csiden.org/r/51/

Key info from this:

Recommended uses center around improving performance of random reads of large blocks (>= 128KB):
- files that are randomly read in large chunks (e.g. video files when streaming many concurrent streams such that prefetch can not effectively cache data); performance will be improved in this case because random 1MB reads from rotating disks have higher bandwidth than random 128KB reads.
- typically, performance of scrub/resilver is improved, especially with RAID-Z

The tradeoffs to consider when using large blocks include:
- accessing large blocks tends to increase latency of all operations, because even small reads will need to get in line behind large reads/writes
- sub-block writes (i.e. write to 128KB of a 1MB block) will incur even larger read-modify-write penalty
- the last, partially-filled block of each file will be larger, wasting memory, and if compression is not enabled, disk space (expected waste is 1/2 the recordsize per file, assuming random file length)

recordsize is documented in the man page:
https://www.freebsd.org/cgi/man.cgi?query=zfs&apropos=0&sektion=8&manpath=FreeBSD+10.2-stable&arch=default&format=html

> I really hope that the 128KB default is not a minimum record size or a 1KB file will take up 128 KB of FS space.

Setting the recordsize sets the suggested block size used, so if you set 1MB then the minimum size a file can occupy is 1MB even if it's a 512b file.

> As far as I know, zfs recordsize has always, since the very beginning of ZFS under Solaris, been the MAX recordsize, but it is also a hint and not a fixed value. ZFS will write any size records (powers of 2) from 512 bytes (4 KB in the case of an ashift=12 pool) up to recordsize. Tuning of recordsize has been frowned upon since the beginning unless you _know_ the size of your writes and they are fixed (like 8 KB database records).
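
To put rough numbers on the fixed-size write case and the read-modify-write tradeoff listed above, here is a small Python sketch. It assumes each write falls inside a single record, the record is not cached, compression is off, and the whole record is read, modified and written back. The function name and the 8KB write size are only for illustration.

```python
def rmw_cost(write_size, recordsize):
    """Return (bytes_read, bytes_written) for one overwrite within a record.

    Illustrative assumptions: the write fits inside a single record, the
    record is not cached, compression is off, and a partial overwrite
    reads the whole record and writes the whole record back.
    """
    if write_size <= 0 or write_size > recordsize:
        raise ValueError("write_size must be between 1 and recordsize")
    if write_size == recordsize:
        # A full-record overwrite does not need the old contents.
        return 0, recordsize
    return recordsize, recordsize


KB = 1024
write_size = 8 * KB  # e.g. a fixed-size database record

for recordsize in (8 * KB, 128 * KB, 1024 * KB):
    read, written = rmw_cost(write_size, recordsize)
    print(f"recordsize={recordsize // KB}KB: "
          f"read {read // KB}KB, write {written // KB}KB "
          f"for an {write_size // KB}KB update "
          f"({written // write_size}x write amplification)")
```

In this simplified model, matching recordsize to a known fixed write size avoids the read entirely, and the penalty grows in step with recordsize otherwise.
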
>
> Also note that ZFS will fit the write to the pool in the case of RAIDz, see Matt Ahrens' blog entry here: http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/

Another nice article on this can be found here:
https://www.joyent.com/blog/bruning-questions-zfs-record-size

Regards
Steve
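
P.S. For the "expected waste is 1/2 the recordsize per file" figure quoted from the review, a quick Python check. It takes that text's model at face value: the last block of a file is a full recordsize and compression is off. Treat it as a simplified model, not a statement of exactly what ZFS allocates in every case. The function names are mine.

```python
def tail_waste(file_size, recordsize):
    """Unused bytes in the last block of a file.

    Assumed model (from the quoted review text): the final block is a
    full `recordsize` and compression is not enabled.
    """
    if file_size <= 0:
        return 0
    return (-file_size) % recordsize


def mean_tail_waste(recordsize):
    """Average tail waste over every file length from 1 to recordsize bytes."""
    total = sum(tail_waste(size, recordsize) for size in range(1, recordsize + 1))
    return total / recordsize


KB = 1024

for recordsize in (128 * KB, 1024 * KB):
    mean = mean_tail_waste(recordsize)
    print(f"recordsize={recordsize // KB}KB: "
          f"mean tail waste {mean / KB:.1f}KB per file")
```

With file lengths spread evenly, the average comes out at just under half the recordsize, which lines up with the figure in the review.
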