From owner-freebsd-questions@freebsd.org Tue Jun 25 18:19:07 2019
From: mike tancsa <mike@sentex.net>
To: freebsd-questions@freebsd.org
Date: Tue, 25 Jun 2019 14:19:05 -0400
Subject: ZFS Optimizing for large directories and MANY files

I have been trying once again to understand the various ZFS sysctl settings and how they might relate to optimizing a file server that has very few big files but MANY small ones (RELENG_12). Some directories get upwards of 30,000+ files, and the odd time when some outside user process breaks, 100,000+ files. Obviously, throwing a LOT of RAM at the problem helps, but are there any more tunings I can do?

So far I have adjusted

        vfs.zfs.arc_meta_strategy=1
        vfs.zfs.arc_meta_limit raised to 65% of ARC memory, over the default 25%

and on the zfs dataset in question I have set primarycache=metadata.

Anything else I can do to bias towards a file system with MANY files? Unfortunately, I can't easily stop the end users from dumping many files into a single directory. I think the hit happens when they log in, do a dir, see what files they need to download, download them, and log out. As long as that listing is cached, it's not so bad.
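For anyone following along, this is roughly what those settings look like as commands. The dataset name "tank/bigdirs" is only a placeholder for the real one, and the arc_meta_limit byte value is simply ~65% of my 30 GiB arc_max:

        sysctl vfs.zfs.arc_meta_strategy=1
        sysctl vfs.zfs.arc_meta_limit=20937965568
        zfs set primarycache=metadata tank/bigdirs

(I set the two sysctls at run time; I assume persisting them in /boot/loader.conf would work just as well.)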
Doing some simple tests on an imported version of the data set (on slower spinning rust drives), something simple such as

# time find . -type f -mtime -2d

takes 40 min after a cold boot. Watching zfs disk IO, it's super slow in terms of bandwidth, but gstat shows the disks close to being pegged. I guess the heads are thrashing about inefficiently?

1{ryzenbsd12}# zpool iostat tmpdisk 1
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tmpdisk      301G  1.46T    335      1   899K  33.0K
tmpdisk      301G  1.46T    402      0  1.02M      0
tmpdisk      301G  1.46T    265      0   559K      0
tmpdisk      301G  1.46T    331      0   715K      0
tmpdisk      301G  1.46T    276      0   650K      0
tmpdisk      301G  1.46T    293      0   718K      0
tmpdisk      301G  1.46T    432      0  1.11M      0
tmpdisk      301G  1.46T    435      0  1.03M      0
tmpdisk      301G  1.46T    412      0  1.01M      0
tmpdisk      301G  1.46T    315      0   717K      0
tmpdisk      301G  1.46T    417      0  1.04M      0
tmpdisk      301G  1.46T    457      0  1.13M      0
tmpdisk      301G  1.46T    448      0  1.05M      0

top shows the ARC steadily growing:

ARC: 5119M Total, 2128M MFU, 2361M MRU, 1608K Anon, 73M Header, 560M Other
     606M Compressed, 3902M Uncompressed, 6.43:1 Ratio

The stats show:

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                238
        Recycle Misses:                         0
        Mutex Misses:                           0
        Evict Skips:                            1.04k

ARC Size:                               17.28%  5.20    GiB
        Target Size: (Adaptive)         100.00% 30.07   GiB
        Min Size (Hard Limit):          12.50%  3.76    GiB
        Max Size (High Water):          8:1     30.07   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       50.00%  15.03   GiB
        Frequently Used Cache Size:     50.00%  15.03   GiB

ARC Hash Breakdown:
        Elements Max:                           247.31k
        Elements Current:               100.00% 247.31k
        Collisions:                             7.20k
        Chain Max:                              3
        Chains:                                 6.99k

------------------------------------------------------------------------

ARC Efficiency:                                 2.53m
        Cache Hit Ratio:                88.31%  2.23m
        Cache Miss Ratio:               11.69%  295.48k
        Actual Hit Ratio:               88.24%  2.23m

        Data Demand Efficiency:         87.76%  20.01k

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             0.08%   1.69k
          Most Recently Used:           21.64%  483.05k
          Most Frequently Used:         78.28%  1.75m
          Most Recently Used Ghost:     0.00%   0
          Most Frequently Used Ghost:   0.00%   0

        CACHE HITS BY DATA TYPE:
          Demand Data:                  0.79%   17.56k
          Prefetch Data:                0.00%   0
          Demand Metadata:              99.14%  2.21m
          Prefetch Metadata:            0.08%   1.69k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  0.83%   2.45k
          Prefetch Data:                0.00%   0
          Demand Metadata:              18.79%  55.52k
          Prefetch Metadata:            80.38%  237.51k
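While that first find runs, the arcstats metadata counters can be polled directly to watch the metadata footprint grow. Assuming I have the right counters, it would be something like the following, repeated every few seconds:

0{ryzenbsd12}# sysctl kstat.zfs.misc.arcstats.arc_meta_used kstat.zfs.misc.arcstats.arc_meta_limit
0{ryzenbsd12}# sysctl kstat.zfs.misc.arcstats.demand_metadata_misses kstat.zfs.misc.arcstats.prefetch_metadata_misses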
Once a single trip through the file system via find is done, top shows:

ARC: 10G Total, 7161M MFU, 467M MRU, 1600K Anon, 191M Header, 2842M Other
     1647M Compressed, 11G Uncompressed, 7.12:1 Ratio

and find, on the second iteration, only takes:

0{ryzenbsd12}# time find . -type f -mtime -2d
./list.txt
./l
1.992u 69.557s 1:11.54 100.0%   35+177k 169144+0io 0pf+0w
0{ryzenbsd12}#

The stats look appropriately better too:

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                238
        Recycle Misses:                         0
        Mutex Misses:                           0
        Evict Skips:                            1.04k

ARC Size:                               34.11%  10.26   GiB
        Target Size: (Adaptive)         100.00% 30.07   GiB
        Min Size (Hard Limit):          12.50%  3.76    GiB
        Max Size (High Water):          8:1     30.07   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       50.00%  15.03   GiB
        Frequently Used Cache Size:     50.00%  15.03   GiB

ARC Hash Breakdown:
        Elements Max:                           688.43k
        Elements Current:               100.00% 688.43k
        Collisions:                             53.65k
        Chain Max:                              4
        Chains:                                 50.50k

------------------------------------------------------------------------

ARC Efficiency:                                 56.03m
        Cache Hit Ratio:                98.07%  54.94m
        Cache Miss Ratio:               1.93%   1.08m
        Actual Hit Ratio:               97.64%  54.71m

        Data Demand Efficiency:         86.21%  21.97k

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             0.43%   237.54k
          Most Recently Used:           12.19%  6.70m
          Most Frequently Used:         87.37%  48.01m
          Most Recently Used Ghost:     0.00%   0
          Most Frequently Used Ghost:   0.00%   0

        CACHE HITS BY DATA TYPE:
          Demand Data:                  0.03%   18.94k
          Prefetch Data:                0.00%   0
          Demand Metadata:              95.72%  52.59m
          Prefetch Metadata:            4.24%   2.33m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  0.28%   3.03k
          Prefetch Data:                0.00%   0
          Demand Metadata:              50.84%  550.75k
          Prefetch Metadata:            48.88%  529.54k

------------------------------------------------------------------------

Anything else to adjust? I was going to use RAID1+0 for the dataset on SSDs. Should I bother with an NVMe drive for L2ARC caching? On my test box I can sort of approximate how much RAM I need for metadata (11G, it seems), but is there a better programmatic way to find that value out?

        ---Mike
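P.S. On the L2ARC question above, what I have in mind is roughly the sketch below: add an NVMe device as a cache vdev and restrict it to metadata for the dataset. The device and dataset names here are only examples (nvd0 being whatever the NVMe drive shows up as):

        zpool add tank cache nvd0
        zfs set secondarycache=metadata tank/bigdirs

And the closest I have come to reading the metadata footprint programmatically is the arc_meta_used arcstat, e.g.

        sysctl -n kstat.zfs.misc.arcstats.arc_meta_used

though I am not certain that captures everything a fully warmed cache would need.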