Date: Tue, 31 Mar 2015 01:07:24 +0100
From: Steven Hartland <killing@multiplay.co.uk>
To: freebsd-fs@freebsd.org
Subject: Re: All available memory used when deleting files from ZFS
Message-ID: <5519E53C.4060203@multiplay.co.uk>
In-Reply-To: <FD30147A-C7F7-4138-9F96-10024A6FE061@ebureau.com>
References: <FD30147A-C7F7-4138-9F96-10024A6FE061@ebureau.com>

Later versions have vfs.zfs.free_max_blocks, which is likely to be the fix you're looking for.

It was added to head by r271532 and to stable/10 by:
https://svnweb.freebsd.org/base?view=revision&revision=272665

Description being:

Add a new tunable/sysctl, vfs.zfs.free_max_blocks, which can be used to limit how many blocks can be free'ed before a new transaction group is created. The default is no limit (infinite), but we should probably have a lower default, e.g. 100,000.

With this limit, we can guard against the case where ZFS could run out of memory when destroying large numbers of blocks in a single transaction group, as the entire DDT needs to be brought into memory.

Illumos issue:
    5138 add tunable for maximum number of blocks freed in one txg

On 30/03/2015 22:14, Dustin Wenz wrote:
> I had several systems panic or hang over the weekend while deleting some data off of their local zfs filesystem. It looks like they ran out of physical memory (32GB), and hung when paging to swap-on-zfs (which is not surprising, given that ZFS was likely using the memory). They were running 10.1-STABLE r277139M, which I built in the middle of January. The pools were about 35TB in size, and are a concatenation of 3TB mirrors. They were maybe 95% full. I deleted just over 1000 files, totaling 25TB on each system.
>
> It took roughly 10 minutes to remove that 25TB of data per host using a remote rsync, and immediately after that everything seemed fine. However, after several more minutes, every machine that had data removed became unresponsive. Some had numerous "swap_pager: indefinite wait buffer" errors followed by a panic, and some just died with no console messages. The same thing would happen after a reboot, when FreeBSD attempted to mount the local filesystem again.
>
> I was able to boot these systems after exporting the affected pool, but the problem would recur several minutes after initiating a "zpool import".
> Watching zfs statistics didn't seem to reveal where the memory was going; ARC would only climb to about 4GB, but free memory would decline rapidly. Eventually, after enough export/reboot/import cycles, the pool would import successfully and everything would be fine from then on. Note that there is no L2ARC or compression being used.
>
> Has anyone else run into this when deleting files on ZFS? It seems to be a consistent problem under the versions of 10.1 I'm running.
>
> For reference, I've appended a zstat dump below that was taken 5 minutes after starting a zpool import, and was about three minutes before the machine became unresponsive. You can see that the ARC is only 4GB, but free memory was down to 471MB (and continued to drop).
>
>     - .Dustin
>
>
> ------------------------------------------------------------------------
> ZFS Subsystem Report                        Mon Mar 30 12:35:27 2015
> ------------------------------------------------------------------------
>
> System Information:
>
>     Kernel Version:                         1001506 (osreldate)
>     Hardware Platform:                      amd64
>     Processor Architecture:                 amd64
>
>     ZFS Storage pool Version:               5000
>     ZFS Filesystem Version:                 5
>
> FreeBSD 10.1-STABLE #11 r277139M: Tue Jan 13 14:59:55 CST 2015 root
> 12:35PM  up 8 mins, 3 users, load averages: 7.23, 8.96, 4.87
>
> ------------------------------------------------------------------------
>
> System Memory:
>
>     0.17%   55.40 MiB Active,   0.14%   46.11 MiB Inact
>     98.34%  30.56 GiB Wired,    0.00%   0 Cache
>     1.34%   425.46 MiB Free,    0.00%   4.00 KiB Gap
>
>     Real Installed:                         32.00 GiB
>     Real Available:                 99.82%  31.94 GiB
>     Real Managed:                   97.29%  31.08 GiB
>
>     Logical Total:                          32.00 GiB
>     Logical Used:                   98.56%  31.54 GiB
>     Logical Free:                   1.44%   471.57 MiB
>
> Kernel Memory:                              3.17 GiB
>     Data:                           99.18%  3.14 GiB
>     Text:                           0.82%   26.68 MiB
>
> Kernel Memory Map:                          31.08 GiB
>     Size:                           14.18%  4.41 GiB
>     Free:                           85.82%  26.67 GiB
>
> ------------------------------------------------------------------------
>
> ARC Summary: (HEALTHY)
>     Memory Throttle Count:                  0
>
> ARC Misc:
>     Deleted:                                145
>     Recycle Misses:                         0
>     Mutex Misses:                           0
>     Evict Skips:                            0
>
> ARC Size:                           14.17%  4.26 GiB
>     Target Size: (Adaptive)         100.00% 30.08 GiB
>     Min Size (Hard Limit):          12.50%  3.76 GiB
>     Max Size (High Water):          8:1     30.08 GiB
>
> ARC Size Breakdown:
>     Recently Used Cache Size:       50.00%  15.04 GiB
>     Frequently Used Cache Size:     50.00%  15.04 GiB
>
> ARC Hash Breakdown:
>     Elements Max:                           270.56k
>     Elements Current:               100.00% 270.56k
>     Collisions:                             23.66k
>     Chain Max:                              3
>     Chains:                                 8.28k
>
> ------------------------------------------------------------------------
>
> ARC Efficiency:                             2.93m
>     Cache Hit Ratio:                70.44%  2.06m
>     Cache Miss Ratio:               29.56%  866.05k
>     Actual Hit Ratio:               70.40%  2.06m
>
>     Data Demand Efficiency:         97.47%  24.58k
>     Data Prefetch Efficiency:       1.88%   479
>
>     CACHE HITS BY CACHE LIST:
>       Anonymously Used:             0.05%   1.07k
>       Most Recently Used:           71.82%  1.48m
>       Most Frequently Used:         28.13%  580.49k
>       Most Recently Used Ghost:     0.00%   0
>       Most Frequently Used Ghost:   0.00%   0
>
>     CACHE HITS BY DATA TYPE:
>       Demand Data:                  1.16%   23.96k
>       Prefetch Data:                0.00%   9
>       Demand Metadata:              98.79%  2.04m
>       Prefetch Metadata:            0.05%   1.08k
>
>     CACHE MISSES BY DATA TYPE:
>       Demand Data:                  0.07%   621
>       Prefetch Data:                0.05%   470
>       Demand Metadata:              99.69%  863.35k
>       Prefetch Metadata:            0.19%   1.61k
>
> ------------------------------------------------------------------------
>
> L2ARC is disabled
>
> ------------------------------------------------------------------------
>
> File-Level Prefetch: (HEALTHY)
>
> DMU Efficiency:                             72.95k
>     Hit Ratio:                      70.83%  51.66k
>     Miss Ratio:                     29.17%  21.28k
>
>     Colinear:                               21.28k
>       Hit Ratio:                    0.01%   2
>       Miss Ratio:                   99.99%  21.28k
>
>     Stride:                                 50.45k
>       Hit Ratio:                    99.98%  50.44k
>       Miss Ratio:                   0.02%   9
>
> DMU Misc:
>     Reclaim:                                21.28k
>       Successes:                    1.73%   368
>       Failures:                     98.27%  20.91k
>
>     Streams:                                1.23k
>       +Resets:                      0.16%   2
>       -Resets:                      99.84%  1.23k
>       Bogus:                                0
>
> ------------------------------------------------------------------------
>
> VDEV cache is disabled
>
> ------------------------------------------------------------------------
>
> ZFS Tunables (sysctl):
>     kern.maxusers  2380
>     vm.kmem_size  33367830528
>     vm.kmem_size_scale  1
>     vm.kmem_size_min  0
>     vm.kmem_size_max  1319413950874
>     vfs.zfs.arc_max  32294088704
>     vfs.zfs.arc_min  4036761088
>     vfs.zfs.arc_average_blocksize  8192
>     vfs.zfs.arc_shrink_shift  5
>     vfs.zfs.arc_free_target  56518
>     vfs.zfs.arc_meta_used  4534349216
>     vfs.zfs.arc_meta_limit  8073522176
>     vfs.zfs.l2arc_write_max  8388608
>     vfs.zfs.l2arc_write_boost  8388608
>     vfs.zfs.l2arc_headroom  2
>     vfs.zfs.l2arc_feed_secs  1
>     vfs.zfs.l2arc_feed_min_ms  200
>     vfs.zfs.l2arc_noprefetch  1
>     vfs.zfs.l2arc_feed_again  1
>     vfs.zfs.l2arc_norw  1
>     vfs.zfs.anon_size  1786368
>     vfs.zfs.anon_metadata_lsize  0
>     vfs.zfs.anon_data_lsize  0
>     vfs.zfs.mru_size  504812032
>     vfs.zfs.mru_metadata_lsize  415273472
>     vfs.zfs.mru_data_lsize  35227648
>     vfs.zfs.mru_ghost_size  0
>     vfs.zfs.mru_ghost_metadata_lsize  0
>     vfs.zfs.mru_ghost_data_lsize  0
>     vfs.zfs.mfu_size  3925990912
>     vfs.zfs.mfu_metadata_lsize  3901947392
>     vfs.zfs.mfu_data_lsize  7000064
>     vfs.zfs.mfu_ghost_size  0
>     vfs.zfs.mfu_ghost_metadata_lsize  0
>     vfs.zfs.mfu_ghost_data_lsize  0
>     vfs.zfs.l2c_only_size  0
>     vfs.zfs.dedup.prefetch  1
>     vfs.zfs.nopwrite_enabled  1
>     vfs.zfs.mdcomp_disable  0
>     vfs.zfs.max_recordsize  1048576
>     vfs.zfs.dirty_data_max  3429735628
>     vfs.zfs.dirty_data_max_max  4294967296
>     vfs.zfs.dirty_data_max_percent  10
>     vfs.zfs.dirty_data_sync  67108864
>     vfs.zfs.delay_min_dirty_percent  60
>     vfs.zfs.delay_scale  500000
>     vfs.zfs.prefetch_disable  0
>     vfs.zfs.zfetch.max_streams  8
>     vfs.zfs.zfetch.min_sec_reap  2
>     vfs.zfs.zfetch.block_cap  256
>     vfs.zfs.zfetch.array_rd_sz  1048576
>     vfs.zfs.top_maxinflight  32
>     vfs.zfs.resilver_delay  2
>     vfs.zfs.scrub_delay  4
>     vfs.zfs.scan_idle  50
>     vfs.zfs.scan_min_time_ms  1000
>     vfs.zfs.free_min_time_ms  1000
>     vfs.zfs.resilver_min_time_ms  3000
>     vfs.zfs.no_scrub_io  0
>     vfs.zfs.no_scrub_prefetch  0
>     vfs.zfs.free_max_blocks  -1
>     vfs.zfs.metaslab.gang_bang  16777217
>     vfs.zfs.metaslab.fragmentation_threshold  70
>     vfs.zfs.metaslab.debug_load  0
>     vfs.zfs.metaslab.debug_unload  0
>     vfs.zfs.metaslab.df_alloc_threshold  131072
>     vfs.zfs.metaslab.df_free_pct  4
>     vfs.zfs.metaslab.min_alloc_size  33554432
>     vfs.zfs.metaslab.load_pct  50
>     vfs.zfs.metaslab.unload_delay  8
>     vfs.zfs.metaslab.preload_limit  3
>     vfs.zfs.metaslab.preload_enabled  1
>     vfs.zfs.metaslab.fragmentation_factor_enabled  1
>     vfs.zfs.metaslab.lba_weighting_enabled  1
>     vfs.zfs.metaslab.bias_enabled  1
>     vfs.zfs.condense_pct  200
>     vfs.zfs.mg_noalloc_threshold  0
>     vfs.zfs.mg_fragmentation_threshold  85
>     vfs.zfs.check_hostid  1
>     vfs.zfs.spa_load_verify_maxinflight  10000
>     vfs.zfs.spa_load_verify_metadata  1
>     vfs.zfs.spa_load_verify_data  1
>     vfs.zfs.recover  0
>     vfs.zfs.deadman_synctime_ms  1000000
>     vfs.zfs.deadman_checktime_ms  5000
>     vfs.zfs.deadman_enabled  1
>     vfs.zfs.spa_asize_inflation  24
>     vfs.zfs.spa_slop_shift  5
>     vfs.zfs.space_map_blksz  4096
>     vfs.zfs.txg.timeout  5
>     vfs.zfs.vdev.metaslabs_per_vdev  200
>     vfs.zfs.vdev.cache.max  16384
>     vfs.zfs.vdev.cache.size  0
>     vfs.zfs.vdev.cache.bshift  16
>     vfs.zfs.vdev.trim_on_init  1
>     vfs.zfs.vdev.mirror.rotating_inc  0
>     vfs.zfs.vdev.mirror.rotating_seek_inc  5
>     vfs.zfs.vdev.mirror.rotating_seek_offset  1048576
>     vfs.zfs.vdev.mirror.non_rotating_inc  0
>     vfs.zfs.vdev.mirror.non_rotating_seek_inc  1
>     vfs.zfs.vdev.async_write_active_min_dirty_percent  30
>     vfs.zfs.vdev.async_write_active_max_dirty_percent  60
>     vfs.zfs.vdev.max_active  1000
>     vfs.zfs.vdev.sync_read_min_active  10
>     vfs.zfs.vdev.sync_read_max_active  10
>     vfs.zfs.vdev.sync_write_min_active  10
>     vfs.zfs.vdev.sync_write_max_active  10
>     vfs.zfs.vdev.async_read_min_active  1
>     vfs.zfs.vdev.async_read_max_active  3
>     vfs.zfs.vdev.async_write_min_active  1
>     vfs.zfs.vdev.async_write_max_active  10
>     vfs.zfs.vdev.scrub_min_active  1
>     vfs.zfs.vdev.scrub_max_active  2
>     vfs.zfs.vdev.trim_min_active  1
>     vfs.zfs.vdev.trim_max_active  64
>     vfs.zfs.vdev.aggregation_limit  131072
>     vfs.zfs.vdev.read_gap_limit  32768
>     vfs.zfs.vdev.write_gap_limit  4096
>     vfs.zfs.vdev.bio_flush_disable  0
>     vfs.zfs.vdev.bio_delete_disable  0
>     vfs.zfs.vdev.trim_max_bytes  2147483648
>     vfs.zfs.vdev.trim_max_pending  64
>     vfs.zfs.max_auto_ashift  13
>     vfs.zfs.min_auto_ashift  9
>     vfs.zfs.zil_replay_disable  0
>     vfs.zfs.cache_flush_disable  0
>     vfs.zfs.zio.use_uma  1
>     vfs.zfs.zio.exclude_metadata  0
>     vfs.zfs.sync_pass_deferred_free  2
>     vfs.zfs.sync_pass_dont_compress  5
>     vfs.zfs.sync_pass_rewrite  2
>     vfs.zfs.snapshot_list_prefetch  0
>     vfs.zfs.super_owner  0
>     vfs.zfs.debug  0
>     vfs.zfs.version.ioctl  4
>     vfs.zfs.version.acl  1
>     vfs.zfs.version.spa  5000
>     vfs.zfs.version.zpl  5
>     vfs.zfs.vol.mode  1
>     vfs.zfs.vol.unmap_enabled  1
>     vfs.zfs.trim.enabled  1
>     vfs.zfs.trim.txg_delay  32
>     vfs.zfs.trim.timeout  30
>     vfs.zfs.trim.max_interval  1
>
> ------------------------------------------------------------------------
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
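
As a rough illustration of the limit described in the reply at the top of this message, the sketch below estimates how many transaction groups a 25TB delete would be spread across if vfs.zfs.free_max_blocks were set to the 100,000 mentioned in the commit description. The 128 KiB average block size, the binary reading of "25TB", and the helper names are assumptions made for illustration only; the thread does not state the block size of the deleted files, and this is plain arithmetic, not a model of how ZFS actually schedules frees.

```python
# Assumed for illustration: the thread does not give the real block size.
ASSUMED_BLOCK_SIZE = 128 * 2**10   # 128 KiB per block (assumption)
DELETED_BYTES = 25 * 2**40         # "25TB" from the report, read here as TiB
FREE_MAX_BLOCKS = 100_000          # example value from the commit description


def blocks_to_free(total_bytes: int, block_size: int) -> int:
    """Number of blocks needed to cover total_bytes, rounded up."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (total_bytes + block_size - 1) // block_size


def txgs_needed(blocks: int, max_blocks_per_txg: int) -> int:
    """Minimum number of txgs if at most max_blocks_per_txg are freed in each."""
    if max_blocks_per_txg <= 0:
        raise ValueError("max_blocks_per_txg must be positive")
    return (blocks + max_blocks_per_txg - 1) // max_blocks_per_txg


if __name__ == "__main__":
    blocks = blocks_to_free(DELETED_BYTES, ASSUMED_BLOCK_SIZE)
    print("blocks to free:", blocks)
    print("txgs at limit :", txgs_needed(blocks, FREE_MAX_BLOCKS))
```

Under these assumptions the delete covers 209,715,200 blocks, which a limit of 100,000 would spread over at least 2,098 transaction groups instead of leaving it unbounded. The value currently in effect on a given system can be read with `sysctl vfs.zfs.free_max_blocks`.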