From: Dustin Wenz
To: Steven Hartland, freebsd-fs@freebsd.org
Subject: Re: All available memory used when deleting files from ZFS
Date: Mon, 30 Mar 2015 22:52:31 -0500
Message-Id: <3FF6F6A0-A23F-4EDE-98F6-5B8E41EC34A1@ebureau.com>
In-Reply-To: <5519E53C.4060203@multiplay.co.uk>

Thanks, Steven! However, I have not enabled dedup on any of the affected filesystems. Unless it became a default at some point, I'm not sure how that tunable would help.
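For what it's worth, here is roughly what I plan to try once these hosts are on a build new enough to have that tunable. This is only a sketch: "tank" is a placeholder for the real pool name, the 100000 value is just the starting point suggested in the commit message quoted below, and if the sysctl turns out not to be writable at runtime it would have to be set from /boot/loader.conf instead of sysctl.conf.

    # Double-check that dedup was never enabled anywhere ("tank" is a placeholder pool name)
    zpool get dedupratio tank
    zfs get -r dedup tank

    # Cap how many blocks ZFS will free in a single txg (value is only the suggested default below)
    sysctl vfs.zfs.free_max_blocks=100000

    # Keep the setting across reboots
    echo 'vfs.zfs.free_max_blocks=100000' >> /etc/sysctl.conf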
- .Dustin

> On Mar 30, 2015, at 7:07 PM, Steven Hartland wrote:
>
> Later versions have vfs.zfs.free_max_blocks, which is likely to be the fix you're looking for.
>
> It was added to head by r271532 and to stable/10 by:
> https://svnweb.freebsd.org/base?view=revision&revision=272665
>
> Description being:
>
> Add a new tunable/sysctl, vfs.zfs.free_max_blocks, which can be used to
> limit how many blocks can be freed before a new transaction group is
> created. The default is no limit (infinite), but we should probably have
> a lower default, e.g. 100,000.
>
> With this limit, we can guard against the case where ZFS could run out of
> memory when destroying large numbers of blocks in a single transaction
> group, as the entire DDT needs to be brought into memory.
>
> Illumos issue:
> 5138 add tunable for maximum number of blocks freed in one txg
>
>> On 30/03/2015 22:14, Dustin Wenz wrote:
>> I had several systems panic or hang over the weekend while deleting some data off of their local ZFS filesystems. It looks like they ran out of physical memory (32GB) and hung when paging to swap-on-ZFS (which is not surprising, given that ZFS was likely using the memory). They were running 10.1-STABLE r277139M, which I built in the middle of January. The pools were about 35TB in size and are a concatenation of 3TB mirrors. They were maybe 95% full. I deleted just over 1000 files, totaling 25TB, on each system.
>>
>> It took roughly 10 minutes to remove that 25TB of data per host using a remote rsync, and immediately after that everything seemed fine. However, after several more minutes, every machine that had data removed became unresponsive. Some had numerous "swap_pager: indefinite wait buffer" errors followed by a panic, and some just died with no console messages. The same thing would happen after a reboot, when FreeBSD attempted to mount the local filesystem again.
>>
>> I was able to boot these systems after exporting the affected pool, but the problem would recur several minutes after initiating a "zpool import". Watching ZFS statistics didn't seem to reveal where the memory was going; the ARC would only climb to about 4GB, but free memory would decline rapidly. Eventually, after enough export/reboot/import cycles, the pool would import successfully and everything would be fine from then on. Note that there is no L2ARC or compression being used.
>>
>> Has anyone else run into this when deleting files on ZFS? It seems to be a consistent problem under the versions of 10.1 I'm running.
>>
>> For reference, I've appended a zstat dump below that was taken 5 minutes after starting a zpool import, and about three minutes before the machine became unresponsive. You can see that the ARC is only 4GB, but free memory was down to 471MB (and continued to drop).
>>
>> - .Dustin
>>
>>
>> ------------------------------------------------------------------------
>> ZFS Subsystem Report                            Mon Mar 30 12:35:27 2015
>> ------------------------------------------------------------------------
>>
>> System Information:
>>
>>   Kernel Version:                1001506 (osreldate)
>>   Hardware Platform:             amd64
>>   Processor Architecture:        amd64
>>
>>   ZFS Storage pool Version:      5000
>>   ZFS Filesystem Version:        5
>>
>> FreeBSD 10.1-STABLE #11 r277139M: Tue Jan 13 14:59:55 CST 2015 root
>> 12:35PM up 8 mins, 3 users, load averages: 7.23, 8.96, 4.87
>>
>> ------------------------------------------------------------------------
>>
>> System Memory:
>>
>>   0.17%   55.40  MiB Active,    0.14%  46.11  MiB Inact
>>   98.34%  30.56  GiB Wired,     0.00%  0      Cache
>>   1.34%   425.46 MiB Free,      0.00%  4.00   KiB Gap
>>
>>   Real Installed:                32.00  GiB
>>   Real Available:        99.82%  31.94  GiB
>>   Real Managed:          97.29%  31.08  GiB
>>
>>   Logical Total:                 32.00  GiB
>>   Logical Used:          98.56%  31.54  GiB
>>   Logical Free:          1.44%   471.57 MiB
>>
>> Kernel Memory:                   3.17   GiB
>>   Data:                  99.18%  3.14   GiB
>>   Text:                  0.82%   26.68  MiB
>>
>> Kernel Memory Map:               31.08  GiB
>>   Size:                  14.18%  4.41   GiB
>>   Free:                  85.82%  26.67  GiB
>>
>> ------------------------------------------------------------------------
>>
>> ARC Summary: (HEALTHY)
>>   Memory Throttle Count:         0
>>
>> ARC Misc:
>>   Deleted:                       145
>>   Recycle Misses:                0
>>   Mutex Misses:                  0
>>   Evict Skips:                   0
>>
>> ARC Size:                14.17%  4.26   GiB
>>   Target Size: (Adaptive) 100.00% 30.08 GiB
>>   Min Size (Hard Limit):  12.50%  3.76  GiB
>>   Max Size (High Water):  8:1     30.08 GiB
>>
>> ARC Size Breakdown:
>>   Recently Used Cache Size:   50.00%  15.04  GiB
>>   Frequently Used Cache Size: 50.00%  15.04  GiB
>>
>> ARC Hash Breakdown:
>>   Elements Max:                  270.56k
>>   Elements Current:      100.00% 270.56k
>>   Collisions:                    23.66k
>>   Chain Max:                     3
>>   Chains:                        8.28k
>>
>> ------------------------------------------------------------------------
>>
>> ARC Efficiency:                  2.93m
>>   Cache Hit Ratio:       70.44%  2.06m
>>   Cache Miss Ratio:      29.56%  866.05k
>>   Actual Hit Ratio:      70.40%  2.06m
>>
>>   Data Demand Efficiency:   97.47%  24.58k
>>   Data Prefetch Efficiency: 1.88%   479
>>
>>   CACHE HITS BY CACHE LIST:
>>     Anonymously Used:           0.05%   1.07k
>>     Most Recently Used:         71.82%  1.48m
>>     Most Frequently Used:       28.13%  580.49k
>>     Most Recently Used Ghost:   0.00%   0
>>     Most Frequently Used Ghost: 0.00%   0
>>
>>   CACHE HITS BY DATA TYPE:
>>     Demand Data:                1.16%   23.96k
>>     Prefetch Data:              0.00%   9
>>     Demand Metadata:            98.79%  2.04m
>>     Prefetch Metadata:          0.05%   1.08k
>>
>>   CACHE MISSES BY DATA TYPE:
>>     Demand Data:                0.07%   621
>>     Prefetch Data:              0.05%   470
>>     Demand Metadata:            99.69%  863.35k
>>     Prefetch Metadata:          0.19%   1.61k
>>
>> ------------------------------------------------------------------------
>>
>> L2ARC is disabled
>>
>> ------------------------------------------------------------------------
>>
>> File-Level Prefetch: (HEALTHY)
>>
>> DMU Efficiency:                  72.95k
>>   Hit Ratio:             70.83%  51.66k
>>   Miss Ratio:            29.17%  21.28k
>>
>>   Colinear:                      21.28k
>>     Hit Ratio:           0.01%   2
>>     Miss Ratio:          99.99%  21.28k
>>
>>   Stride:                        50.45k
>>     Hit Ratio:           99.98%  50.44k
>>     Miss Ratio:          0.02%   9
>>
>> DMU Misc:
>>   Reclaim:                       21.28k
>>     Successes:           1.73%   368
>>     Failures:            98.27%  20.91k
>>
>>   Streams:                       1.23k
>>     +Resets:             0.16%   2
>>     -Resets:             99.84%  1.23k
>>     Bogus:               0
>>
>> ------------------------------------------------------------------------
>>
>> VDEV cache is disabled
>>
>> ------------------------------------------------------------------------
>>
>> ZFS Tunables (sysctl):
>>   kern.maxusers                           2380
>>   vm.kmem_size                            33367830528
>>   vm.kmem_size_scale                      1
>>   vm.kmem_size_min                        0
>>   vm.kmem_size_max                        1319413950874
>>   vfs.zfs.arc_max                         32294088704
>>   vfs.zfs.arc_min                         4036761088
>>   vfs.zfs.arc_average_blocksize           8192
>>   vfs.zfs.arc_shrink_shift                5
>>   vfs.zfs.arc_free_target                 56518
>>   vfs.zfs.arc_meta_used                   4534349216
>>   vfs.zfs.arc_meta_limit                  8073522176
>>   vfs.zfs.l2arc_write_max                 8388608
>>   vfs.zfs.l2arc_write_boost               8388608
>>   vfs.zfs.l2arc_headroom                  2
>>   vfs.zfs.l2arc_feed_secs                 1
>>   vfs.zfs.l2arc_feed_min_ms               200
>>   vfs.zfs.l2arc_noprefetch                1
>>   vfs.zfs.l2arc_feed_again                1
>>   vfs.zfs.l2arc_norw                      1
>>   vfs.zfs.anon_size                       1786368
>>   vfs.zfs.anon_metadata_lsize             0
>>   vfs.zfs.anon_data_lsize                 0
>>   vfs.zfs.mru_size                        504812032
>>   vfs.zfs.mru_metadata_lsize              415273472
>>   vfs.zfs.mru_data_lsize                  35227648
>>   vfs.zfs.mru_ghost_size                  0
>>   vfs.zfs.mru_ghost_metadata_lsize        0
>>   vfs.zfs.mru_ghost_data_lsize            0
>>   vfs.zfs.mfu_size                        3925990912
>>   vfs.zfs.mfu_metadata_lsize              3901947392
>>   vfs.zfs.mfu_data_lsize                  7000064
>>   vfs.zfs.mfu_ghost_size                  0
>>   vfs.zfs.mfu_ghost_metadata_lsize        0
>>   vfs.zfs.mfu_ghost_data_lsize            0
>>   vfs.zfs.l2c_only_size                   0
>>   vfs.zfs.dedup.prefetch                  1
>>   vfs.zfs.nopwrite_enabled                1
>>   vfs.zfs.mdcomp_disable                  0
>>   vfs.zfs.max_recordsize                  1048576
>>   vfs.zfs.dirty_data_max                  3429735628
>>   vfs.zfs.dirty_data_max_max              4294967296
>>   vfs.zfs.dirty_data_max_percent          10
>>   vfs.zfs.dirty_data_sync                 67108864
>>   vfs.zfs.delay_min_dirty_percent         60
>>   vfs.zfs.delay_scale                     500000
>>   vfs.zfs.prefetch_disable                0
>>   vfs.zfs.zfetch.max_streams              8
>>   vfs.zfs.zfetch.min_sec_reap             2
>>   vfs.zfs.zfetch.block_cap                256
>>   vfs.zfs.zfetch.array_rd_sz              1048576
>>   vfs.zfs.top_maxinflight                 32
>>   vfs.zfs.resilver_delay                  2
>>   vfs.zfs.scrub_delay                     4
>>   vfs.zfs.scan_idle                       50
>>   vfs.zfs.scan_min_time_ms                1000
>>   vfs.zfs.free_min_time_ms                1000
>>   vfs.zfs.resilver_min_time_ms            3000
>>   vfs.zfs.no_scrub_io                     0
>>   vfs.zfs.no_scrub_prefetch               0
>>   vfs.zfs.free_max_blocks                 -1
>>   vfs.zfs.metaslab.gang_bang              16777217
>>   vfs.zfs.metaslab.fragmentation_threshold 70
>>   vfs.zfs.metaslab.debug_load             0
>>   vfs.zfs.metaslab.debug_unload           0
>>   vfs.zfs.metaslab.df_alloc_threshold     131072
>>   vfs.zfs.metaslab.df_free_pct            4
>>   vfs.zfs.metaslab.min_alloc_size         33554432
>>   vfs.zfs.metaslab.load_pct               50
>>   vfs.zfs.metaslab.unload_delay           8
>>   vfs.zfs.metaslab.preload_limit          3
>>   vfs.zfs.metaslab.preload_enabled        1
>>   vfs.zfs.metaslab.fragmentation_factor_enabled 1
>>   vfs.zfs.metaslab.lba_weighting_enabled  1
>>   vfs.zfs.metaslab.bias_enabled           1
>>   vfs.zfs.condense_pct                    200
>>   vfs.zfs.mg_noalloc_threshold            0
>>   vfs.zfs.mg_fragmentation_threshold      85
>>   vfs.zfs.check_hostid                    1
>>   vfs.zfs.spa_load_verify_maxinflight     10000
>>   vfs.zfs.spa_load_verify_metadata        1
>>   vfs.zfs.spa_load_verify_data            1
>>   vfs.zfs.recover                         0
>>   vfs.zfs.deadman_synctime_ms             1000000
>>   vfs.zfs.deadman_checktime_ms            5000
>>   vfs.zfs.deadman_enabled                 1
>>   vfs.zfs.spa_asize_inflation             24
>>   vfs.zfs.spa_slop_shift                  5
>>   vfs.zfs.space_map_blksz                 4096
>>   vfs.zfs.txg.timeout                     5
>>   vfs.zfs.vdev.metaslabs_per_vdev         200
>>   vfs.zfs.vdev.cache.max                  16384
>>   vfs.zfs.vdev.cache.size                 0
>>   vfs.zfs.vdev.cache.bshift               16
>>   vfs.zfs.vdev.trim_on_init               1
>>   vfs.zfs.vdev.mirror.rotating_inc        0
>>   vfs.zfs.vdev.mirror.rotating_seek_inc   5
>>   vfs.zfs.vdev.mirror.rotating_seek_offset 1048576
>>   vfs.zfs.vdev.mirror.non_rotating_inc    0
>>   vfs.zfs.vdev.mirror.non_rotating_seek_inc 1
>>   vfs.zfs.vdev.async_write_active_min_dirty_percent 30
>>   vfs.zfs.vdev.async_write_active_max_dirty_percent 60
>>   vfs.zfs.vdev.max_active                 1000
>>   vfs.zfs.vdev.sync_read_min_active       10
>>   vfs.zfs.vdev.sync_read_max_active       10
>>   vfs.zfs.vdev.sync_write_min_active      10
>>   vfs.zfs.vdev.sync_write_max_active      10
>>   vfs.zfs.vdev.async_read_min_active      1
>>   vfs.zfs.vdev.async_read_max_active      3
>>   vfs.zfs.vdev.async_write_min_active     1
>>   vfs.zfs.vdev.async_write_max_active     10
>>   vfs.zfs.vdev.scrub_min_active           1
>>   vfs.zfs.vdev.scrub_max_active           2
>>   vfs.zfs.vdev.trim_min_active            1
>>   vfs.zfs.vdev.trim_max_active            64
>>   vfs.zfs.vdev.aggregation_limit          131072
>>   vfs.zfs.vdev.read_gap_limit             32768
>>   vfs.zfs.vdev.write_gap_limit            4096
>>   vfs.zfs.vdev.bio_flush_disable          0
>>   vfs.zfs.vdev.bio_delete_disable         0
>>   vfs.zfs.vdev.trim_max_bytes             2147483648
>>   vfs.zfs.vdev.trim_max_pending           64
>>   vfs.zfs.max_auto_ashift                 13
>>   vfs.zfs.min_auto_ashift                 9
>>   vfs.zfs.zil_replay_disable              0
>>   vfs.zfs.cache_flush_disable             0
>>   vfs.zfs.zio.use_uma                     1
>>   vfs.zfs.zio.exclude_metadata            0
>>   vfs.zfs.sync_pass_deferred_free         2
>>   vfs.zfs.sync_pass_dont_compress         5
>>   vfs.zfs.sync_pass_rewrite               2
>>   vfs.zfs.snapshot_list_prefetch          0
>>   vfs.zfs.super_owner                     0
>>   vfs.zfs.debug                           0
>>   vfs.zfs.version.ioctl                   4
>>   vfs.zfs.version.acl                     1
>>   vfs.zfs.version.spa                     5000
>>   vfs.zfs.version.zpl                     5
>>   vfs.zfs.vol.mode                        1
>>   vfs.zfs.vol.unmap_enabled               1
>>   vfs.zfs.trim.enabled                    1
>>   vfs.zfs.trim.txg_delay                  32
>>   vfs.zfs.trim.timeout                    30
>>   vfs.zfs.trim.max_interval               1
>>
>> ------------------------------------------------------------------------
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"