Date:      Mon, 30 Mar 2015 22:52:31 -0500
From:      Dustin Wenz <dustinwenz@ebureau.com>
To:        Steven Hartland <killing@multiplay.co.uk>, "<freebsd-fs@freebsd.org>" <freebsd-fs@freebsd.org>
Subject:   Re: All available memory used when deleting files from ZFS
Message-ID:  <3FF6F6A0-A23F-4EDE-98F6-5B8E41EC34A1@ebureau.com>
In-Reply-To: <5519E53C.4060203@multiplay.co.uk>
References:  <FD30147A-C7F7-4138-9F96-10024A6FE061@ebureau.com> <5519E53C.4060203@multiplay.co.uk>

Thanks, Steven! However, I have not enabled dedup on any of the affected
filesystems. Unless it became a default at some point, I'm not sure how that
tunable would help.
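
For anyone who wants to double-check their own pools, something like the
following should confirm that dedup really is off (the pool name "tank" below
is just a placeholder):

    # zpool list -o name,dedupratio tank
    # zfs get -r -o name,value dedup tank

dedup=off on every dataset and a dedupratio of 1.00x would mean there is no
DDT in play here.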

    - .Dustin

> On Mar 30, 2015, at 7:07 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
>
> Later versions have vfs.zfs.free_max_blocks, which is likely to be the fix
> you're looking for.
>
> It was added to head by r271532 and stable/10 by:
> https://svnweb.freebsd.org/base?view=revision&revision=272665
>
> Description being:
>
> Add a new tunable/sysctl, vfs.zfs.free_max_blocks, which can be used to
> limit how many blocks can be free'ed before a new transaction group is
> created.  The default is no limit (infinite), but we should probably have
> a lower default, e.g. 100,000.
>
> With this limit, we can guard against the case where ZFS could run out of
> memory when destroying large numbers of blocks in a single transaction
> group, as the entire DDT needs to be brought into memory.
>
> Illumos issue:
>    5138 add tunable for maximum number of blocks freed in one txg
>
>
>
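> For illustration, and assuming a kernel new enough to have it, capping this
> would presumably look something like the following (100,000 is just the
> figure the commit message floats, not a tested recommendation):
>
>     # sysctl vfs.zfs.free_max_blocks=100000
>
> or, since it is described as both a tunable and a sysctl, a line in
> /boot/loader.conf to apply it at boot:
>
>     vfs.zfs.free_max_blocks=100000
>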
>> On 30/03/2015 22:14, Dustin Wenz wrote:
>> I had several systems panic or hang over the weekend while deleting some d=
ata off of their local zfs filesystem. It looks like they ran out of physica=
l memory (32GB), and hung when paging to swap-on-zfs (which is not surprisin=
g, given that ZFS was likely using the memory). They were running 10.1-STABL=
E r277139M, which I built in the middle of January. The pools were about 35T=
B in size, and are a concatenation of 3TB mirrors. They were maybe 95% full.=
 I deleted just over 1000 files, totaling 25TB on each system.
>>=20
>> It took roughly 10 minutes to remove that 25TB of data per host using a
>> remote rsync, and immediately after that everything seemed fine. However,
>> after several more minutes, every machine that had data removed became
>> unresponsive. Some had numerous "swap_pager: indefinite wait buffer" errors
>> followed by a panic, and some just died with no console messages. The same
>> thing would happen after a reboot, when FreeBSD attempted to mount the
>> local filesystem again.
>>
>> I was able to boot these systems after exporting the affected pool, but the
>> problem would recur several minutes after initiating a "zpool import".
>> Watching ZFS statistics didn't seem to reveal where the memory was going;
>> the ARC would only climb to about 4GB, but free memory would decline
>> rapidly. Eventually, after enough export/reboot/import cycles, the pool
>> would import successfully and everything would be fine from then on. Note
>> that no L2ARC or compression is in use.
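>>
>> (As a way to see where the wired memory actually goes while an import is
>> running, I'd expect a loop along these lines to help tell ARC growth apart
>> from other kernel allocations; which specific UMA zones matter is a guess
>> on my part:
>>
>>     while true; do
>>         sysctl kstat.zfs.misc.arcstats.size vm.stats.vm.v_free_count
>>         vmstat -z | head -40
>>         sleep 10
>>     done
>>
>> kstat.zfs.misc.arcstats.size tracks the ARC itself, while vmstat -z shows
>> which kernel zones the rest of the wired memory ends up in.)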
>>
>> Has anyone else run into this when deleting files on ZFS? It seems to be a
>> consistent problem under the versions of 10.1 I'm running.
>>
>> For reference, I've appended a zstat dump below that was taken 5 minutes
>> after starting a zpool import, and was about three minutes before the
>> machine became unresponsive. You can see that the ARC is only 4GB, but free
>> memory was down to 471MB (and continued to drop).
>>
>>    - .Dustin
>>
>>
>> ------------------------------------------------------------------------
>> ZFS Subsystem Report                Mon Mar 30 12:35:27 2015
>> ------------------------------------------------------------------------
>>
>> System Information:
>>
>>    Kernel Version:                1001506 (osreldate)
>>    Hardware Platform:            amd64
>>    Processor Architecture:            amd64
>>
>>    ZFS Storage pool Version:        5000
>>    ZFS Filesystem Version:            5
>>
>> FreeBSD 10.1-STABLE #11 r277139M: Tue Jan 13 14:59:55 CST 2015 root
>> 12:35PM  up 8 mins, 3 users, load averages: 7.23, 8.96, 4.87
>>
>> ------------------------------------------------------------------------
>>
>> System Memory:
>>
>>    0.17%    55.40    MiB Active,    0.14%    46.11    MiB Inact
>>    98.34%    30.56    GiB Wired,    0.00%    0 Cache
>>    1.34%    425.46    MiB Free,    0.00%    4.00    KiB Gap
>>
>>    Real Installed:                32.00    GiB
>>    Real Available:            99.82%    31.94    GiB
>>    Real Managed:            97.29%    31.08    GiB
>>
>>    Logical Total:                32.00    GiB
>>    Logical Used:            98.56%    31.54    GiB
>>    Logical Free:            1.44%    471.57    MiB
>>
>> Kernel Memory:                    3.17    GiB
>>    Data:                99.18%    3.14    GiB
>>    Text:                0.82%    26.68    MiB
>>
>> Kernel Memory Map:                31.08    GiB
>>    Size:                14.18%    4.41    GiB
>>    Free:                85.82%    26.67    GiB
>>
>> ------------------------------------------------------------------------
>>
>> ARC Summary: (HEALTHY)
>>    Memory Throttle Count:            0
>>
>> ARC Misc:
>>    Deleted:                145
>>    Recycle Misses:                0
>>    Mutex Misses:                0
>>    Evict Skips:                0
>>
>> ARC Size:                14.17%    4.26    GiB
>>    Target Size: (Adaptive)        100.00%    30.08    GiB
>>    Min Size (Hard Limit):        12.50%    3.76    GiB
>>    Max Size (High Water):        8:1    30.08    GiB
>>
>> ARC Size Breakdown:
>>    Recently Used Cache Size:    50.00%    15.04    GiB
>>    Frequently Used Cache Size:    50.00%    15.04    GiB
>>
>> ARC Hash Breakdown:
>>    Elements Max:                270.56k
>>    Elements Current:        100.00%    270.56k
>>    Collisions:                23.66k
>>    Chain Max:                3
>>    Chains:                    8.28k
>>
>> ------------------------------------------------------------------------
>>
>> ARC Efficiency:                    2.93m
>>    Cache Hit Ratio:        70.44%    2.06m
>>    Cache Miss Ratio:        29.56%    866.05k
>>    Actual Hit Ratio:        70.40%    2.06m
>>
>>    Data Demand Efficiency:        97.47%    24.58k
>>    Data Prefetch Efficiency:    1.88%    479
>>
>>    CACHE HITS BY CACHE LIST:
>>      Anonymously Used:        0.05%    1.07k
>>      Most Recently Used:        71.82%    1.48m
>>      Most Frequently Used:        28.13%    580.49k
>>      Most Recently Used Ghost:    0.00%    0
>>      Most Frequently Used Ghost:    0.00%    0
>>
>>    CACHE HITS BY DATA TYPE:
>>      Demand Data:            1.16%    23.96k
>>      Prefetch Data:        0.00%    9
>>      Demand Metadata:        98.79%    2.04m
>>      Prefetch Metadata:        0.05%    1.08k
>>
>>    CACHE MISSES BY DATA TYPE:
>>      Demand Data:            0.07%    621
>>      Prefetch Data:        0.05%    470
>>      Demand Metadata:        99.69%    863.35k
>>      Prefetch Metadata:        0.19%    1.61k
>>
>> ------------------------------------------------------------------------
>>
>> L2ARC is disabled
>>
>> ------------------------------------------------------------------------
>>
>> File-Level Prefetch: (HEALTHY)
>>
>> DMU Efficiency:                    72.95k
>>    Hit Ratio:            70.83%    51.66k
>>    Miss Ratio:            29.17%    21.28k
>>
>>    Colinear:                21.28k
>>      Hit Ratio:            0.01%    2
>>      Miss Ratio:            99.99%    21.28k
>>
>>    Stride:                    50.45k
>>      Hit Ratio:            99.98%    50.44k
>>      Miss Ratio:            0.02%    9
>>
>> DMU Misc:
>>    Reclaim:                21.28k
>>      Successes:            1.73%    368
>>      Failures:            98.27%    20.91k
>>
>>    Streams:                1.23k
>>      +Resets:            0.16%    2
>>      -Resets:            99.84%    1.23k
>>      Bogus:                0
>>
>> ------------------------------------------------------------------------
>>
>> VDEV cache is disabled
>>
>> ------------------------------------------------------------------------
>>
>> ZFS Tunables (sysctl):
>>    kern.maxusers                           2380
>>    vm.kmem_size                            33367830528
>>    vm.kmem_size_scale                      1
>>    vm.kmem_size_min                        0
>>    vm.kmem_size_max                        1319413950874
>>    vfs.zfs.arc_max                         32294088704
>>    vfs.zfs.arc_min                         4036761088
>>    vfs.zfs.arc_average_blocksize           8192
>>    vfs.zfs.arc_shrink_shift                5
>>    vfs.zfs.arc_free_target                 56518
>>    vfs.zfs.arc_meta_used                   4534349216
>>    vfs.zfs.arc_meta_limit                  8073522176
>>    vfs.zfs.l2arc_write_max                 8388608
>>    vfs.zfs.l2arc_write_boost               8388608
>>    vfs.zfs.l2arc_headroom                  2
>>    vfs.zfs.l2arc_feed_secs                 1
>>    vfs.zfs.l2arc_feed_min_ms               200
>>    vfs.zfs.l2arc_noprefetch                1
>>    vfs.zfs.l2arc_feed_again                1
>>    vfs.zfs.l2arc_norw                      1
>>    vfs.zfs.anon_size                       1786368
>>    vfs.zfs.anon_metadata_lsize             0
>>    vfs.zfs.anon_data_lsize                 0
>>    vfs.zfs.mru_size                        504812032
>>    vfs.zfs.mru_metadata_lsize              415273472
>>    vfs.zfs.mru_data_lsize                  35227648
>>    vfs.zfs.mru_ghost_size                  0
>>    vfs.zfs.mru_ghost_metadata_lsize        0
>>    vfs.zfs.mru_ghost_data_lsize            0
>>    vfs.zfs.mfu_size                        3925990912
>>    vfs.zfs.mfu_metadata_lsize              3901947392
>>    vfs.zfs.mfu_data_lsize                  7000064
>>    vfs.zfs.mfu_ghost_size                  0
>>    vfs.zfs.mfu_ghost_metadata_lsize        0
>>    vfs.zfs.mfu_ghost_data_lsize            0
>>    vfs.zfs.l2c_only_size                   0
>>    vfs.zfs.dedup.prefetch                  1
>>    vfs.zfs.nopwrite_enabled                1
>>    vfs.zfs.mdcomp_disable                  0
>>    vfs.zfs.max_recordsize                  1048576
>>    vfs.zfs.dirty_data_max                  3429735628
>>    vfs.zfs.dirty_data_max_max              4294967296
>>    vfs.zfs.dirty_data_max_percent          10
>>    vfs.zfs.dirty_data_sync                 67108864
>>    vfs.zfs.delay_min_dirty_percent         60
>>    vfs.zfs.delay_scale                     500000
>>    vfs.zfs.prefetch_disable                0
>>    vfs.zfs.zfetch.max_streams              8
>>    vfs.zfs.zfetch.min_sec_reap             2
>>    vfs.zfs.zfetch.block_cap                256
>>    vfs.zfs.zfetch.array_rd_sz              1048576
>>    vfs.zfs.top_maxinflight                 32
>>    vfs.zfs.resilver_delay                  2
>>    vfs.zfs.scrub_delay                     4
>>    vfs.zfs.scan_idle                       50
>>    vfs.zfs.scan_min_time_ms                1000
>>    vfs.zfs.free_min_time_ms                1000
>>    vfs.zfs.resilver_min_time_ms            3000
>>    vfs.zfs.no_scrub_io                     0
>>    vfs.zfs.no_scrub_prefetch               0
>>    vfs.zfs.free_max_blocks                 -1
>>    vfs.zfs.metaslab.gang_bang              16777217
>>    vfs.zfs.metaslab.fragmentation_threshold 70
>>    vfs.zfs.metaslab.debug_load             0
>>    vfs.zfs.metaslab.debug_unload           0
>>    vfs.zfs.metaslab.df_alloc_threshold     131072
>>    vfs.zfs.metaslab.df_free_pct            4
>>    vfs.zfs.metaslab.min_alloc_size         33554432
>>    vfs.zfs.metaslab.load_pct               50
>>    vfs.zfs.metaslab.unload_delay           8
>>    vfs.zfs.metaslab.preload_limit          3
>>    vfs.zfs.metaslab.preload_enabled        1
>>    vfs.zfs.metaslab.fragmentation_factor_enabled 1
>>    vfs.zfs.metaslab.lba_weighting_enabled  1
>>    vfs.zfs.metaslab.bias_enabled           1
>>    vfs.zfs.condense_pct                    200
>>    vfs.zfs.mg_noalloc_threshold            0
>>    vfs.zfs.mg_fragmentation_threshold      85
>>    vfs.zfs.check_hostid                    1
>>    vfs.zfs.spa_load_verify_maxinflight     10000
>>    vfs.zfs.spa_load_verify_metadata        1
>>    vfs.zfs.spa_load_verify_data            1
>>    vfs.zfs.recover                         0
>>    vfs.zfs.deadman_synctime_ms             1000000
>>    vfs.zfs.deadman_checktime_ms            5000
>>    vfs.zfs.deadman_enabled                 1
>>    vfs.zfs.spa_asize_inflation             24
>>    vfs.zfs.spa_slop_shift                  5
>>    vfs.zfs.space_map_blksz                 4096
>>    vfs.zfs.txg.timeout                     5
>>    vfs.zfs.vdev.metaslabs_per_vdev         200
>>    vfs.zfs.vdev.cache.max                  16384
>>    vfs.zfs.vdev.cache.size                 0
>>    vfs.zfs.vdev.cache.bshift               16
>>    vfs.zfs.vdev.trim_on_init               1
>>    vfs.zfs.vdev.mirror.rotating_inc        0
>>    vfs.zfs.vdev.mirror.rotating_seek_inc   5
>>    vfs.zfs.vdev.mirror.rotating_seek_offset 1048576
>>    vfs.zfs.vdev.mirror.non_rotating_inc    0
>>    vfs.zfs.vdev.mirror.non_rotating_seek_inc 1
>>    vfs.zfs.vdev.async_write_active_min_dirty_percent 30
>>    vfs.zfs.vdev.async_write_active_max_dirty_percent 60
>>    vfs.zfs.vdev.max_active                 1000
>>    vfs.zfs.vdev.sync_read_min_active       10
>>    vfs.zfs.vdev.sync_read_max_active       10
>>    vfs.zfs.vdev.sync_write_min_active      10
>>    vfs.zfs.vdev.sync_write_max_active      10
>>    vfs.zfs.vdev.async_read_min_active      1
>>    vfs.zfs.vdev.async_read_max_active      3
>>    vfs.zfs.vdev.async_write_min_active     1
>>    vfs.zfs.vdev.async_write_max_active     10
>>    vfs.zfs.vdev.scrub_min_active           1
>>    vfs.zfs.vdev.scrub_max_active           2
>>    vfs.zfs.vdev.trim_min_active            1
>>    vfs.zfs.vdev.trim_max_active            64
>>    vfs.zfs.vdev.aggregation_limit          131072
>>    vfs.zfs.vdev.read_gap_limit             32768
>>    vfs.zfs.vdev.write_gap_limit            4096
>>    vfs.zfs.vdev.bio_flush_disable          0
>>    vfs.zfs.vdev.bio_delete_disable         0
>>    vfs.zfs.vdev.trim_max_bytes             2147483648
>>    vfs.zfs.vdev.trim_max_pending           64
>>    vfs.zfs.max_auto_ashift                 13
>>    vfs.zfs.min_auto_ashift                 9
>>    vfs.zfs.zil_replay_disable              0
>>    vfs.zfs.cache_flush_disable             0
>>    vfs.zfs.zio.use_uma                     1
>>    vfs.zfs.zio.exclude_metadata            0
>>    vfs.zfs.sync_pass_deferred_free         2
>>    vfs.zfs.sync_pass_dont_compress         5
>>    vfs.zfs.sync_pass_rewrite               2
>>    vfs.zfs.snapshot_list_prefetch          0
>>    vfs.zfs.super_owner                     0
>>    vfs.zfs.debug                           0
>>    vfs.zfs.version.ioctl                   4
>>    vfs.zfs.version.acl                     1
>>    vfs.zfs.version.spa                     5000
>>    vfs.zfs.version.zpl                     5
>>    vfs.zfs.vol.mode                        1
>>    vfs.zfs.vol.unmap_enabled               1
>>    vfs.zfs.trim.enabled                    1
>>    vfs.zfs.trim.txg_delay                  32
>>    vfs.zfs.trim.timeout                    30
>>    vfs.zfs.trim.max_interval               1
>>
>> ------------------------------------------------------------------------
>>
>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


