Date:      Thu, 14 Dec 2023 06:58:31 +0000
From:      bugzilla-noreply@freebsd.org
To:        fs@FreeBSD.org
Subject:   [Bug 275594] High CPU usage by arc_prune; analysis and fix
Message-ID:  <bug-275594-3630-1PQNkikAXX@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-275594-3630@https.bugs.freebsd.org/bugzilla/>
References:  <bug-275594-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #12 from Seigo Tanimura <seigo.tanimura@gmail.com> ---
(In reply to Seigo Tanimura from comment #10)

I have added the fix to enable the extra vnode recycling and tested with the
same setup.

Source on GitHub:
- Repo: https://github.com/altimeter-130ft/freebsd-freebsd-src
- Branches
  - Fix:
release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-interval-fix
  - Counters atop Fix:
release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-interval-counters

Test setup:
The same as "Ongoing test" in bug #275594, comment #6.

- vfs.vnode.vnlru.max_free_per_call: 4000000 (==
  vfs.vnode.vnlru.max_free_per_call)
- vfs.zfs.arc.prune_interval: 1000 (my fix for arc_prune interval enabled)
- vfs.vnode.vnlru.extra_recycle: 1 (extra vnode recycle fix enabled)
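
(For reference, a sketch of how this setup is applied; it assumes the patched
tunables are writable via sysctl(8) at run time.  vfs.zfs.arc.prune_interval
and vfs.vnode.vnlru.extra_recycle exist only on a kernel built from the
branches above, not on stock 14.0-RELEASE.)

# vnlru free batch size, arc_prune interval fix, extra vnode recycle fix
sysctl vfs.vnode.vnlru.max_free_per_call=4000000
sysctl vfs.zfs.arc.prune_interval=1000
sysctl vfs.vnode.vnlru.extra_recycle=1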

Build time:
06:50:05 (312 pkgs / hr)

Counters after completing the build, with some remarks:

# The iteration attempts in vnlru_free_impl().
# This includes the retry from the head of vnode_list.
vfs.vnode.free.free_attempt: 33934506866

# The number of the vnodes recycled successfully, including vtryrecycle().
vfs.vnode.free.free_success: 42945537

# The number of the successful recycles in phase 2 upon the VREG (regular file)
# vnodes.
# - cleanbuf_vmpage_only: the vnodes held by the clean bufs and resident VM
#   pages only.
# - cleanbuf_only: the vnodes held by the clean bufs only.
vfs.vnode.free.free_phase2_retry_reg_cleanbuf_vmpage_only: 845659
vfs.vnode.free.free_phase2_retry_reg_cleanbuf_only: 3

# The number of the iteration skips due to a held vnode. ("phase 2" hereafter)
# NB the successful recycles in phase 2 are not included.
vfs.vnode.free.free_phase2_retry: 8923850577

# The number of the phase 2 skips upon the VREG vnodes.
vfs.vnode.free.free_phase2_retry_reg: 8085735334

# The number of the phase 2 skips upon the VREG vnodes in use.
# Almost all phase 2 skips upon VREG fell into this.
vfs.vnode.free.free_phase2_retry_reg_inuse: 8085733060

# The number of the successful recycles in phase 2 upon the VDIR (directory)
# vnodes.
# - free_phase2_retry_dir_nc_src_only: the vnodes held by the namecache entries
#   only.
vfs.vnode.free.free_phase2_retry_dir_nc_src_only: 2234194

# The number of the phase 2 skips upon the VDIR vnodes.
vfs.vnode.free.free_phase2_retry_dir: 834902819

# The number of the phase 2 skips upon the VDIR vnodes in use.
# Almost all phase 2 skips upon VDIR fell into this.
vfs.vnode.free.free_phase2_retry_dir_inuse: 834902780
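
Some rough ratios out of the counters above (rounded):

free_success / free_attempt:               42945537 / 33934506866    ~ 0.13%
free_phase2_retry / free_attempt:          8923850577 / 33934506866  ~ 26%
free_phase2_retry_reg / free_phase2_retry: 8085735334 / 8923850577   ~ 91%

That is, only about 1 in 800 iteration attempts actually recycles a vnode, and
the phase 2 skips are dominated by the VREG vnodes still in use.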

Other findings:

- The behaviour of the arc_prune thread CPU usage was mostly the same.
  - The peak was reduced by only a few percent, so this is not likely to be the
    essential fix.

- The namecache hit ratio degraded by about 10 - 20%.
  - Maybe the recycled vnodes are looked up again, especially the directories.

-----

The issue essentially still exists even with the extra vnode recycling.  Maybe
the root cause is in ZFS rather than the OS.

There are some suspicious findings on the in-memory dnode behaviour during the
tests so far:

- vfs.zfs.arc_max does not enforce the max size of
kstat.zfs.misc.arcstats.dnode_size.
  - vfs.zfs.arc_max: 4GB
  - vfs.zfs.arc.dnode_limit_percent: 10 (default)
  - sizeof(dnode_t): 808 bytes
    - Found by "vmstat -z | grep dnode_t".
  - kstat.zfs.misc.arcstats.arc_dnode_limit: 400MB (the default:
vfs.zfs.arc.dnode_limit_percent of vfs.zfs.arc_max)
    - ~495K dnodes.
  - kstat.zfs.misc.arcstats.dnode_size, max: ~1.8GB
    - ~2.2M dnodes.
    - Almost equal to the max observed number of the vnodes (see the arithmetic
      after this list).

- The dnode_t uma(9) zone does not have a limit.
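
For reference, the dnode counts above follow from this arithmetic (decimal
units, rounded):

arc_dnode_limit / sizeof(dnode_t):  400MB / 808 bytes  ~ 495K dnodes
dnode_size (max) / sizeof(dnode_t): 1.8GB / 808 bytes  ~ 2.2M dnodes

so the observed dnode_size overshoots arc_dnode_limit by roughly 4.5 times.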

From the above, the number of the in-memory dnodes looks like the bottleneck.
Maybe the essential solution is to configure vfs.zfs.arc.dnode_limit explicitly
so that ZFS can hold all dnodes required by the application in memory.
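
As a concrete sketch (the sizing below is my assumption from the numbers above,
not a tested value): holding ~2.2M dnodes of 808 bytes each needs an
arc_dnode_limit on the order of 1.8GB, e.g. in /boot/loader.conf (or via
sysctl(8) at run time, if the tunable is writable):

# assumed sizing: ~2.2M dnodes * 808 bytes ~ 1.8GB, rounded up to 2GB
vfs.zfs.arc.dnode_limit="2147483648"

A non-zero vfs.zfs.arc.dnode_limit overrides vfs.zfs.arc.dnode_limit_percent.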
