Date:      Thu, 07 Dec 2023 10:37:42 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 275594] High CPU usage by arc_prune; analysis and fix
Message-ID:  <bug-275594-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

            Bug ID: 275594
           Summary: High CPU usage by arc_prune; analysis and fix
           Product: Base System
           Version: 14.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: seigo.tanimura@gmail.com

Created attachment 246849
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=246849&action=edit
The proposed fix and additional patch.

This is the followup to bug #275063 and bug #274698.

After applying the fix published in FreeBSD-EN-23:18.openzfs, I have seen
the issue reproduce again.

I have been tracking this issue since the release of 14.0-RELEASE, and am now
ready to share a more promising fix.  Please review it and plan how to
integrate the fix.

* Test Environment: Hypervisor
- CPU: Intel Core i7-13700KF 3.4GHz (24 threads)
- RAM: 128 GB
- OS: Windows 10
- Storage: NVMe and SATA HDDs
- Hypervisor: VMware Workstation 17.5

* Test Environment: VM & OS
- vCPUs: 16
- RAM: 16 GB
- Swap: 128 GB on NVMe
- OS: FreeBSD 14.0-RELEASE
- Storage & Filesystems: ZFS mainly
  - Main pool: 1.5G on SATA HDD
  - ZIL: 16 GB on NVMe
  - L2ARC: 64 GB on NVMe
- sysctl(3) tunings:
  - vfs.vnode.param.limit=4000000
  - vfs.vnode.vnlru.max_free_per_call=100000
  - vfs.zfs.arc_max=4294967296

* Application
- poudriere
  - Number of ports to build: 2128 (including dependencies)
  - Major configurations for port building
    - poudriere.conf
      - #NO_ZFS=yes (ZFS enabled)
      - USE_PORTLINT=no
      - USE_TMPFS="wrkdir data localbase"
      - TMPFS_LIMIT=32
      - DISTFILES_CACHE=(configured in ZFS)
      - CCACHE_DIR=(configured in ZFS)
        - The cache is filled in advance.
      - CCACHE_STATIC_PREFIX=/usr/local
      - PARALLEL_JOBS=8 (actually given via "poudriere bulk -J")
    - make.conf
      - MAKE_JOBS_NUMBER=3D2

* Steps
1. Remove the package output directory, so that all packages are built.
2. Run 'poudriere bulk' to start the parallel build.
3. Observe the system and build progress by top(1), poudriere web UI,
cmdwatch(1) + sysctl(8), etc.

* Observed behaviors during building
- In 10 - 15 minutes, the ARC pruning started.
  - No effect on the performance.
- In about 30 minutes, the ARC pruning started to miss the pruning target.
  - 100% CPU usage by arc_prune was observed occasionally for a few seconds.
- In about 2 hours, the large ports (lang/rust, lang/gcc12) started to build.
  - 100% CPU usage by arc_prune was often observed for 5 - 10 seconds.
  - Several other threads also exhibited 100% CPU usage.
- Build time: 06:53:33 (309 pkgs / hr)

* Analysis
The true root cause is the consecutive execution of ARC pruning.  When there
are no vnodes ready to reclaim, the ARC pruning walks through all vnodes with
vnode_list_lock held.

The details are described in:
https://github.com/altimeter-130ft/freebsd-freebsd-src/commit/f1fa73f4d5943efa874fa3ede49dd73bb8ef4bb4
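
To illustrate the cost described in the analysis, here is a toy model in Python.  It is not the kernel code: the Vnode class and scan_for_reclaim() are hypothetical names made up for this sketch, and the single front-to-back walk only stands in for a scan done with one global list lock held.

```python
from dataclasses import dataclass


@dataclass
class Vnode:
    ident: int
    reclaimable: bool  # hypothetical flag: True if the vnode can be freed now


def scan_for_reclaim(vnodes, target):
    """Walk the list front to back, as if one global list lock were held.

    Returns (reclaimed, visited).  The walk stops early only once
    `target` vnodes have been reclaimed.
    """
    reclaimed = 0
    visited = 0
    for vn in vnodes:
        if reclaimed >= target:
            break
        visited += 1
        if vn.reclaimable:
            reclaimed += 1
    return reclaimed, visited


# Every vnode is busy: the scan visits the whole list and frees nothing.
busy = [Vnode(i, False) for i in range(100_000)]
print(scan_for_reclaim(busy, target=100))

# Half of the vnodes are reclaimable: the scan stops after a short prefix.
mixed = [Vnode(i, i % 2 == 0) for i in range(100_000)]
print(scan_for_reclaim(mixed, target=100))
```

In this model, a scan that finds nothing to reclaim costs a full list walk every time it is invoked, so back-to-back invocations multiply that cost while the lock is held.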

* Proposed fix
- Enforce a minimum interval between ARC pruning executions.
  - Patch (in the attached archive): openzfs-arc_prune-interval-fix.diff
  - GitHub:
https://github.com/altimeter-130ft/freebsd-freebsd-src/tree/release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-interval-fix
    - Branch base:
https://github.com/altimeter-130ft/freebsd-freebsd-src/commit/06497fbd52e2f138b7d590c8499d9cebad182850
      - releng/14.0 down to FreeBSD-SA-23:17.pf and version bumping.
- NB this fix is meant for FreeBSD only.  Please refer to the open issues as
well.
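
The general idea of enforcing an interval can be sketched in Python as below.  This is only an illustration under assumptions, not the attached patch: PruneThrottle, try_prune() and the 1.0 second interval are hypothetical, and the sketch simply skips a request that arrives too early, which may differ from what the patch does with such a request.

```python
import time


class PruneThrottle:
    """Run a prune callback at most once per `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the sketch can be tested
        self.last_run = None    # time of the last prune that actually ran
        self.skipped = 0        # requests rejected for arriving too early

    def try_prune(self, prune):
        now = self.clock()
        if self.last_run is not None and now - self.last_run < self.min_interval:
            self.skipped += 1
            return False
        self.last_run = now
        prune()
        return True


# Drive the throttle with a fake clock instead of real time.
fake_now = [0.0]
throttle = PruneThrottle(1.0, clock=lambda: fake_now[0])
runs = []
for t in (0.0, 0.2, 0.9, 1.0, 1.5, 2.1):
    fake_now[0] = t
    throttle.try_prune(lambda: runs.append(t))
print(runs, throttle.skipped)
```

With the fake clock above, only the requests at 0.0, 1.0 and 2.1 run; the other three are counted as skipped.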

* Additional patch
- The sysctl(3) counters to observe the vnode recycling behavior.
  - Patch (in the attached archive): openzfs-arc_prune-interval-counters.diff
  - GitHub:
https://github.com/altimeter-130ft/freebsd-freebsd-src/tree/release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-interval-counters
    - Branch base: the branch for the proposed fix.
  - The following counters may be committed as a debugging and tuning aid:
    - vfs.vnode.free.free_call
      - The calls to vnlru_free_impl().
    - vfs.vnode.free.free_retry
      - The retries from the vnode list head in vnlru_free_impl().
    - vfs.vnode.free.free_giveup
      - The giveups in vnlru_free_impl().
    - Under heavy ZFS access, free_retry and free_giveup increase along
with free_call, indicating misses on the vnode reclaim target.
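
One way to read these counters is to compare two samples and look at retries and giveups per call.  The Python sketch below assumes the samples have already been collected (for example by reading the three sysctl values twice, with the additional patch applied); counter_rates() is a hypothetical helper and the numbers are made up for illustration, not measurements.

```python
COUNTERS = ("free_call", "free_retry", "free_giveup")


def counter_rates(before, after):
    """Return per-call retry and giveup rates between two counter samples."""
    deltas = {name: after[name] - before[name] for name in COUNTERS}
    calls = deltas["free_call"]
    return {
        "calls": calls,
        "retries_per_call": deltas["free_retry"] / calls if calls else 0.0,
        "giveups_per_call": deltas["free_giveup"] / calls if calls else 0.0,
    }


# Made-up sample values, not taken from the test runs in this report.
before = {"free_call": 1000, "free_retry": 200, "free_giveup": 50}
after = {"free_call": 1400, "free_retry": 500, "free_giveup": 150}
print(counter_rates(before, after))
```

A rising retries_per_call or giveups_per_call between samples would correspond to the misses on the vnode reclaim target described above.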

* Observed behaviors with the proposed fix during building
- The arc_prune kernel thread did not exhibit the 100% CPU usage.
  - Max: 30 - 35%.
  - The continuous CPU usage mostly disappeared.
  - The vnlru kernel thread ran in parallel with arc_prune.
- Build time: 06:37:03 (322 pkgs / hr)
  - Improved by ~8.5%.

* Open issues
- Please also refer to the fix commit log.
- Who should implement the fix?
  - OpenZFS taskq should be fixed if the issue is seen and resolvable on Linux
as well.
- Is the proposed design contract upon the ARC pruning reasonable?

-- 
You are receiving this mail because:
You are the assignee for the bug.
