Date: Thu, 07 Dec 2023 10:37:42 +0000 From: bugzilla-noreply@freebsd.org To: bugs@FreeBSD.org Subject: [Bug 275594] High CPU usage by arc_prune; analysis and fix Message-ID: <bug-275594-227@https.bugs.freebsd.org/bugzilla/>
next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D275594 Bug ID: 275594 Summary: High CPU usage by arc_prune; analysis and fix Product: Base System Version: 14.0-RELEASE Hardware: Any OS: Any Status: New Severity: Affects Some People Priority: --- Component: kern Assignee: bugs@FreeBSD.org Reporter: seigo.tanimura@gmail.com Created attachment 246849 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=3D246849&action= =3Dedit The proposed fix and additional patch. This is the followup to bug #275063 and bug #274698. After applying the fix published in FreeBSD-EN-23:18.openzfs, I have again = seen the issue reproducing. I have been tracking this issue since the release of 14.0-RELEASE, and now ready to share the more promising fix. Please review and make the fix plan. * Test Environment: Hypervisor - CPU: Intel Core i7-13700KF 3.4GHz (24 threads) - RAM: 128 GB - OS: Windows 10 - Storage: NVMe and SATA HDDs - Hypervisor: VMWare Workstation 17.5 * Test Environment: VM & OS - vCPUs: 16 - RAM: 16 GB - Swap: 128 GB on NVMe - OS: FreeBSD 14.0-RELEASE - Storage & Filesystems: ZFS mainly - Main pool: 1.5G on SATA HDD - ZIL: 16 GB on NVMe - L2ARC: 64 GB on NVMe - sysctl(3) tunings: - vfs.vnode.param.limit=3D4000000 - vfs.vnode.vnlru.max_free_per_call=3D100000 - vfs.zfs.arc_max=3D4294967296 * Application - poudriere - Number of ports to build: 2128 (including dependencies) - Major configurations for port building - poudriere.conf - #NO_ZFS=3Dyes (ZFS enabled) - USE_PORTLINT=3Dno - USE_TMPFS=3D"wrkdir data localbase" - TMPFS_LIMIT=3D32 - DISTFILES_CACHE=3D(configured in ZFS) - CCACHE_DIR=3D(configured in ZFS) - The cache is filled in advance. - CCACHE_STATIC_PREFIX=3D/usr/local - PARALLEL_JOBS=3D8 (actually givin via "poudriere bulk -J") - make.conf - MAKE_JOBS_NUMBER=3D2 * Steps 1. Remove the package output directory, so that all packages are built. 2. Run 'poudriere bulk' to start the parallel build. 3. Observe the system and build progress by top(1), poudriere web UI, cmdwatch(1) + sysctl(8), etc. * Observed behaviors during building - In 10 - 15 minutes, the ARC pruning started. - No affects on the performance. - In about 30 minutes, the ARC pruning started to miss the pruning target. - The 100% CPU usage by arc_prune observed for a few seconds occasionally. - In about 2 hours, the large ports (lang/rust, lang/gcc12) started to buil= d. - The 100% CPU usage by arc_prune observer for 5 - 10 seconds often. - Several other threads also exhibit the 100% CPU usage. - Build time: 06:53:33 (309 pkgs / hr) * Analysis The true root cause is the consecutive execution of ARC pruning. When there are no vnodes ready to reclaim, the ARC pruning walks through all vnodes wi= th vnode_list_lock held. The detail is described in: https://github.com/altimeter-130ft/freebsd-freebsd-src/commit/f1fa73f4d5943= efa874fa3ede49dd73bb8ef4bb4 * Proposed fix - Enforce the interval between the ARC pruning execution. - Patch (in the attached archive): openzfs-arc_prune-interval-fix.diff - GitHub: https://github.com/altimeter-130ft/freebsd-freebsd-src/tree/release/14.0.0/= release-14_0_0-p2-topic-openzfs-arc_prune-interval-fix - Branch base: https://github.com/altimeter-130ft/freebsd-freebsd-src/commit/06497fbd52e2f= 138b7d590c8499d9cebad182850 - releng/14.0 down to FreeBSD-SA-23:17.pf and version bumping. - NB this fix is meant for FreeBSD only. Please refer to the open issues as well. * Additional patch - The sysctl(3) counters to observe the vnode recycling behavior. - Patch (in the attached archive): openzfs-arc_prune-interval-counters.di= ff - GitHub: https://github.com/altimeter-130ft/freebsd-freebsd-src/tree/release/14.0.0/= release-14_0_0-p2-topic-openzfs-arc_prune-interval-counters - Branch base: the branch for the proposed fix. - The following counters may be committed for the debugging and tuning ai= d: - vfs.vnode.free.free_call - The calls to vnlru_free_impl(). - vfs.vnode.free.free_retry - The retries from the vnode list head in vnlru_free_impl(). - vfs.vnode.free.free_giveup - The giveups in vnlru_free_impl(). - Under the heavy ZFS access, free_retry and free_giveup increase along with free_call, indicating the misses on the vnode reclaim target. * Observed behaviors with the proposed fix during building - The arc_prune kernel thread did not exhibit the 100% CPU usage. - Max: 30 - 35%. - The continuous CPU usage disappeared mostly. - The vnlru kernel thread ran in parallel with arc_prune. - Build time: 06:37:03 (322 pkgs / hr) - Improved for ~8.5%. * Open issues - Please also refer to the fix commit log. - Who should implement the fix? - OpenZFS taskq should be fixed if the issue is seen and resolvable on Li= nux as well. - Is the proposed design contract upon the ARC pruning reasonable? --=20 You are receiving this mail because: You are the assignee for the bug.=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-275594-227>