Date: Thu, 28 Mar 2024 04:22:30 +0000
From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 275594] High CPU usage by arc_prune; analysis and fix
Message-ID: <bug-275594-3630-OUhPBjCHHa@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-275594-3630@https.bugs.freebsd.org/bugzilla/>
References: <bug-275594-3630@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #97 from Seigo Tanimura <seigo.tanimura@gmail.com> ---
Again, my apologies for the delayed comment.

Now that the nullfs fix (https://reviews.freebsd.org/D44217) has been merged
into stable/13 and stable/14, the next diff is
https://reviews.freebsd.org/D44170, the backport of the
FreeBSD-EN-23:18.openzfs fix to stable/13.  This is essentially a functional
reimplementation of 799e09f75a on main
(https://github.com/openzfs/zfs/commit/799e09f75a31e80a1702a850838c79879af8b917)
and 3ec4ea68d4 on zfs-2.2-release
(https://github.com/openzfs/zfs/commit/3ec4ea68d491a82c8de3360d50032bdecd53608f)
of OpenZFS, focusing on avoiding the pileup on the arc_prune kernel thread.

Among the FreeBSD committers, I have found the names of mav and markj in the
logs of the commits above.  I believe they are suitable reviewers for the
diff.

The rest of the diffs:

- https://reviews.freebsd.org/D44171
  (kern/vfs: Add the per-filesystem vnode counter to struct mount.)
- https://reviews.freebsd.org/D44173
  (kern/openzfs: Regulate the ZFS ARC pruning process precisely.)

are challenging because they address the interaction problem between OpenZFS
and the OS (FreeBSD) kernel.  I believe reviewers with insight into both
OpenZFS and FreeBSD are needed.

If reviewing these diffs is too difficult, an alternative is to add a
sysctl(3) toggle that controls the fix in D44173 so that the fix can be
merged without being enabled by default.  Thanks to the many testers on this
issue, I now believe the fix is ready for more extensive public testing.

-----

Besides the review, my analysis has turned up quite a few findings regarding
the healthy operation of the OpenZFS ARC and its pruning and eviction.  It
would be great to document them somehow.  They should also be kept in mind
when reviewing D44171 and D44173.

* OpenZFS ARC buffers and their evictability
  - ARC buffers are separate for reading and writing.
    - A read ARC buffer must be copied into a write ARC buffer in order to
      "update" it in the copy-on-write manner.
  - A read ARC buffer is not evictable until its content has been read from
    the pool.
  - A write ARC buffer is not evictable until its content has been written
    into the pool.
    - A write ARC buffer that depends on the write of another write ARC
      buffer may remain unevictable for a long time.
  - Under healthy operation, almost all ARC read and write buffers for data
    are evictable.
    - Some of the ARC read and write buffers for metadata are not evictable
      because of the internal dependencies required by the OpenZFS design.
  - The write ARC buffers of the vnodes in use (v_usecount > 0) have been
    found to remain unevictable until the vnodes are no longer in use.
    - This is the direct cause of the excess ARC pruning during
      poudriere-bulk(8); the nullfs filesystems cached the OpenZFS vnodes by
      incrementing v_usecount.
    - A similar issue may arise from a different cause, e.g. too many open
      OpenZFS files.

* Limitations of OpenZFS ARC pruning and eviction on FreeBSD
  - The ARC pruning cannot count the OpenZFS znodes (i.e. FreeBSD vnodes)
    that are unprunable because of requirements on the OS side.
    - The vnodes with a non-zero v_usecount or v_holdcnt (or both) fall into
      this case.
    - Attempts to recycle such vnodes cause the global vnode list to be
      locked for a long time.
  - The pagedaemon kernel threads may block excessively waiting for ARC
    eviction progress.
    - OpenZFS lets kernel threads wait until a desired amount of ARC
      eviction has been made.
    - The waiting kernel threads are resumed when either the desired ARC
      eviction progress has been made or there are no evictable ARC buffers
      at all.
    - Under a heavy load, OpenZFS often manages to evict ARC buffers
      amounting to a non-zero size that is much smaller than the desired
      size.
    - In that case, the waiting kernel threads can neither reach the
      desired ARC eviction progress nor give up quickly.
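
To illustrate the last limitation above, here is a minimal sketch in Python.
It is a toy model and not OpenZFS code: the function name, the status
strings and the byte counts are all hypothetical, and a pass that evicts
nothing is taken to mean that no evictable ARC buffers remain.  It only shows
why small but non-zero evictions keep a waiter blocked.

```python
def passes_blocked(evicted_per_pass, target):
    """Return (passes, status) for one waiting thread.

    Hypothetical model, not the OpenZFS implementation:
    - the waiter resumes as "satisfied" once the cumulative evicted size
      reaches `target`;
    - the waiter resumes as "gave_up" when a pass evicts nothing, which
      this model treats as "no evictable ARC buffers at all";
    - otherwise it is "still_waiting" after all the given passes.
    """
    total = 0
    for passes, evicted in enumerate(evicted_per_pass, start=1):
        if evicted == 0:
            return passes, "gave_up"
        total += evicted
        if total >= target:
            return passes, "satisfied"
    return len(evicted_per_pass), "still_waiting"


if __name__ == "__main__":
    # Healthy case: two large evictions meet the target quickly.
    print(passes_blocked([64, 64], 128))
    # Nothing evictable: the waiter gives up on the first pass.
    print(passes_blocked([0], 128))
    # Heavy load: tiny non-zero evictions; the waiter stays blocked.
    print(passes_blocked([1] * 100, 128))
```

In this model the third case is the problematic one: every pass makes some
progress, so the "no evictable buffers" exit is never taken, yet the target
is not reached within the 100 passes.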