From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 275594] High CPU usage by arc_prune; analysis and fix
Date: Thu, 28 Mar 2024 04:22:30 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #97 from Seigo Tanimura ---
Again, my apologies for the delayed comment.

Now that the nullfs fix (https://reviews.freebsd.org/D44217) has been merged into stable/13 and stable/14, the next diff is https://reviews.freebsd.org/D44170, the backport of the FreeBSD-EN-23:18.openzfs fix to stable/13. This is essentially a functional reimplementation of OpenZFS commits 799e09f75a on main (https://github.com/openzfs/zfs/commit/799e09f75a31e80a1702a850838c79879af8b917) and 3ec4ea68d4 on zfs-2.2-release (https://github.com/openzfs/zfs/commit/3ec4ea68d491a82c8de3360d50032bdecd53608f). It focuses on avoiding the pileup on the arc_prune kernel thread. Among the FreeBSD committers, I found the names of mav and markj in the logs of those commits, so I believe they are suitable reviewers for this diff.

The remaining diffs are:

- https://reviews.freebsd.org/D44171 (kern/vfs: Add the per-filesystem vnode counter to struct mount.)
- https://reviews.freebsd.org/D44173 (kern/openzfs: Regulate the ZFS ARC pruning process precisely.)

These are challenging because they address the interaction between OpenZFS and the OS (FreeBSD) kernel. I believe they need reviewers with insight into both OpenZFS and FreeBSD. If these diffs prove too difficult to review, an alternative is to add a sysctl(3) toggle that controls the fix in D44173. The fix could then be merged without being enabled by default.

Thanks to the many testers on this issue, I now believe the fix is ready for more extensive public testing.

-----

Besides the review, my analysis turned up quite a few findings about the healthy operation of the OpenZFS ARC and its pruning and eviction. It would be great to document them somehow. They should also be kept in mind when reviewing D44171 and D44173.

* OpenZFS ARC buffers and their evictability
  - ARC buffers are kept separately for reading and for writing.
    - A read ARC buffer must be copied into a write ARC buffer in order to "update" it in the copy-on-write manner.
  - A read ARC buffer is not evictable until its content has been read from the pool.
  - A write ARC buffer is not evictable until its content has been written to the pool.
    - A write ARC buffer that depends on the write of another write ARC buffer may remain unevictable for a long time.
  - Under healthy operation, almost all ARC read and write buffers for data are evictable.
    - Some of the ARC read and write buffers for metadata are not evictable because of internal dependencies required by the OpenZFS design.
  - The write ARC buffers of vnodes in use (v_usecount > 0) have been found to remain unevictable until the vnodes are no longer in use.
    - This is the direct cause of the excess ARC pruning during poudriere-bulk(8). The nullfs filesystems cached the OpenZFS vnodes by incrementing their v_usecount.
    - A similar issue may arise from a different cause, e.g. too many open OpenZFS files.

* Limitations of OpenZFS ARC pruning and eviction on FreeBSD
  - ARC pruning cannot account for OpenZFS znodes (i.e. FreeBSD vnodes) that are unprunable because of requirements on the OS side.
    - Vnodes with a non-zero v_usecount or v_holdcnt (or both) fall into this category.
    - Attempts to recycle such vnodes keep the global vnode list locked for a long time.
  - The pagedaemon kernel threads may block excessively while waiting for ARC eviction progress.
    - OpenZFS allows kernel threads to wait until a desired amount of ARC eviction progress has been made.
    - A waiting kernel thread is resumed either when the desired eviction progress has been made or when there are no evictable ARC buffers left at all.
    - Under heavy load, OpenZFS often manages to evict much less than the desired amount, but still more than zero.
    - The waiting kernel threads therefore can neither see the desired ARC eviction progress nor give up quickly.
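The evictability rules listed above can be condensed into a small predicate. The following is a toy Python model, not OpenZFS code. The ArcBuf type, its fields (io_done, depends_on, vnode_in_use) and the try_write(), evictable() and update() helpers are hypothetical names chosen for illustration. Real ARC headers track far more state.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Kind(Enum):
    READ = auto()
    WRITE = auto()


@dataclass(eq=False)
class ArcBuf:
    """Hypothetical toy model of an ARC buffer (not OpenZFS's arc_buf_t)."""
    kind: Kind
    io_done: bool = False  # already read from, or written to, the pool?
    depends_on: List["ArcBuf"] = field(default_factory=list)  # writes that must land first
    vnode_in_use: bool = False  # owning vnode has v_usecount > 0


def try_write(buf: ArcBuf) -> bool:
    """Write buf to the pool, but only once every write it depends on has landed."""
    if buf.kind is Kind.WRITE and all(dep.io_done for dep in buf.depends_on):
        buf.io_done = True
    return buf.io_done


def evictable(buf: ArcBuf) -> bool:
    """Apply the evictability rules from the list above to one buffer."""
    if not buf.io_done:
        # Read buffers stay pinned until read; write buffers until written.
        return False
    if buf.kind is Kind.WRITE and buf.vnode_in_use:
        # Write buffers of an in-use vnode stay pinned until it is released.
        return False
    return True


def update(read_buf: ArcBuf) -> ArcBuf:
    """Copy-on-write: "updating" a read buffer yields a new, unwritten write buffer."""
    if read_buf.kind is not Kind.READ or not read_buf.io_done:
        raise ValueError("only a fully read buffer can be copied for update")
    return ArcBuf(Kind.WRITE)
```

For example, a write buffer b that depends on another write buffer a stays unevictable until a has been written, even though nothing else pins b.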
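The first limitation can be illustrated by clamping a prune request to the number of vnodes that can actually be recycled. This is a hedged Python sketch, not the logic of D44173. It only encodes the counting rule stated above: a vnode with a non-zero v_usecount or v_holdcnt is unprunable. The Vnode type and the prunable() and prune_target() helpers are hypothetical. The clamping policy itself is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vnode:
    """Hypothetical stand-in for a FreeBSD vnode backing an OpenZFS znode."""
    v_usecount: int = 0
    v_holdcnt: int = 0


def prunable(vp: Vnode) -> bool:
    # Any use or hold reference keeps the vnode (and its znode) alive.
    return vp.v_usecount == 0 and vp.v_holdcnt == 0


def prune_target(requested: int, vnodes: List[Vnode]) -> int:
    """Clamp a prune request to the count of vnodes that can actually go.

    Assumed policy: asking for more than this only makes the recycler walk
    (and lock) the vnode list for vnodes it will have to skip anyway.
    """
    available = sum(1 for vp in vnodes if prunable(vp))
    return min(requested, available)
```

With two free vnodes out of four, a request for 10 is reduced to 2. A request for 1 passes through unchanged.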
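The waiter behaviour described in the last group of bullets can be reproduced with a few lines of simulation. This is a hedged Python model of the two wake-up conditions only, not the OpenZFS eviction code. The function name rounds_until_wakeup() and the per-pass (bytes_evicted, evictable_bytes_left) input format are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def rounds_until_wakeup(desired: int,
                        rounds: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the eviction pass after which a waiting thread is resumed.

    Each element of `rounds` is (bytes_evicted, evictable_bytes_left) for one
    hypothetical eviction pass. The waiter is resumed only when cumulative
    progress reaches `desired`, or when nothing evictable is left. Returns
    None if neither condition is met within the given passes.
    """
    progress = 0
    for n, (evicted, evictable_left) in enumerate(rounds, start=1):
        progress += evicted
        if progress >= desired:
            return n  # desired eviction progress reached
        if evictable_left == 0:
            return n  # nothing left to evict: give up
    return None  # small but non-zero progress: still blocked
```

For instance, suppose 1,000,000 bytes are requested and each pass evicts 4,096 bytes while plenty remains evictable. After 100 passes (409,600 bytes) the waiter is still blocked, because neither wake-up condition has fired.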
-- 
You are receiving this mail because:
You are the assignee for the bug.