Date: Fri, 13 Oct 2023 23:43:12 GMT
Message-Id: <202310132343.39DNhCZ4014830@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: Mateusz Guzik
Subject: git: cfbc3927613a - stable/14 - vfs: prefix regular vnlru with a special case for free vnodes
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
X-Git-Committer: mjg
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/14
X-Git-Reftype: branch
X-Git-Commit: cfbc3927613a8455498db5ff3e67046a4dab15a1

The branch stable/14 has been updated by mjg:

URL: https://cgit.FreeBSD.org/src/commit/?id=cfbc3927613a8455498db5ff3e67046a4dab15a1

commit cfbc3927613a8455498db5ff3e67046a4dab15a1
Author:     Mateusz Guzik
AuthorDate: 2023-09-14 19:08:40 +0000
Commit:     Mateusz Guzik
CommitDate: 2023-10-13 23:41:47 +0000

    vfs: prefix regular vnlru with a special case for free vnodes

    Works around severe performance problems in certain corner cases; see
    the commentary added.

    Modifying vnlru logic has proven rather error-prone in the past and a
    release is near, thus take the easy way out and fix it without having
    to dig into the current machinery.

    (cherry picked from commit 90a008e94bb205e5b8f3c41d57e155b59a6be95d)
---
 sys/kern/vfs_subr.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 113 insertions(+), 4 deletions(-)

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 5e39a149ef36..5834feff080c 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -1427,6 +1427,14 @@ vnlru_free_locked(int count)
 	return (ret);
 }
 
+static int
+vnlru_free(int count)
+{
+
+	mtx_lock(&vnode_list_mtx);
+	return (vnlru_free_locked(count));
+}
+
 void
 vnlru_free_vfsops(int count, struct vfsops *mnt_op, struct vnode *mvp)
 {
@@ -1594,6 +1602,106 @@ vnlru_kick_cond(void)
 	mtx_unlock(&vnode_list_mtx);
 }
 
+static void
+vnlru_proc_sleep(void)
+{
+
+	if (vnlruproc_sig) {
+		vnlruproc_sig = 0;
+		wakeup(&vnlruproc_sig);
+	}
+	msleep(vnlruproc, &vnode_list_mtx, PVFS|PDROP, "vlruwt", hz);
+}
+
+/*
+ * A lighter version of the machinery below.
+ *
+ * Tries to reach goals only by recycling free vnodes and does not invoke
+ * uma_reclaim(UMA_RECLAIM_DRAIN).
+ *
+ * This works around pathological behavior in vnlru in presence of tons of free
+ * vnodes, but without having to rewrite the machinery at this time. Said
+ * behavior boils down to continuously trying to reclaim all kinds of vnodes
+ * (cycling through all levels of "force") when the count is transiently above
+ * limit. This happens a lot when all vnodes are used up and vn_alloc
+ * speculatively increments the counter.
+ *
+ * Sample testcase: vnode limit 8388608, 20 separate directory trees each with
+ * 1 million files in total and 20 find(1) processes stating them in parallel
+ * (one per each tree).
+ *
+ * On a kernel with only stock machinery this needs anywhere between 60 and 120
+ * seconds to execute (time varies *wildly* between runs). With the workaround
+ * it consistently stays around 20 seconds.
+ *
+ * That is to say the entire thing needs a fundamental redesign (most notably
+ * to accommodate faster recycling), the above only tries to get it out of the way.
+ *
+ * Return values are:
+ * -1 -- fallback to regular vnlru loop
+ * 0 -- do nothing, go to sleep
+ * >0 -- recycle this many vnodes
+ */
+static long
+vnlru_proc_light_pick(void)
+{
+	u_long rnumvnodes, rfreevnodes;
+
+	if (vstir || vnlruproc_sig == 1)
+		return (-1);
+
+	rnumvnodes = atomic_load_long(&numvnodes);
+	rfreevnodes = vnlru_read_freevnodes();
+
+	/*
+	 * vnode limit might have changed and now we may be at a significant
+	 * excess. Bail if we can't sort it out with free vnodes.
+	 */
+	if (rnumvnodes > desiredvnodes) {
+		if (rnumvnodes - rfreevnodes >= desiredvnodes ||
+		    rfreevnodes <= wantfreevnodes) {
+			return (-1);
+		}
+
+		return (rnumvnodes - desiredvnodes);
+	}
+
+	/*
+	 * Don't try to reach wantfreevnodes target if there are too few vnodes
+	 * to begin with.
+	 */
+	if (rnumvnodes < wantfreevnodes) {
+		return (0);
+	}
+
+	if (rfreevnodes < wantfreevnodes) {
+		return (-1);
+	}
+
+	return (0);
+}
+
+static bool
+vnlru_proc_light(void)
+{
+	long freecount;
+
+	mtx_assert(&vnode_list_mtx, MA_NOTOWNED);
+
+	freecount = vnlru_proc_light_pick();
+	if (freecount == -1)
+		return (false);
+
+	if (freecount != 0) {
+		vnlru_free(freecount);
+	}
+
+	mtx_lock(&vnode_list_mtx);
+	vnlru_proc_sleep();
+	mtx_assert(&vnode_list_mtx, MA_NOTOWNED);
+	return (true);
+}
+
 static void
 vnlru_proc(void)
 {
@@ -1609,6 +1717,10 @@ vnlru_proc(void)
 	want_reread = false;
 	for (;;) {
 		kproc_suspend_check(vnlruproc);
+
+		if (force == 0 && vnlru_proc_light())
+			continue;
+
 		mtx_lock(&vnode_list_mtx);
 		rnumvnodes = atomic_load_long(&numvnodes);
 
@@ -1639,10 +1751,7 @@ vnlru_proc(void)
 			vstir = false;
 		}
 		if (force == 0 && !vnlru_under(rnumvnodes, vlowat)) {
-			vnlruproc_sig = 0;
-			wakeup(&vnlruproc_sig);
-			msleep(vnlruproc, &vnode_list_mtx,
-			    PVFS|PDROP, "vlruwt", hz);
+			vnlru_proc_sleep();
 			continue;
 		}
 		rfreevnodes = vnlru_read_freevnodes();