Date: Fri, 19 Sep 2025 11:42:37 GMT
Message-Id: <202509191142.58JBgbIb047357@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: Olivier Certner
Subject: git: b5834753d330 - stable/14 - vm_domainset: Only probe domains once when iterating, instead of up to 4 times
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
X-BeenThere: dev-commits-src-all@freebsd.org
Sender: owner-dev-commits-src-all@FreeBSD.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: olce
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/14
X-Git-Reftype: branch
X-Git-Commit: b5834753d330f201195602d039633be8d811e204
Auto-Submitted: auto-generated

The branch stable/14 has been updated by olce:

URL: https://cgit.FreeBSD.org/src/commit/?id=b5834753d330f201195602d039633be8d811e204

commit b5834753d330f201195602d039633be8d811e204
Author:     Olivier Certner
AuthorDate: 2025-07-07 20:29:12 +0000
Commit:     Olivier Certner
CommitDate: 2025-09-19 11:41:41 +0000

    vm_domainset: Only probe domains once when iterating, instead of up to 4 times

    Because of the 'di_minskip' logic, which resets the initial domain, an
    iterator starts by considering only domains that have more than
    'free_min' pages in a first phase, and then all domains in a second one.
    Non-"underpaged" domains are thus examined twice, even if the allocation
    can't succeed.

    Re-scanning the same domains twice just wastes time, as allocation
    attempts that must not wait may rely on failing sooner and those that
    must will loop anyway (a domain previously scanned twice has more pages
    than 'free_min' and consequently vm_wait_doms() will just return
    immediately).

    Additionally, the DOMAINSET_POLICY_FIRSTTOUCH policy would aggravate
    this situation by reexamining the current domain again at the end of
    each phase.  In the case of a single domain, this means doubling again
    the number of times domain 0 is probed.

    Implementation consists in adding two 'domainset_t' to 'struct
    vm_domainset_iter' (and removing the 'di_n' counter).  The first,
    'di_remain_mask', contains domains still to be explored in the current
    phase, the first phase concerning only domains with more pages than
    'free_min' ('di_minskip' true) and the second one concerning only
    domains previously under 'free_min' ('di_minskip' false).
    The second, 'di_min_mask', holds the domains with less pages than
    'free_min' encountered during the first phase, and serves as the reset
    value for 'di_remain_mask' when transitioning to the second phase.

    PR:             277476
    Fixes:          e5818a53dbd2 ("Implement several enhancements to NUMA policies.")
    Fixes:          23984ce5cd24 ("Avoid resource deadlocks when one domain has exhausted its memory."...)
    MFC after:      10 days
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D51249

    (cherry picked from commit d440953942372ca275d0743a6e220631bde440ee)
---
 sys/vm/vm_domainset.c | 53 ++++++++++++++++++++++++++++++---------------------
 sys/vm/vm_domainset.h |  6 +++++-
 2 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/sys/vm/vm_domainset.c b/sys/vm/vm_domainset.c
index ff0e476a68b9..f95f26b9d476 100644
--- a/sys/vm/vm_domainset.c
+++ b/sys/vm/vm_domainset.c
@@ -127,7 +127,8 @@ static void
 vm_domainset_iter_next(struct vm_domainset_iter *di, int *domain)
 {
 
-	KASSERT(di->di_n > 0, ("%s: Invalid n %d", __func__, di->di_n));
+	KASSERT(!DOMAINSET_EMPTY(&di->di_remain_mask),
+	    ("%s: Already iterated on all domains", __func__));
 	switch (di->di_policy) {
 	case DOMAINSET_POLICY_FIRSTTOUCH:
 		/*
@@ -157,37 +158,39 @@ vm_domainset_iter_first(struct vm_domainset_iter *di, int *domain)
 	switch (di->di_policy) {
 	case DOMAINSET_POLICY_FIRSTTOUCH:
 		*domain = PCPU_GET(domain);
-		if (DOMAINSET_ISSET(*domain, &di->di_valid_mask)) {
-			/*
-			 * Add an extra iteration because we will visit the
-			 * current domain a second time in the rr iterator.
-			 */
-			di->di_n = di->di_domain->ds_cnt + 1;
+		if (DOMAINSET_ISSET(*domain, &di->di_valid_mask))
 			break;
-		}
 		/*
 		 * To prevent impossible allocations we convert an invalid
 		 * first-touch to round-robin.
 		 */
 		/* FALLTHROUGH */
 	case DOMAINSET_POLICY_ROUNDROBIN:
-		di->di_n = di->di_domain->ds_cnt;
 		vm_domainset_iter_rr(di, domain);
 		break;
 	case DOMAINSET_POLICY_PREFER:
 		*domain = di->di_domain->ds_prefer;
-		di->di_n = di->di_domain->ds_cnt;
 		break;
 	case DOMAINSET_POLICY_INTERLEAVE:
 		vm_domainset_iter_interleave(di, domain);
-		di->di_n = di->di_domain->ds_cnt;
 		break;
 	default:
 		panic("%s: Unknown policy %d", __func__, di->di_policy);
 	}
-	KASSERT(di->di_n > 0, ("%s: Invalid n %d", __func__, di->di_n));
 	KASSERT(*domain < vm_ndomains,
 	    ("%s: Invalid domain %d", __func__, *domain));
+
+	/* Initialize the mask of domains to visit. */
+	if (di->di_minskip) {
+		/* Phase 1: Skip domains under 'v_free_min'. */
+		DOMAINSET_COPY(&di->di_valid_mask, &di->di_remain_mask);
+		DOMAINSET_ZERO(&di->di_min_mask);
+	} else
+		/* Phase 2: Browse domains that were under 'v_free_min'. */
+		DOMAINSET_COPY(&di->di_min_mask, &di->di_remain_mask);
+
+	/* Mark first domain as seen. */
+	DOMAINSET_CLR(*domain, &di->di_remain_mask);
 }
 
 void
@@ -221,12 +224,15 @@ vm_domainset_iter_page(struct vm_domainset_iter *di, struct vm_object *obj,
 	if (__predict_false(DOMAINSET_EMPTY(&di->di_valid_mask)))
 		return (ENOMEM);
 
-	/* If there are more domains to visit we run the iterator. */
-	while (--di->di_n != 0) {
+	/* If there are more domains to visit in this phase, run the iterator. */
+	while (!DOMAINSET_EMPTY(&di->di_remain_mask)) {
 		vm_domainset_iter_next(di, domain);
-		if (DOMAINSET_ISSET(*domain, &di->di_valid_mask) &&
-		    (!di->di_minskip || !vm_page_count_min_domain(*domain)))
-			return (0);
+		if (DOMAINSET_ISSET(*domain, &di->di_remain_mask)) {
+			DOMAINSET_CLR(*domain, &di->di_remain_mask);
+			if (!di->di_minskip || !vm_page_count_min_domain(*domain))
+				return (0);
+			DOMAINSET_SET(*domain, &di->di_min_mask);
+		}
 	}
 
 	/* If we skipped domains below min restart the search. */
@@ -291,12 +297,15 @@ vm_domainset_iter_policy(struct vm_domainset_iter *di, int *domain)
 	if (DOMAINSET_EMPTY(&di->di_valid_mask))
 		return (ENOMEM);
 
-	/* If there are more domains to visit we run the iterator. */
-	while (--di->di_n != 0) {
+	/* If there are more domains to visit in this phase, run the iterator. */
+	while (!DOMAINSET_EMPTY(&di->di_remain_mask)) {
 		vm_domainset_iter_next(di, domain);
-		if (DOMAINSET_ISSET(*domain, &di->di_valid_mask) &&
-		    (!di->di_minskip || !vm_page_count_min_domain(*domain)))
-			return (0);
+		if (DOMAINSET_ISSET(*domain, &di->di_remain_mask)) {
+			DOMAINSET_CLR(*domain, &di->di_remain_mask);
+			if (!di->di_minskip || !vm_page_count_min_domain(*domain))
+				return (0);
+			DOMAINSET_SET(*domain, &di->di_min_mask);
+		}
 	}
 
 	/* If we skipped domains below min restart the search. */
diff --git a/sys/vm/vm_domainset.h b/sys/vm/vm_domainset.h
index d2cfe362ae78..35fd1679e1c8 100644
--- a/sys/vm/vm_domainset.h
+++ b/sys/vm/vm_domainset.h
@@ -31,11 +31,15 @@
 struct vm_domainset_iter {
 	struct domainset	*di_domain;
 	unsigned int		*di_iter;
+	/* Initialized from 'di_domain', initial value after reset. */
 	domainset_t		di_valid_mask;
+	/* Domains to browse in the current phase. */
+	domainset_t		di_remain_mask;
+	/* Domains skipped in phase 1 because under 'v_free_min'. */
+	domainset_t		di_min_mask;
 	vm_pindex_t		di_offset;
 	int			di_flags;
 	uint16_t		di_policy;
-	domainid_t		di_n;
 	bool			di_minskip;
 };
 
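
Illustrative note (not part of the commit): the two-mask scheme described in the commit message can be modelled in user space as sketched below. This is a simplified sketch under stated assumptions, not the kernel code. The names model_probe_order() and MODEL_NDOMAINS are hypothetical, a plain uint32_t stands in for 'domainset_t', and plain round-robin order stands in for the policy-specific iterators. Unlike vm_domainset_iter_first(), the sketch selects the first domain inside the same loop as the others. It assumes every allocation attempt fails, so that the complete probe order becomes visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NDOMAINS 8	/* hypothetical upper bound on domains */

/*
 * Walk both phases assuming every allocation attempt fails, recording each
 * probed domain in 'order' (which must hold MODEL_NDOMAINS entries).
 * 'valid' is the set of allowed domains, 'low' the set of domains under the
 * free-page threshold, 'start' the round-robin starting point.
 * Returns the number of probes.
 */
static int
model_probe_order(uint32_t valid, uint32_t low, int start, int order[])
{
	uint32_t remain_mask, min_mask = 0;
	int cursor = start, nprobes = 0;
	bool minskip = true;

	assert(start >= 0 && start < MODEL_NDOMAINS);

	/* Phase 1 starts from all valid domains. */
	remain_mask = valid & ((1u << MODEL_NDOMAINS) - 1);
	for (;;) {
		while (remain_mask != 0) {
			int d = cursor;

			cursor = (cursor + 1) % MODEL_NDOMAINS;
			if ((remain_mask & (1u << d)) == 0)
				continue;	/* not in set, or already seen */
			remain_mask &= ~(1u << d);
			if (minskip && (low & (1u << d)) != 0) {
				min_mask |= 1u << d;	/* defer to phase 2 */
				continue;
			}
			order[nprobes++] = d;
		}
		if (!minskip)
			break;
		/* Phase 2 visits only the domains deferred in phase 1. */
		minskip = false;
		remain_mask = min_mask;
	}
	return (nprobes);
}
```

With domains 0 to 3 valid (mask 0xF), domain 2 under the threshold (mask 0x4) and a starting point of 1, the model probes domains 1, 3, 0 in the first phase and domain 2 in the second, so each domain is probed exactly once.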