Date: Fri, 22 Dec 2023 00:23:10 GMT
Message-Id: <202312220023.3BM0NAKs052786@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: "Bjoern A. Zeeb"
Subject: git: 488e8a7faca5 - main - LinuxKPI: reduce impact of large MAXCPU
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
Sender: owner-dev-commits-src-main@freebsd.org
X-BeenThere: dev-commits-src-main@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: bz
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 488e8a7faca51a71987fbf00cd36cfcd19269db7
Auto-Submitted: auto-generated

The branch main has been updated by bz:

URL: https://cgit.FreeBSD.org/src/commit/?id=488e8a7faca51a71987fbf00cd36cfcd19269db7

commit 488e8a7faca51a71987fbf00cd36cfcd19269db7
Author:     Bjoern A. Zeeb
AuthorDate: 2023-10-23 23:14:35 +0000
Commit:     Bjoern A. Zeeb
CommitDate: 2023-12-22 00:22:04 +0000

    LinuxKPI: reduce impact of large MAXCPU

    Start scaling arrays dynamically instead of using MAXCPU, resulting
    in extra allocations on startup but reducing the overall memory
    footprint.

    For the static single CPU mask we provide two versions to further
    save memory depending on a low or high CPU count system.
    The threshold to switch is currently at 128 CPUs on 64bit platforms.
    More detailed comments on the implementations can be found in the code.

    If I am not wrong on a MAXCPU=65536 system the memory footprint should
    roughly go down from 512M to 1.5M for the static single CPU mask.

    Submitted by: olce (most of this final version)
    Sponsored by: The FreeBSD Foundation
    PR: 274316
    Differential Revision: https://reviews.freebsd.org/D42345
---
 sys/compat/linuxkpi/common/include/asm/processor.h |   2 +-
 sys/compat/linuxkpi/common/src/linux_compat.c      | 106 +++++++++++++++++++--
 2 files changed, 99 insertions(+), 9 deletions(-)

diff --git a/sys/compat/linuxkpi/common/include/asm/processor.h b/sys/compat/linuxkpi/common/include/asm/processor.h
index 9e784396c63a..c55238d33505 100644
--- a/sys/compat/linuxkpi/common/include/asm/processor.h
+++ b/sys/compat/linuxkpi/common/include/asm/processor.h
@@ -41,7 +41,7 @@ struct cpuinfo_x86 {
 };
 
 extern struct cpuinfo_x86 boot_cpu_data;
-extern struct cpuinfo_x86 __cpu_data[];
+extern struct cpuinfo_x86 *__cpu_data;
 #define cpu_data(cpu) __cpu_data[cpu]
 
 #endif
diff --git a/sys/compat/linuxkpi/common/src/linux_compat.c b/sys/compat/linuxkpi/common/src/linux_compat.c
index a493dc2538ec..36eac309094f 100644
--- a/sys/compat/linuxkpi/common/src/linux_compat.c
+++ b/sys/compat/linuxkpi/common/src/linux_compat.c
@@ -131,7 +131,8 @@ static void linux_cdev_deref(struct linux_cdev *ldev);
 static struct vm_area_struct *linux_cdev_handle_find(void *handle);
 
 cpumask_t cpu_online_mask;
-static cpumask_t static_single_cpu_mask[MAXCPU];
+static cpumask_t **static_single_cpu_mask;
+static cpumask_t *static_single_cpu_mask_lcs;
 struct kobject linux_class_root;
 struct device linux_root_device;
 struct class linux_class_misc;
@@ -2569,17 +2570,19 @@ io_mapping_create_wc(resource_size_t base, unsigned long size)
 #if defined(__i386__) || defined(__amd64__)
 bool linux_cpu_has_clflush;
 struct cpuinfo_x86 boot_cpu_data;
-struct cpuinfo_x86 __cpu_data[MAXCPU];
+struct cpuinfo_x86 *__cpu_data;
 #endif
 
 cpumask_t *
 lkpi_get_static_single_cpu_mask(int cpuid)
 {
 
-	KASSERT((cpuid >= 0 && cpuid < MAXCPU), ("%s: invalid cpuid %d\n",
+	KASSERT((cpuid >= 0 && cpuid <= mp_maxid), ("%s: invalid cpuid %d\n",
+	    __func__, cpuid));
+	KASSERT(!CPU_ABSENT(cpuid), ("%s: cpu with cpuid %d is absent\n",
 	    __func__, cpuid));
 
-	return (&static_single_cpu_mask[cpuid]);
+	return (static_single_cpu_mask[cpuid]);
 }
 
 static void
@@ -2595,7 +2598,9 @@ linux_compat_init(void *arg)
 	boot_cpu_data.x86 = CPUID_TO_FAMILY(cpu_id);
 	boot_cpu_data.x86_model = CPUID_TO_MODEL(cpu_id);
 
-	for (i = 0; i < MAXCPU; i++) {
+	__cpu_data = mallocarray(mp_maxid + 1,
+	    sizeof(*__cpu_data), M_KMALLOC, M_WAITOK | M_ZERO);
+	CPU_FOREACH(i) {
 		__cpu_data[i].x86_clflush_size = cpu_clflush_line_size;
 		__cpu_data[i].x86_max_cores = mp_ncpus;
 		__cpu_data[i].x86 = CPUID_TO_FAMILY(cpu_id);
@@ -2630,13 +2635,92 @@ linux_compat_init(void *arg)
 	CPU_COPY(&all_cpus, &cpu_online_mask);
 	/*
 	 * Generate a single-CPU cpumask_t for each CPU (possibly) in the system.
-	 * CPUs are indexed from 0..(MAXCPU-1). The entry for cpuid 0 will only
+	 * CPUs are indexed from 0..(mp_maxid). The entry for cpuid 0 will only
 	 * have itself in the cpumask, cupid 1 only itself on entry 1, and so on.
 	 * This is used by cpumask_of() (and possibly others in the future) for,
 	 * e.g., drivers to pass hints to irq_set_affinity_hint().
 	 */
-	for (i = 0; i < MAXCPU; i++)
-		CPU_SET(i, &static_single_cpu_mask[i]);
+	static_single_cpu_mask = mallocarray(mp_maxid + 1,
+	    sizeof(static_single_cpu_mask), M_KMALLOC, M_WAITOK | M_ZERO);
+
+	/*
+	 * When the number of CPUs reach a threshold, we start to save memory
+	 * given the sets are static by overlapping those having their single
+	 * bit set at same position in a bitset word. Asymptotically, this
+	 * regular scheme is in O(n²) whereas the overlapping one is in O(n)
+	 * only with n being the maximum number of CPUs, so the gain will become
+	 * huge quite quickly. The threshold for 64-bit architectures is 128
+	 * CPUs.
+	 */
+	if (mp_ncpus < (2 * _BITSET_BITS)) {
+		cpumask_t *sscm_ptr;
+
+		/*
+		 * This represents 'mp_ncpus * __bitset_words(CPU_SETSIZE) *
+		 * (_BITSET_BITS / 8)' bytes (for comparison with the
+		 * overlapping scheme).
+		 */
+		static_single_cpu_mask_lcs = mallocarray(mp_ncpus,
+		    sizeof(*static_single_cpu_mask_lcs),
+		    M_KMALLOC, M_WAITOK | M_ZERO);
+
+		sscm_ptr = static_single_cpu_mask_lcs;
+		CPU_FOREACH(i) {
+			static_single_cpu_mask[i] = sscm_ptr++;
+			CPU_SET(i, static_single_cpu_mask[i]);
+		}
+	} else {
+		/* Pointer to a bitset word. */
+		__typeof(((cpuset_t *)NULL)->__bits[0]) *bwp;
+
+		/*
+		 * Allocate memory for (static) spans of 'cpumask_t' ('cpuset_t'
+		 * really) with a single bit set that can be reused for all
+		 * single CPU masks by making them start at different offsets.
+		 * We need '__bitset_words(CPU_SETSIZE) - 1' bitset words before
+		 * the word having its single bit set, and the same amount
+		 * after.
+		 */
+		static_single_cpu_mask_lcs = mallocarray(_BITSET_BITS,
+		    (2 * __bitset_words(CPU_SETSIZE) - 1) * (_BITSET_BITS / 8),
+		    M_KMALLOC, M_WAITOK | M_ZERO);
+
+		/*
+		 * We rely below on cpuset_t and the bitset generic
+		 * implementation assigning words in the '__bits' array in the
+		 * same order of bits (i.e., little-endian ordering, not to be
+		 * confused with machine endianness, which concerns bits in
+		 * words and other integers). This is an imperfect test, but it
+		 * will detect a change to big-endian ordering.
+		 */
+		_Static_assert(
+		    __bitset_word(_BITSET_BITS + 1, _BITSET_BITS) == 1,
+		    "Assumes a bitset implementation that is little-endian "
+		    "on its words");
+
+		/* Initialize the single bit of each static span. */
+		bwp = (__typeof(bwp))static_single_cpu_mask_lcs +
+		    (__bitset_words(CPU_SETSIZE) - 1);
+		for (i = 0; i < _BITSET_BITS; i++) {
+			CPU_SET(i, (cpuset_t *)bwp);
+			bwp += (2 * __bitset_words(CPU_SETSIZE) - 1);
+		}
+
+		/*
+		 * Finally set all CPU masks to the proper word in their
+		 * relevant span.
+		 */
+		CPU_FOREACH(i) {
+			bwp = (__typeof(bwp))static_single_cpu_mask_lcs;
+			/* Find the non-zero word of the relevant span. */
+			bwp += (2 * __bitset_words(CPU_SETSIZE) - 1) *
+			    (i % _BITSET_BITS) +
+			    __bitset_words(CPU_SETSIZE) - 1;
+			/* Shift to find the CPU mask start. */
+			bwp -= (i / _BITSET_BITS);
+			static_single_cpu_mask[i] = (cpuset_t *)bwp;
+		}
+	}
 
 	strlcpy(init_uts_ns.name.release, osrelease, sizeof(init_uts_ns.name.release));
 }
@@ -2649,6 +2733,12 @@ linux_compat_uninit(void *arg)
 	linux_kobject_kfree_name(&linux_root_device.kobj);
 	linux_kobject_kfree_name(&linux_class_misc.kobj);
 
+	free(static_single_cpu_mask_lcs, M_KMALLOC);
+	free(static_single_cpu_mask, M_KMALLOC);
+#if defined(__i386__) || defined(__amd64__)
+	free(__cpu_data, M_KMALLOC);
+#endif
+
 	mtx_destroy(&vmmaplock);
 	spin_lock_destroy(&pci_lock);
 	rw_destroy(&linux_vma_lock);
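
Illustration (not part of the commit): the overlapping scheme described in the comments above can be tried out in userland with the rough sketch below. It is only a sketch under stated assumptions. WORD_BITS stands in for _BITSET_BITS on a 64-bit platform, set_words for __bitset_words(CPU_SETSIZE), and struct overlap_masks and all function names are hypothetical helpers invented here; none of them exist in LinuxKPI. The kernel code uses cpuset_t, CPU_SET() and mallocarray(9) instead of plain uint64_t arrays and calloc(3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define WORD_BITS 64	/* assumption: stand-in for _BITSET_BITS (64-bit) */

/* Hypothetical container; the commit uses plain static pointers instead. */
struct overlap_masks {
	size_t set_words;	/* words per mask, like __bitset_words(CPU_SETSIZE) */
	size_t span_words;	/* 2 * set_words - 1 words per span */
	uint64_t *spans;	/* WORD_BITS zeroed spans, one per bit position */
};

/* Allocate the spans and set one bit in the middle word of each span. */
static int
overlap_init(struct overlap_masks *om, size_t max_cpus)
{
	om->set_words = (max_cpus + WORD_BITS - 1) / WORD_BITS;
	om->span_words = 2 * om->set_words - 1;
	om->spans = calloc(WORD_BITS * om->span_words, sizeof(uint64_t));
	if (om->spans == NULL)
		return (-1);
	for (size_t b = 0; b < WORD_BITS; b++)
		om->spans[b * om->span_words + om->set_words - 1] =
		    UINT64_C(1) << b;
	return (0);
}

/*
 * Return the start of the single-CPU mask for 'cpu': pick the span whose
 * middle word has bit (cpu % WORD_BITS) set, then step back so that this
 * word lands at index (cpu / WORD_BITS) of the returned mask.
 */
static const uint64_t *
overlap_mask_of(const struct overlap_masks *om, size_t cpu)
{
	const uint64_t *mid;

	mid = om->spans + (cpu % WORD_BITS) * om->span_words +
	    (om->set_words - 1);
	return (mid - cpu / WORD_BITS);
}

/* Check that the mask for 'cpu' has exactly that one bit set. */
static int
mask_is_single(const struct overlap_masks *om, size_t cpu)
{
	const uint64_t *m = overlap_mask_of(om, cpu);

	for (size_t w = 0; w < om->set_words; w++) {
		uint64_t want = (w == cpu / WORD_BITS) ?
		    UINT64_C(1) << (cpu % WORD_BITS) : 0;
		if (m[w] != want)
			return (0);
	}
	return (1);
}

/* Bytes needed when every CPU gets its own full-size mask. */
static size_t
flat_bytes(size_t max_cpus)
{
	size_t words = (max_cpus + WORD_BITS - 1) / WORD_BITS;

	return (max_cpus * words * sizeof(uint64_t));
}

/* Bytes needed for the shared spans (per-CPU pointer array not included). */
static size_t
overlap_bytes(const struct overlap_masks *om)
{
	return (WORD_BITS * om->span_words * sizeof(uint64_t));
}
```

Under these assumptions, flat_bytes(65536) is 512 MiB, while the spans for 65536 CPUs take 64 * 2047 * 8 bytes, about 1 MiB. Adding an array of 65536 eight-byte pointers (512 KiB) gives roughly the 1.5M quoted in the commit message.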