Date: Thu, 15 Jan 2026 08:34:50 +0900
From: Tomoaki AOKI
To: Olivier Certner
Cc: Minsoo Choo, freebsd-hackers
Subject: Re: HMP scheduling on FreeBSD
Message-Id: <20260115083450.f20c13f24d2ebbc68db9cd01@dec.sakura.ne.jp>
In-Reply-To: <1886427.OVFmXjEfDW@ravel>
References: <0Ng09S3rEB0BvT9vzHqVKU7rWxoad96kjEc7U2LCwDFJKmmswXujip7qbRlo_BIhNKcI7d-2CUHdp9Dxr3-7hhafpD6uagJSFUCjtC9qRr4=@proton.me>
 <1886427.OVFmXjEfDW@ravel>
Organization: Junchoon corps
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Wed, 14 Jan 2026 23:14:52 +0100
Olivier Certner wrote:

> Hi Minsoo,
>
> > For the last few days, I've been working on scheduler optimization
> > for heterogeneous cores ("HMP scheduling" from now on) on FreeBSD.
>
> That's great! I've also been working on it, albeit in a slow fashion
> and mostly in the background, rather focusing on scheduler design and
> integration on our cpusets.
>
> Giving quickly some first comments.
>
> > The first component of HMP scheduling is "cpucap". One issue with
> > HMP scheduling is that identifying the capacity and scores of a
> > processor (i.e. providers) is machine-dependent while the scheduler
> > code should be machine-independent, so cpucap acts as an interface
> > between the scheduler and providers. CPU capacity and scores are
> > stored in pcpu structure while the machine's cpucap status (e.g.
> > initialized, has dynamic scores, etc) is stored in global cpucap
> > structure of type "cpucap_t". It also includes functions for
> > scheduler and providers, such as accessors, setters, finding "best"
> > cpu, etc. The review (D54674) adds these facilities under HMP
> > option.
>
> I'll review D54674, but not in the immediate. Hopefully next week.
>
> > By dividing a core's capacity by total capacity, we can assign an
> > equal fraction of tasks to the core's run queue.
> >
> > On the other hand, scores reflect the real time status of a
> > processor (snip). For example, if a performance core is
> > experiencing throttling, its score could go down to 1000. In that
> > case, the scheduler will prefer core that has the highest score.
>
> These are good first observations but they can only really apply in
> specific circumstances. Converting core's capacity in run queue
> length can only drive a loaded system, not a mostly idle one. This
> mechanism will also cause an increase in latency for threads running
> on performant cores.
>
> There are several theoretical considerations that should be met
> *together*, such as fairness, latency, bias to performance or to
> energy (policy), affinity, cpusets (directives), etc., and...
>
> > Before integrating scheduler and cpucap, I need to go through
> > sched_ule.c from top to bottom. After that, I'll add new functions
> > or drop existing ones from the cpucap framework then work on the
> > integration.
>
> ...there are some practical considerations too. ULE maintains per-CPU
> run queues and does inter-CPU thread exchange relatively infrequently
> (through the so-called "long-term long balancer") for fairness. It
> will not exchange two threads on two different cores if there are the
> only ones running, which again is unfair if the two cores have
> different performances.
>
> A general scheduler must cater to a variety of workloads, and it can
> be quite difficult to improve some characteristics without degrading
> others. We certainly don't want to rush things.
>
> I invite you to read the https://wiki.freebsd.org/Scheduler/Hybrid
> for a glimpse on some of the trade-offs involved and a wider
> perspective, which however is by no means complete and for which
> input from you and any other interested parties is welcome.
>
> Thanks and regards.
>
> --
> Olivier Certner

Hi.

I haven't read the diffs or the existing code yet, so apologies if
this has already been done or is already known not to work well.

Just an idea: if the existing schedulers are already NUMA-aware,
adding another layer beneath each NUMA domain that describes the
attributes of its cores (as leaves of the domain) could help.
I suggest this because, AFAIK, a single NUMA domain can contain
different types of cores.

Regards.

--
Tomoaki AOKI
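
To make the layering idea in the message above concrete, here is a minimal
sketch in C. It is only an illustration under stated assumptions: every
name in it (core_kind, core_group, numa_domain, domain_best_group,
example_domain) is hypothetical, none of it comes from FreeBSD, sched_ule.c
or D54674, and the capacity values are made up. Each NUMA domain carries
one leaf per group of identical cores, and a scheduler could pick a group
inside the domain by its attributes.

```c
#include <stddef.h>

/* Hypothetical core classes; not FreeBSD identifiers. */
enum core_kind { CORE_PERF, CORE_EFF };

/* Leaf layer: one group of identical cores inside a NUMA domain. */
struct core_group {
	enum core_kind	kind;
	unsigned	capacity;	/* relative capacity, arbitrary units */
	int		first_cpu;	/* first CPU id in the group */
	int		ncpus;		/* number of CPUs in the group */
};

/* A NUMA domain whose leaves are core groups rather than bare CPUs. */
struct numa_domain {
	int			id;
	size_t			ngroups;
	const struct core_group	*groups;
};

/* Made-up example: one domain with 8 fast and 8 slow cores. */
static const struct core_group example_groups[] = {
	{ CORE_PERF, 1024, 0, 8 },
	{ CORE_EFF,   512, 8, 8 },
};

const struct numa_domain example_domain = { 0, 2, example_groups };

/* Return the highest-capacity group in a domain, or NULL if it is empty. */
const struct core_group *
domain_best_group(const struct numa_domain *d)
{
	const struct core_group *best = NULL;

	for (size_t i = 0; i < d->ngroups; i++) {
		if (best == NULL || d->groups[i].capacity > best->capacity)
			best = &d->groups[i];
	}
	return (best);
}

/* Count the CPUs of a given kind within one domain. */
int
domain_count_cpus(const struct numa_domain *d, enum core_kind kind)
{
	int n = 0;

	for (size_t i = 0; i < d->ngroups; i++) {
		if (d->groups[i].kind == kind)
			n += d->groups[i].ncpus;
	}
	return (n);
}
```

With this shape, NUMA-aware placement could stay as it is (choose a domain
first) and the core-type decision becomes a second step inside the chosen
domain, for example domain_best_group(&example_domain). Whether that fits
the existing topology code is an open question.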