From nobody Thu Jan 15 11:31:57 2026
Content-Type: text/plain;
	charset=utf-8
List-Id: Technical discussions relating to FreeBSD <freebsd-hackers.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
List-Help: <mailto:freebsd-hackers+help@freebsd.org>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Subscribe: <mailto:freebsd-hackers+subscribe@freebsd.org>
List-Unsubscribe: <mailto:freebsd-hackers+unsubscribe@freebsd.org>
Sender: owner-freebsd-hackers@FreeBSD.org
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.300.41.1.7\))
Subject: Re: HMP scheduling on FreeBSD
From: David Chisnall <theraven@FreeBSD.org>
In-Reply-To: <1886427.OVFmXjEfDW@ravel>
Date: Thu, 15 Jan 2026 11:31:57 +0000
Cc: Minsoo Choo <minsoochoo0122@proton.me>,
 freebsd-hackers <freebsd-hackers@freebsd.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <A6BBCEE7-B233-4F91-BB4A-7D91A169F09E@FreeBSD.org>
References: <0Ng09S3rEB0BvT9vzHqVKU7rWxoad96kjEc7U2LCwDFJKmmswXujip7qbRlo_BIhNKcI7d-2CUHdp9Dxr3-7hhafpD6uagJSFUCjtC9qRr4=@proton.me>
 <1886427.OVFmXjEfDW@ravel>
To: Olivier Certner <olce@freebsd.org>
X-Mailer: Apple Mail (2.3864.300.41.1.7)

On 14 Jan 2026, at 22:14, Olivier Certner <olce@freebsd.org> wrote:
>
> These are good first observations but they can only really apply in
> specific circumstances.  Converting core's capacity in run queue
> length can only drive a loaded system, not a mostly idle one.  This
> mechanism will also cause an increase in latency for threads running
> on performant cores.

There are also some fun corner cases.  For example, early big.LITTLE
systems typically used Cortex-A53 and Cortex-A57 cores.  The A57 was
*much* faster, but it had four-cycle access to the L1 cache, whereas
the A53 had single-cycle access.  Workloads that fitted in L1 were
faster on the A53.  So this can be a core x workload (or *phase of
workload*) metric rather than a purely per-core one.  That said,
treating it as a per-core metric is probably fine unless you want to
hook in performance counters and do dynamic measurement.
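
To make the difference between the two kinds of metric concrete, here
is a minimal sketch in Python.  The core names come from the example
above, but the phase labels, the function names, and every number are
invented for illustration; none of them are measurements.

```python
# Relative throughput (higher is faster).  All numbers are invented.
PER_CORE = {"A53": 1.0, "A57": 2.0}

# The same idea, but keyed by (core, workload phase).
PER_CORE_AND_PHASE = {
    ("A53", "fits_in_l1"): 1.2,
    ("A57", "fits_in_l1"): 1.0,
    ("A53", "streaming"): 1.0,
    ("A57", "streaming"): 2.5,
}


def best_core_static(cores):
    """Pick the core with the highest per-core capacity."""
    return max(cores, key=lambda c: PER_CORE[c])


def best_core_for_phase(cores, phase):
    """Pick the core that is fastest for this particular phase."""
    return max(cores, key=lambda c: PER_CORE_AND_PHASE[(c, phase)])
```

With a per-core table the big core always wins; with the per-phase
table the hypothetical L1-resident phase lands on the little core.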

> There are several theoretical considerations that should be met
> *together*, such as fairness, latency, bias to performance or to
> energy (policy), affinity, cpusets (directives), etc., and...

The hot-plug aspect is also important.  The best energy efficiency
comes from turning the CPU off entirely.  Power-aware schedulers want
to have a strategy for turning cores off in a way that minimises
*total* system power consumption.  This is tricky for a few reasons:

 - There’s a tradeoff between running a workload for a long time on a
   slow core and running it for a short time on a fast core.  The
   heuristics that ULE collects to identify I/O-bound vs CPU-bound
   workloads are a starting point, but you also likely need to track
   the typical sleeping time.  If a workload sleeps for long enough
   that it’s worth turning a big core off (or putting it into a deep
   low-power state), that wants a very different scheduling policy to
   one that’s sitting using 5% of a core most of the time.
 - Some systems have independent ability to shut off cores and their
   caches.  This has some interesting effects because snooping on
   another core’s cache is usually faster than going out to main
   memory, so sleeping a core but not its cache may improve
   performance of nearby cores (by a NUMA-dependent amount).
   Similarly, shutting down another core’s caches may reduce
   performance of nearby ones (note: this usually doesn’t apply to
   fully inclusive caches, but most CPU vendors have been moving away
   from those).
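
The first tradeoff can be put into rough numbers.  The sketch below
(Python) uses a deliberately simple model that is my assumption, not
anything ULE does: constant active and idle power, a fixed energy cost
for one power-off/power-on cycle, and zero draw while a core is off.
The `Core` type, the function names, and all of the power figures are
made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    speed: float          # work units per second
    active_watts: float   # power while running
    idle_watts: float     # power while idle but still powered
    off_on_joules: float  # energy cost of one off/on cycle


def energy_to_run(core, work):
    """Joules needed to complete `work` units: power x time."""
    return core.active_watts * (work / core.speed)


def break_even_sleep(core):
    """Sleep time (seconds) beyond which powering the core off costs
    less energy than leaving it idle, assuming it draws nothing
    while off."""
    return core.off_on_joules / core.idle_watts


# Invented figures for a little and a big core.
little = Core("little", speed=1.0, active_watts=0.5,
              idle_watts=0.125, off_on_joules=0.03125)
big = Core("big", speed=3.0, active_watts=3.0,
           idle_watts=0.25, off_on_joules=0.125)
```

With these invented numbers the little core finishes the same work
with less energy but takes three times as long.  Whether that is the
right choice depends on how long the thread then sleeps relative to
the break-even time, which is why tracking typical sleep time matters.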

Apple did a couple of things to support this kind of tuning.  The
first was to add a slack parameter to the kqueue timeouts.  This lets
the scheduler coalesce wakeups.  For example, if you have a clock
that’s running once a second to update the second tick, and you have a
bunch of other things mostly sleeping and waking up once per second,
then it’s useful to align all of the others with the clock app’s wake
so that you can turn on a high-performance core, run everything that
is due, and then sleep again.  This is useful even on homogeneous SMP
systems and would be a really good *first* step for this kind of work.
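
A minimal sketch of what a slack parameter buys the scheduler, in
Python.  This is not Apple's implementation and not the kqueue API:
the `coalesce` function and the (name, deadline, slack) tuples are
hypothetical.  Each timer may fire at any time between its deadline
and deadline + slack, and the sketch groups timers whose allowed
windows overlap so that each group needs only one wakeup.

```python
def coalesce(timers):
    """Group timers into shared wakeups.

    `timers` is a list of (name, deadline, slack) tuples; a timer may
    fire at any time t with deadline <= t <= deadline + slack.
    Returns a list of (wake_time, [names]) pairs.
    """
    wakeups = []
    group, wake_at, latest = [], None, None
    for name, deadline, slack in sorted(timers, key=lambda t: t[1]):
        if group and deadline <= latest:
            # This timer's window overlaps the group's: share the wakeup.
            group.append(name)
            # Sorted by deadline, so this is the latest deadline so far
            # and therefore the earliest time that suits every member.
            wake_at = deadline
            latest = min(latest, deadline + slack)
        else:
            if group:
                wakeups.append((wake_at, group))
            group, wake_at, latest = [name], deadline, deadline + slack
    if group:
        wakeups.append((wake_at, group))
    return wakeups


# Times in milliseconds.  The clock has no slack; the others do.
timers = [
    ("clock", 1000, 0),
    ("mail", 700, 500),
    ("sync", 900, 200),
    ("backup", 2000, 100),
]
```

Here the three timers with overlapping windows end up sharing the
clock's wakeup, so the core is powered up twice rather than four
times.  With zero slack everywhere, nothing could be merged unless the
deadlines happened to coincide.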

The second was to provide explicit hints to allow threads to indicate
the kinds of cores that they want to run on.

All of which is to say that I’m not sure that starting from ULE is
necessarily a good strategy, since it wasn’t designed with any of
these constraints in mind.

Oh, and Apple isn’t perfect.  Their scheduler currently has a bunch of
issues with systems that distribute work across threads, where the
overall performance depends on the throughput of the slowest one.  For
longer-running threads, they’ll interleave P and E cores, so you need
to do fairly fine-grained work stealing to use their scheduler
efficiently.
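
A rough illustration of why the granularity matters, in Python.  It
models threads pinned to cores of different speeds and compares an
equal static split with threads pulling small chunks from a shared
pool.  A shared pool is a simplification of real work stealing (which
normally uses per-thread deques), and the function names, speeds, and
item counts are all invented.

```python
import heapq


def static_split(n_items, speeds):
    """Every thread gets an equal share; the slowest one sets the
    finish time."""
    share = n_items / len(speeds)
    return max(share / s for s in speeds)


def shared_queue(n_items, speeds, chunk):
    """Threads take `chunk` items from a shared pool whenever they
    become free.  Returns the time at which the last chunk finishes."""
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    remaining, finish = n_items, 0.0
    while remaining > 0:
        t, i = heapq.heappop(free_at)   # next thread to become free
        take = min(chunk, remaining)
        remaining -= take
        done = t + take / speeds[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


# Two fast and two slow cores, in items per unit time (invented).
speeds = [2.0, 2.0, 1.0, 1.0]
```

In this model, small chunks let the fast cores absorb more of the
work, while a chunk as large as the static share behaves exactly like
the static split and is again limited by the slowest thread.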

David