List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
Subject: Re: HMP scheduling on FreeBSD
From: David Chisnall
In-Reply-To: <1886427.OVFmXjEfDW@ravel>
Date: Thu, 15 Jan 2026 11:31:57 +0000
Cc: Minsoo Choo, freebsd-hackers
References: <0Ng09S3rEB0BvT9vzHqVKU7rWxoad96kjEc7U2LCwDFJKmmswXujip7qbRlo_BIhNKcI7d-2CUHdp9Dxr3-7hhafpD6uagJSFUCjtC9qRr4=@proton.me> <1886427.OVFmXjEfDW@ravel>
To: Olivier Certner

On 14 Jan 2026, at 22:14, Olivier Certner wrote:
>
> These are good first observations but they can only really apply in specific circumstances. Converting core's capacity in run queue length can only drive a loaded system, not a mostly idle one. This mechanism will also cause an increase in latency for threads running on performant cores.

There are also some fun corner cases. For example, the first generation big.LITTLE systems typically used Cortex A53 and A57 cores. The A57 was *much* faster, but it had four-cycle access to the L1 cache, whereas the A53 had single-cycle access. Workloads that fitted in L1 were faster on the A53. So this can be a core x workload (or *phase of workload*) metric. That said, treating it as a per-core metric is probably fine unless you want to hook in performance counters and do dynamic measurement.

> There are several theoretical considerations that should be met *together*, such as fairness, latency, bias to performance or to energy (policy), affinity, cpusets (directives), etc., and...

The hot-plug aspect is also important. The best energy efficiency comes from turning the CPU off entirely. Power-aware schedulers want to have a strategy for turning cores off in the way that minimises *total* system power consumption. This is tricky for a few reasons:

- There’s a tradeoff between running a workload for a long time on a slow core or running it for a short time on a fast core. The heuristics that ULE collects to identify I/O-bound vs CPU-bound workloads are a starting point, but you also likely need to track the typical sleeping time.
If a workload sleeps for long enough that it’s worth turning a big core off (or into a deep low-power state), that wants a very different scheduling policy to one that’s sitting using 5% of a core most of the time.

- Some systems have independent ability to shut off cores and their caches. This has some interesting effects because snooping on another core’s cache is usually faster than going out to main memory, so sleeping a core but not its cache may improve performance of nearby cores (by a NUMA-dependent amount). Similarly, shutting down another core’s caches may reduce performance of nearby ones (note: this usually doesn’t apply for fully inclusive caches, but most CPU vendors have been moving away from those).

Apple did a couple of things to support this kind of tuning. The first was to add a slack parameter into the kqueue timeouts. This let the scheduler coalesce wakeups. For example, if you have a clock that’s running once a second to update the second tick, and you have a bunch of other things mostly sleeping and waking up once per second then it’s useful to align all of the others with the clock app’s wake so that you can turn on a high-performance core, wake up, and then sleep. This is useful even on homogeneous SMP systems and would be a really good *first* step for this kind of work.

The second was to provide explicit hints to allow threads to indicate the kinds of cores that they want to run on.

All of which is to say that I’m not sure that starting from ULE is necessarily a good strategy, since it wasn’t designed with any of these constraints in mind.

Oh, and Apple isn’t perfect. Their scheduler currently has a bunch of issues with systems that distribute work across threads, where the overall performance depends on the throughput of the slowest one.
For longer-running threads, they’ll interleave P and E cores, so you need to do fairly fine-grained work stealing to use their scheduler efficiently.

David
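
P.S. To make the timer-slack idea above a bit more concrete, here is a minimal sketch of wakeup coalescing. It is only an illustration under stated assumptions, not Apple's implementation and not any kqueue API: the function name coalesce_wakeups and the (deadline, slack) representation in integer milliseconds are hypothetical. Each timer may fire anywhere in [deadline, deadline + slack], and a greedy pass over the windows, sorted by their latest allowed time, picks as few wake times as possible.

```python
from typing import List, Tuple


def coalesce_wakeups(timers: List[Tuple[int, int]]) -> List[int]:
    """Pick a small set of wake times that satisfies every timer.

    timers: hypothetical (deadline_ms, slack_ms) pairs; a timer is satisfied
    by any wake time in [deadline_ms, deadline_ms + slack_ms].
    """
    # Turn each timer into an (earliest, latest) window, sorted by latest.
    windows = sorted(((d, d + s) for d, s in timers), key=lambda w: w[1])
    wakes: List[int] = []
    for earliest, latest in windows:
        # The last chosen wake time is <= latest because of the sort order,
        # so it covers this timer whenever it is also >= earliest.
        if not wakes or wakes[-1] < earliest:
            # Otherwise wake as late as this timer allows, which gives
            # later timers the best chance of sharing the same wakeup.
            wakes.append(latest)
    return wakes


# A once-per-second clock with no slack plus three sloppier timers.
timers = [(1000, 0), (700, 500), (900, 200), (1600, 500)]
print(coalesce_wakeups(timers))
```

In this example the three timers whose windows contain t = 1000 ms all share the clock's wakeup, and only the last timer needs a wakeup of its own, so the cores can sleep through everything in between.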