From: Bob Bishop <rb@gid.co.uk>
To: David Chisnall
Cc: freebsd-hackers@freebsd.org
Subject: Re: Periodic rant about SCHED_ULE
Date: Thu, 23 Mar 2023 11:01:44 +0000

Hi,

On 23 Mar 2023, at 10:07, David Chisnall wrote:
>
> On 22/03/2023 20:15, Stefan Esser wrote:
>> Better balancing of the load would probably make ULE take less real
>> time. The example of 9 identical tasks on 8 cores, with 7 tasks getting
>> 100% of a core and the other 2 sharing a core and getting 50% each,
>> could be resolved by moving a CPU-bound process from the CPU with the
>> highest load to a random CPU (probably not the one with the lowest load,
>> nor limited to the same cluster or NUMA domain, since then it would stay
>> in a subset of the available cores).
>
> Two things have changed in CPUs since ULE was written that make affinity
> less of a win and may make some low-frequency random rebalancing better:
>
> Snooping from another core's L1 is a lot cheaper (less true on
> multi-socket systems, but fortunately ULE is NUMA-aware and can factor
> this in), which makes migrating a thread to another core much cheaper.
> There are still kernel synchronisation costs, but the cost of running on
> a core that doesn't have a warm cache is lower: the caches warm very
> quickly.
>
> CPUs now have a lot more power domains. If one core is doing a lot more
> work than the others, there is a good chance it will be thermally
> throttled while cores in a separate power/thermal domain are not. This
> means that keeping a compute-bound process on the same core is the worst
> thing you can do if other cores are idle: that core may be throttled back
> to <2 GHz whereas a core on the other side of the chip may be able to run
> at >3 GHz.
> Evenly heating the entire CPU can give much better performance when the
> number of active threads is less than the number of running cores, and
> better fairness in other cases.
>
> Both ULE and 4BSD are unaware of the heterogeneity of modern CPUs, which
> often have two or three different kinds of core running at different
> speeds, and neither understands the concept of a power budget, so there
> is a lot of room for improvement here. Writing a bad (but working)
> scheduler is a fairly difficult task and writing a good one is much
> harder, so I'm not volunteering to do it, but if someone is interested it
> would probably be a good candidate for Foundation funding. I've heard
> good things about the XNU scheduler recently; that might be a good source
> of inspiration.
>
> David

This is spot on as a summary of the landscape. The macOS scheduler (part of
XNU) [1] seems to do a pretty good job with heterogeneous cores and power
management, and macOS has APIs that let applications take account of the
thermal state of the system as a whole [2]. But I haven't seen any
references to fine-grained thermal management as outlined above.

[1] https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/KernelProgramming/scheduler/scheduler.html
[2] https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/RespondToThermalStateChanges.html

--
Bob Bishop
rb@gid.co.uk
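
P.S. For concreteness, here is a minimal userspace sketch of the random-target
rebalancing Stefan describes: take one thread's worth of load off the busiest
CPU and push it to a core chosen uniformly at random from the others, rather
than to the least loaded one, so the extra work wanders across every core (and
hence every power/thermal domain) over time. This is purely illustrative and
bears no relation to how sched_ule.c is actually structured; the load[] array
and rebalance() function are invented for the example.

/*
 * Toy model of the "push to a random CPU" heuristic. load[i] is the
 * number of runnable, CPU-bound threads currently assigned to core i;
 * 9 threads on 8 cores reproduces the example from the thread.
 */
#include <stdio.h>
#include <stdlib.h>

#define NCPU 8

static int load[NCPU] = { 2, 1, 1, 1, 1, 1, 1, 1 };

static void
rebalance(void)
{
	int busiest = 0, target;

	/* Find the most loaded core. */
	for (int i = 1; i < NCPU; i++)
		if (load[i] > load[busiest])
			busiest = i;
	if (load[busiest] < 2)
		return;		/* nothing worth moving */

	/* Any core except the busiest; deliberately not the least loaded. */
	do {
		target = arc4random_uniform(NCPU);
	} while (target == busiest);

	load[busiest]--;
	load[target]++;
	printf("moved one thread: cpu%d -> cpu%d\n", busiest, target);
}

int
main(void)
{
	/* Run a few low-frequency rebalance passes. */
	for (int tick = 0; tick < 5; tick++)
		rebalance();
	return (0);
}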