Date:      Thu, 23 Mar 2023 10:07:19 +0000
From:      David Chisnall <theraven@FreeBSD.org>
To:        freebsd-hackers@freebsd.org
Subject:   Re: Periodic rant about SCHED_ULE
Message-ID:  <a61f759e-9aea-d77f-6d5e-cecafdfe60b3@FreeBSD.org>
In-Reply-To: <7f26102c-7542-78f8-0c7b-ef3cdaa1a4a6@FreeBSD.org>
References:  <a401e51a-250a-64a0-15cb-ff79bcefbf94@m5p.com> <8173cc7e-e934-dd5c-312a-1dfa886941aa@FreeBSD.org> <8cfdb951-9b1f-ecd3-2291-7a528e1b042c@m5p.com> <c3f5f667-ba0b-c40c-b8a6-19d1c9c63c5f@FreeBSD.org> <ZBtRJhNHluj5Nzyk@troutmask.apl.washington.edu> <CAGudoHEj%2BkoaYhkjzDE5KX9OsCno=X5M_E3z9uwg6Pg7dtqTsA@mail.gmail.com> <7f26102c-7542-78f8-0c7b-ef3cdaa1a4a6@FreeBSD.org>

On 22/03/2023 20:15, Stefan Esser wrote:
> Better balancing of the load would probably make ULE take less real
> time. The example of 9 identical tasks on 8 cores with 7 tasks getting
> 100% of a core and the other 2 sharing a core and getting 50% each
> could be resolved by moving a CPU bound process from the CPU with the
> highest load to a random CPU (probably not the one with the lowest load
> or limited to the same cluster or NUMA domain, since then it would stay
> in a subset of the available cores).

Two things have changed in CPUs since ULE was written that make the 
affinity less of a win and may make some low-frequency random 
rebalancing better:

Snooping from another core's L1 is a lot cheaper (less true on 
multi-socket systems, but fortunately ULE is NUMA-aware and so can 
factor this in), which makes migrating a thread to another core much 
cheaper (there are still kernel synchronisation costs, but the cost of 
running on a core that doesn't have a warm cache is lower: the caches 
warm very quickly).

CPUs now have a lot more power domains.  If one core is doing a lot more 
work than others then there's a good chance that it will be thermally 
throttled but others may not if they're in a separate power / thermal 
domain.  This means that keeping a compute-bound process on the same 
core is the worst thing that you can do if other cores are idle: that 
core may be throttled back to <2 GHz whereas a core on the other side of 
the chip may be able to run at >3 GHz.  Evenly heating the entire CPU 
can give much better performance if the number of active threads is 
less than the number of running cores, and better fairness in other cases.
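
To make the fairness point concrete, here is a toy model of the 9-tasks-on-8-cores 
case from the quoted text.  It is only a sketch under stated assumptions, not how 
ULE or any real scheduler works: every task is purely CPU-bound, a core splits each 
tick evenly between the tasks on it, and migration is free.  The function name 
simulate() and all of its parameters are made up for illustration.

```python
import random


def simulate(n_cores=8, n_tasks=9, ticks=10_000, rebalance_every=None, seed=0):
    """Toy model: return the CPU time each task receives.

    Assumptions (hypothetical, not taken from ULE): tasks are purely
    CPU-bound, a core shares each tick equally among its tasks, and
    moving a task between cores costs nothing.
    """
    rng = random.Random(seed)
    # Round-robin initial placement: with 9 tasks on 8 cores, core 0 gets two.
    core_of = [task % n_cores for task in range(n_tasks)]
    cpu_time = [0.0] * n_tasks

    for tick in range(ticks):
        if rebalance_every and tick > 0 and tick % rebalance_every == 0:
            # Group tasks by the core they currently run on.
            queues = [[] for _ in range(n_cores)]
            for task, core in enumerate(core_of):
                queues[core].append(task)
            # Move one task from the most loaded core to a random other core.
            busiest = max(range(n_cores), key=lambda c: len(queues[c]))
            victim = rng.choice(queues[busiest])
            dest = rng.choice([c for c in range(n_cores) if c != busiest])
            core_of[victim] = dest

        # Each core divides this tick evenly among the tasks assigned to it.
        load = [0] * n_cores
        for core in core_of:
            load[core] += 1
        for task, core in enumerate(core_of):
            cpu_time[task] += 1.0 / load[core]

    return cpu_time


if __name__ == "__main__":
    static = simulate()
    rebalanced = simulate(rebalance_every=10)
    print("static     min/max share:", min(static) / max(static))
    print("rebalanced min/max share:", min(rebalanced) / max(rebalanced))
```

With static placement the two tasks stuck on the same core get exactly half of what 
the others get (min/max of 0.5).  With occasional random rebalancing the unlucky 
pair keeps changing, so over a long run the shares move much closer together.  The 
model deliberately ignores cache and synchronisation costs, so it says nothing 
about how often rebalancing is worth doing on real hardware.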

Both ULE and 4BSD are unaware of the heterogeneity of modern CPUs, which 
often have 2-3 different kinds of core that run at different speeds and 
neither understands a concept of a power budget, so there's a lot of 
potential improvement here.  Writing a bad (but working) scheduler is a 
fairly difficult task and writing a good one is much harder, so I'm not 
volunteering to do it, but if someone is interested then it would 
probably be a good candidate for Foundation funding.  I've heard good 
things about the XNU scheduler recently; that might be a good source of 
inspiration.

David



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?a61f759e-9aea-d77f-6d5e-cecafdfe60b3>