Date: Fri, 31 Mar 2023 12:43:10 -0700 (PDT)
From: Jeff Roberson <jroberson@jroberson.net>
To: freebsd-hackers@freebsd.org
Subject: ULE process to resolution
Message-ID: <a6066590-0b4d-b332-102a-9c2432cdfec6@jroberson.net>
Hi Folks,

For those who don't know, I am the original author of ULE. I have not had much time for FreeBSD in recent years, but this thread was forwarded to me and I am disheartened at the state of things. I will give my perspective and propose a path to resolve this systematically.

The fundamental benefit of ULE is also its fundamental challenge: N CPU-local decisions need to add up to a reasonable approximation of a correct global decision. This is necessary to scale to large core counts and large thread counts while preserving some affinity. You could permute 4BSD further towards these goals, but I posit that you would simply have to work through the same bugs.

As I read these threads I can state with a high degree of confidence that many of these tests worked with superior results with ULE at one time. It may be that tradeoffs have changed or exposed weaknesses; it may also be that it has simply been broken over time. I see a large number of commits intended to address point issues and wonder whether we adequately explored the consequences. Indeed, I see solutions involving tunables proposed here that will definitively break other cases.

I know that CPU tradeoffs have changed. ULE was written in a way that the topology could be annotated and the cost of migration could be specified. It is adaptable to this, but someone has to put in the effort. The cost function was written in ticks, which does not scale down properly; accurate CPU tick counters could now be used for more precise time-keeping and more specific affinity. Over time people have also added additional searches to pickcpu which don't scale well to very high core count systems. NUMA and heterogeneous CPUs are also possible in the graph framework but need further investment.

The other thing that has changed over time is the ability of the interactivity score to correctly detect truly interactive applications.
When I wrote it you could do a buildworld on a single-core or small multi-core system and play mp3s and browse the web without a hiccup. However, web browsers have evolved to be significantly more resource intensive. I'm not sure a heuristic can or should catch this case. We're probably long overdue to add X window focus hints, as most other operating systems do. I don't think tossing the interactivity score is really going to produce the desired results. Linux CFS disagrees with me, but I have always been able to achieve superior responsiveness with ULE. My intuition is that with an X window focus hint we could dial back the interactive threshold and have better tradeoffs with the soft real-time score.

schedgraph is also no longer adequate for modern systems. In my professional life I have taken the same types of data sources and built text-based processes on top, because graphical representations just can't scale to the number of events and cores for full system scheduling. For complex scheduling issues you need detailed introspection. You're not going to tweak variables and run buildworlds to arrive at success by supposition with any kind of reasonable velocity.

The first step to resolving this is to come up with a list of regression tests and catalog how they behave compared to 4BSD. When I wrote the scheduler I also wrote a simple fixed duty cycle program that could be run with different scheduling parameters and report on its CPU usage and latency. Combining many copies of this program you can simulate various kinds of interactions. It is available at people.freebsd.org/~jeff/late.tgz. I know there is also a Linux scheduler benchmark that may be worth porting.

If someone would start making regression tests I am happy to fix bugs or review bug fixes. Personally I would start from fairness given different nice values on a single CPU, and then multi-CPU. Evaluate allocation with variation in the load to core count ratio.
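
To make the fixed duty cycle idea concrete, here is a minimal Python sketch. It is not the late program from the tarball above and does not claim to match its options or output; the function name duty_cycle, its parameters, and the reported fields are all assumptions chosen for illustration. Each period it waits for the scheduled start, records how late it actually got to run, then spins until a fixed CPU budget has been consumed.

```python
import time


def duty_cycle(period_s, duty, cycles):
    """Hypothetical fixed duty cycle worker (illustration only).

    In each period, burn duty * period_s of CPU time, then sleep until the
    next scheduled period start. Returns CPU usage and lateness figures.
    """
    if not 0.0 < duty <= 1.0:
        raise ValueError("duty must be in (0, 1]")
    if cycles < 1:
        raise ValueError("cycles must be at least 1")

    busy_s = period_s * duty
    latencies = []
    wall_start = time.monotonic()
    cpu_start = time.process_time()

    for k in range(cycles):
        # Sleep until the scheduled start of period k, if it is still ahead.
        target = wall_start + k * period_s
        delay = target - time.monotonic()
        if delay > 0:
            time.sleep(delay)

        # Lateness relative to the schedule. This includes wakeup latency
        # and any backlog left over when earlier periods overran.
        latencies.append(max(0.0, time.monotonic() - target))

        # Spin until this period's CPU budget has actually been consumed,
        # measured in process CPU time rather than wall-clock time.
        spin_start = time.process_time()
        while time.process_time() - spin_start < busy_s:
            pass

    wall_s = time.monotonic() - wall_start
    cpu_s = time.process_time() - cpu_start
    return {
        "cpu_seconds": cpu_s,
        "cpu_fraction": cpu_s / wall_s,
        "max_latency": max(latencies),
        "mean_latency": sum(latencies) / len(latencies),
        "latencies": latencies,
    }


if __name__ == "__main__":
    result = duty_cycle(period_s=0.010, duty=0.25, cycles=200)
    print("cpu fraction: %.3f" % result["cpu_fraction"])
    print("mean latency: %.6f s" % result["mean_latency"])
    print("max latency:  %.6f s" % result["max_latency"])
```

Running several copies at once, each started under a different nice(1) value, and comparing the reported CPU fraction and latency per copy gives a rough single-CPU fairness check of the kind described above. An interpreted worker adds its own overhead, so a real regression test would more likely be written in C.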
It should not take more than a few hours to iterate through the interesting cases here before going on to more complex questions about buildworld or firefox etc. This would need to be something we carry forward in the source tree and ask people to re-run as part of scheduler CRs, or we're just going to find ourselves back in this spot again.

I also have a backlog of improvements for large multi-core systems from work I did years ago that have not made it into the tree. And I have an old review for patches to improve the reliability of priority in causing scheduling events that may be germane. If we can collaborate on a testing framework I could trickle these in.

Thanks,
Jeff