From: Mateusz Guzik
Date: Thu, 30 Mar 2023 20:50:31 +0200
Subject: Re: Periodic rant about SCHED_ULE
To: Kevin Bowling
Cc: freebsd-hackers@freebsd.org
References: <8173cc7e-e934-dd5c-312a-1dfa886941aa@FreeBSD.org> <8cfdb951-9b1f-ecd3-2291-7a528e1b042c@m5p.com>
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On 3/30/23, Kevin Bowling wrote:
> On Thu, Mar 30, 2023 at 11:29 AM Kevin Bowling wrote:
>>
>> On Thu, Mar 30, 2023 at 8:37 AM Mateusz Guzik wrote:
>> >
>> > I looked into it a little more, below you can find summary and
>> > steps forward.
>> >
>> > First a general statement: while ULE does have performance bugs, it
>> > has better basis than 4BSD to make scheduling decisions. Most notably
>> > it understands CPU topology, at least for cases which don't involve
>> > big.LITTLE. For any non-freak case where 4BSD performs better, it is a
>> > bug in ULE if this is for any reason other than a tradeoff which can
>> > be tweaked to line them up. Or more to the point, there should not be
>> > any legitimate reason to use 4BSD these days and modulo the bugs
>> > below, you are probably losing on performance for doing so.
>>
>> An elided simple algorithm for big.LITTLE, from Larry McVoy.. if you
>> run for an entire quantum, flag preference for big core. If you run
>> for less or get punted off, flag for little core preference.
>>
>> > Bugs reported in this thread by others and confirmed by me:
>> > 1. failure to load-balance when having n CPUs and n + 1 workers -- the
>> > excess one stays on the same CPU thread, continuously penalizing
>> > the same victim. As a result total real time to execute a finite
>> > computation is longer than in the case of 4BSD
>> > 2. unfairness of nice -n 20 threads vs threads going frequently off
>> > CPU (e.g., due to I/O) -- after using only a fraction of the slice the
>> > victim has to wait for the cpu hog to use up its entire slice, rinse
>> > and repeat. This extends a 7+ minute buildkernel to over 67 minutes,
>> > not an issue on 4BSD
>> >
>> > I did not put almost any effort into investigating no 1. There is code
>> > which is supposed to rebalance load across CPUs, someone(tm) will have
>> > to sit through it -- for all I know the fix is trivial.
>> >
>> > Fixing number 2 makes *another* bug more acute and it complicates the
>> > whole ordeal.
>> >
>> > Thus, bug reported by me:
>> > 3. interactivity scoring is bogus -- originally introduced to detect
>> > "interactive" behavior by equating being off CPU with waiting for user
>> > input.
>> > One part of the problem is that it puts *all* non-preempted off
>> > CPU time into one bag: a voluntary sleep. This includes suffering from
>> > lock contention in the kernel, lock contention in the program itself,
>> > file I/O and so on, none of which has bearing on how interactive or
>> > not the program might happen to be. A bigger part of the problem is
>> > that at least today, the graphical programs don't even act this way to
>> > begin with -- they stay on CPU *a lot*.
>> >
>> > I asked people to provide me with the output of: dtrace -n
>> > 'sched:::on-cpu { @[execname] = lquantize(curthread->td_priority, 0,
>> > 224, 1); }' from their laptops/desktops.
>> >
>> > One finding is that most people (at least those who reported) use
>> > firefox.
>> >
>> > Another finding is that the browser is above the threshold which would
>> > be considered "interactive" for vast majority of the time in all
>> > reported cases.
>> >
>> > I booted a 2 thread vm with xfce and decided to click around. Spawned
>> > firefox, opened a file manager (Thunar) and from there I opened a
>> > movie to play with mpv. As root I spawned make -j 2 buildkernel. It
>> > was not particularly good :)
>> >
>> > I found that mpv spawns a bunch of threads, most notably 2 distinct
>> > threads for audio and video output. The one for video got a priority
>> > of 175, while the rest had either 88 or 89 -- the lowest for
>> > timesharing not considered interactive [note lower is considered
>> > better].
>> >
>> > At the same time the file manager which was left in the background kept
>> > doing evil syscall usage, as a result bouncing between a regular
>> > timesharing priority and one which made it "interactive", even though
>> > the program was not touched for minutes.
>> >
>> > Or to put it differently, the scheduler failed to recognize that mpv
>> > is the program to prioritize, all while thinking the background time
>> > waster is the thing to look after (so to speak).
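To make the complaint in item 3 concrete, here is a rough Python model of
a score that looks only at run time versus voluntary sleep time. It is a
sketch, not the kernel code: the function names, the 0..100 scale and the
cutoff of 30 are assumptions chosen for illustration, loosely modelled on
how such a score could be computed. Lower means "more interactive".

```python
# Toy interactivity score: depends only on on-CPU time vs. voluntary sleep.
# All names and constants here are illustrative assumptions.
INTERACT_MAX = 100
INTERACT_HALF = INTERACT_MAX // 2
INTERACT_THRESH = 30  # assumed cutoff: scores below this count as interactive


def interact_score(runtime: int, slptime: int) -> int:
    """Return a score in [0, INTERACT_MAX]; lower is 'more interactive'."""
    if runtime > slptime:
        # Mostly on CPU: score lands in the upper half of the range.
        div = max(1, runtime // INTERACT_HALF)
        return INTERACT_HALF + (INTERACT_HALF - slptime // div)
    if slptime > runtime:
        # Mostly sleeping: score lands in the lower half of the range.
        div = max(1, slptime // INTERACT_HALF)
        return runtime // div
    # Equal run and sleep time (or no history at all).
    return INTERACT_HALF if runtime else 0


def is_interactive(runtime: int, slptime: int) -> bool:
    return interact_score(runtime, slptime) < INTERACT_THRESH


# A video output thread that stays on CPU most of the time.
print(interact_score(900, 100), is_interactive(900, 100))
# A background program that mostly sleeps in syscalls or on locks.
print(interact_score(100, 900), is_interactive(100, 900))
```

Under these assumptions the thread that is busy rendering video is
classified as non-interactive, while the mostly sleeping background
program is classified as interactive, no matter why it was sleeping.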
>> >
>> > This brings us to fixing problem 2: currently, due to the existence of
>> > said problem, the interactivity scoring woes are less acute -- the
>> > venerable make -j example is struggling to get CPU time, as a result
>> > messing with real interactive programs to a lesser extent. If that
>> > gets fixed, we are in a different boat altogether.
>> >
>> > I don't see a clean solution.
>
> One other random anecdote. Windows 11 uses window focus to highly
> boost scheduling priority in an obviously effective way. I have no
> idea how difficult something like that would be to fit into the unix
> world.
>

I thought about doing something like that, but I consider it dodgy.
Imagine you play some crap from youtube while messing around in a text
editor -- I'm pretty sure the former is more prone to disturbance from
scheduling changes.

Anyhow after sending the above e-mail an actual solution hit me: the X
server can tell the kernel what processes connect to it over the unix
socket, which again very well may be good enough. In the reports I got
I found pulseaudio, this one may need to be patched in a similar
manner.

-- 
Mateusz Guzik