From nobody Sat Mar 25 21:51:49 2023
Date: Sat, 25 Mar 2023 22:51:49 +0100
From: Peter
To: Mark Millard
Cc: FreeBSD Hackers
Subject: Re: Periodic rant about SCHED_ULE
References: <5AF26266-5B4C-4A7F-8784-4C6308B6C5CA.ref@yahoo.com>
 <5AF26266-5B4C-4A7F-8784-4C6308B6C5CA@yahoo.com>

On Sat, Mar 25, 2023 at 01:41:16PM -0700, Mark Millard wrote:
! On Mar 25, 2023, at 11:58, Peter wrote:
! > !
! > ! At which point I get the likes of:
! > !
! > ! 17129 root 1  68 0 14192Ki  3628Ki RUN    13 0:20  3.95% gzip -9
! > ! 17128 root 1  20 0 58300Ki 13880Ki pipdwt 18 0:00  0.27% tar cvf - / (bsdtar)
! > ! 17097 root 1 133 0 13364Ki  3060Ki CPU13  13 8:05 95.93% sh -c while true; do :; done
! > !
! > ! up front.
! >
! > Ah. So? To me this doesn't look good. If both jobs are runnable, they
! > should each get ~50%.
! >
! > ! For reference, I also see the likes of the following from
"gstat -spod" (it is a root on ZFS context with PCIe Optane media): ! > ! > So we might assume that indeed both jobs are runable, and the only ! > significant difference is that one does system calls while the other ! > doesn't. ! > ! > The point of this all is: identify the malfunction with the most ! > simple usecase. (And for me here is a malfunction.) ! > And then, obviousely, fix it. ! ! I tried the following that still involves pipe-io but avoids ! file system I/O (so: simplifying even more): ! ! cat /dev/random | cpuset -l 13 gzip -9 >/dev/null 2>&1 ! ! mixed with: ! ! cpuset -l 13 sh -c "while true; do :; done" & ! ! So far what I've observed is just the likes of: ! ! 17736 root 1 112 0 13364Ki 3048Ki RUN 13 2:03 53.15% sh -c while true; do :; done ! 17735 root 1 111 0 14192Ki 3676Ki CPU13 13 2:20 46.84% gzip -9 ! 17734 root 1 23 0 12704Ki 2364Ki pipewr 24 0:14 4.81% cat /dev/random ! ! Simplifying this much seems to get a different result. Okay, then you have simplified too much and the malfunction is not visible anymore. ! Pipe I/O of itself does not appear to lead to the ! behavior you are worried about. How many bytes does /dev/random deliver in a single read() ? ! Trying cat /dev/zero instead ends up similar: ! ! 17778 root 1 111 0 14192Ki 3672Ki CPU13 13 0:20 51.11% gzip -9 ! 17777 root 1 24 0 12704Ki 2364Ki pipewr 30 0:02 5.77% cat /dev/zero ! 17736 root 1 112 0 13364Ki 3048Ki RUN 13 6:36 48.89% sh -c while true; do :; done ! ! It seems that, compared to using tar and a file system, there ! is some significant difference in context that leads to the ! behavioral difference. It would probably be of interest to know ! what the distinction(s) are in order to have a clue how to ! interpret the results. I can tell you: With tar, tar can likely not output data from more than one input file in a single output write(). So, when reading big files, we get probably 16k or more per system call over the pipe. 
But if the files are significantly smaller than that (e.g. in
/usr/include), then we get gzip doing more system calls per time
unit. And that makes a difference, because a system call goes into
the scheduler and reschedules the thread.

This 95% vs. 5% imbalance is the actual problem that has to be
addressed, because it is not acceptable for me: I cannot wait for my
tasks starving along at a tenth of the expected compute only because
some number crunching also happens to run on the core.

Now, reading from /dev/random cannot reproduce it. Reading from tar
can reproduce it under certain conditions - and that is all that is
needed.
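
P.S.: the read() size question can be probed directly. A sketch (not
from the thread): with count=1, dd issues a single read() of the
requested size, so the byte count it copies shows what that one read
actually returned. The exact behavior of /dev/random differs between
systems, so treat the number as informative, not definitive.

```shell
# One read() of 64 KiB from /dev/random; the printed byte count is
# whatever that single read(2) returned (at most 65536).
dd if=/dev/random bs=65536 count=1 2>/dev/null | wc -c
```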
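And the small-files effect - more system calls for the same amount of
data - can be illustrated with dd as a stand-in (a sketch, not the
exact tar workload): moving the same 1 MiB through a pipe in 512-byte
chunks costs 2048 write() calls, while 64 KiB chunks cost only 16, so
the small-chunk writer enters the kernel (and the scheduler) far more
often per unit of data.

```shell
# Same 1 MiB through the pipe, different chunk sizes; the block count
# is (roughly) the number of write() calls into the pipe.
dd if=/dev/zero bs=512   count=2048 2>/dev/null | wc -c   # 2048 writes
dd if=/dev/zero bs=65536 count=16   2>/dev/null | wc -c   # 16 writes
```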