Date: Sat, 25 Mar 2023 15:35:36 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Peter <pmc@citylink.dinoex.sub.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Subject: Re: Periodic rant about SCHED_ULE
Message-ID: <21FEBFC6-7A43-43C6-A888-EC9962D738B8@yahoo.com>
In-Reply-To: <ZB9s9dul8j9YmJyw@disp.intra.daemon.contact>
References: <5AF26266-5B4C-4A7F-8784-4C6308B6C5CA.ref@yahoo.com> <5AF26266-5B4C-4A7F-8784-4C6308B6C5CA@yahoo.com> <ZB9Ebx2/Eo/GQzDl@disp.intra.daemon.contact> <E67A639B-AE86-4632-A2F6-FF637E020A90@yahoo.com> <ZB9s9dul8j9YmJyw@disp.intra.daemon.contact>
> On Mar 25, 2023, at 14:51, Peter <pmc@citylink.dinoex.sub.org> wrote:
>
> On Sat, Mar 25, 2023 at 01:41:16PM -0700, Mark Millard wrote:
> ! On Mar 25, 2023, at 11:58, Peter <pmc@citylink.dinoex.sub.org> wrote:
>
> ! > !
> ! > ! At which point I get the likes of:
> ! > !
> ! > ! 17129 root  1  68  0  14192Ki  3628Ki RUN    13  0:20  3.95% gzip -9
> ! > ! 17128 root  1  20  0  58300Ki 13880Ki pipdwt 18  0:00  0.27% tar cvf - / (bsdtar)
> ! > ! 17097 root  1 133  0  13364Ki  3060Ki CPU13  13  8:05 95.93% sh -c while true; do :; done
> ! > !
> ! > ! up front.
> ! >
> ! > Ah. So? To me this doesn't look good. If both jobs are runnable, they
> ! > should each get ~50%.
> ! >
> ! > ! For reference, I also see the likes of the following from
> ! > ! "gstat -spod" (it is a root on ZFS context with PCIe Optane media):
> ! >
> ! > So we might assume that indeed both jobs are runnable, and the only
> ! > significant difference is that one does system calls while the other
> ! > doesn't.
> ! >
> ! > The point of this all is: identify the malfunction with the most
> ! > simple use case. (And for me, there is a malfunction here.)
> ! > And then, obviously, fix it.
> !
> ! I tried the following, which still involves pipe I/O but avoids
> ! file system I/O (so: simplifying even more):
> !
> ! cat /dev/random | cpuset -l 13 gzip -9 >/dev/null 2>&1
> !
> ! mixed with:
> !
> ! cpuset -l 13 sh -c "while true; do :; done" &
> !
> ! So far what I've observed is just the likes of:
> !
> ! 17736 root  1 112  0  13364Ki  3048Ki RUN    13  2:03 53.15% sh -c while true; do :; done
> ! 17735 root  1 111  0  14192Ki  3676Ki CPU13  13  2:20 46.84% gzip -9
> ! 17734 root  1  23  0  12704Ki  2364Ki pipewr 24  0:14  4.81% cat /dev/random
> !
> ! Simplifying this much seems to get a different result.
>
> Okay, then you have simplified too much and the malfunction is not
> visible anymore.
>
> ! Pipe I/O by itself does not appear to lead to the
> ! behavior you are worried about.
>
> How many bytes does /dev/random deliver in a single read()?
>
> ! Trying cat /dev/zero instead ends up similar:
> !
> ! 17778 root  1 111  0  14192Ki  3672Ki CPU13  13  0:20 51.11% gzip -9
> ! 17777 root  1  24  0  12704Ki  2364Ki pipewr 30  0:02  5.77% cat /dev/zero
> ! 17736 root  1 112  0  13364Ki  3048Ki RUN    13  6:36 48.89% sh -c while true; do :; done
> !
> ! It seems that, compared to using tar and a file system, there
> ! is some significant difference in context that leads to the
> ! behavioral difference. It would probably be of interest to know
> ! what the distinction(s) are in order to have a clue how to
> ! interpret the results.
>
> I can tell you:
> With tar, tar likely cannot output data from more than one input
> file in a single output write(). So, when reading big files, we
> probably get 16k or more per system call over the pipe. But if the
> files are significantly smaller than that (e.g. in /usr/include),
> then we get gzip doing more system calls per unit of time. And that
> makes a difference, because a system call goes into the scheduler
> and reschedules the thread.
>
> This 95% vs. 5% imbalance is the actual problem that has to be
> addressed, because it is not acceptable for me: I cannot have my
> tasks starving along at a tenth of the expected compute speed just
> because some number crunching also happens to run on the core.
>
> Now, reading from /dev/random cannot reproduce it. Reading from
> tar can reproduce it under certain conditions - and that is all
> that is needed.
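As for how many bytes /dev/random delivers in a single read() here:
a rough way to estimate it, without instrumenting gzip, is to abuse
dd's record accounting. With an input block size larger than the
pipe can hold, each read() should come up short and be counted as a
partial input record, so the reported byte total divided by the
record count approximates the bytes per read(). A minimal sketch
(the ibs= and count= values are just arbitrary picks on my part):

# Reads from the pipe should come up short, each counted as one
# partial input record ("0+100 records in"); the "bytes transferred"
# figure divided by 100 then estimates the bytes per read().
cat /dev/random | dd of=/dev/null ibs=1m count=100

The interesting question is then whether that figure lands near the
small transfer sizes that show the imbalance in the experiments below.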
The suggestion that the size of the transfers into the first pipe
matters is backed up by experiments with the likes of:

dd if=/dev/zero bs=128 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=132 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=133 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=192 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=1k | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=4k | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=16k | cpuset -l 13 gzip -9 >/dev/null 2>&1 &

(just examples), each paired up with:

cpuset -l 13 sh -c "while true; do :; done" &

This avoids the uncontrolled variability of using tar against a
file system.

But an interesting comparison/contrast results from, for example:

dd if=/dev/zero bs=128 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/random bs=128 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &

each again paired with the:

cpuset -l 13 sh -c "while true; do :; done" &

At least in my context, the /dev/zero one ends up with:

18251 root  1  68  0  14192Ki  3676Ki RUN    13  0:02  1.07% gzip -9
18250 root  1  20  0  12820Ki  2484Ki pipewr 29  0:02  1.00% dd if=/dev/zero bs=128
18177 root  1 135  0  13364Ki  3048Ki CPU13  13 14:47 98.93% sh -c while true; do :; done

but the /dev/random one ends up with:

18253 root  1 108  0  14192Ki  3676Ki CPU13  13  0:09 50.74% gzip -9
18252 root  1  36  0  12820Ki  2488Ki pipewr 30  0:03 16.96% dd if=/dev/random bs=128
18177 root  1 115  0  13364Ki  3048Ki RUN    13 15:45 49.26% sh -c while true; do :; done

It appears that the CPU time used (and possibly more) by the dd
feeding the first pipe matters for the overall result, not just
the bs= value used.

===
Mark Millard
marklmi at yahoo.com
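P.S.: For anyone wanting to repeat the bs= sweep in one go, a minimal
sh sketch follows. The CPU number, the block sizes, and the 16 MiB
total are just the figures from my context above, nothing canonical;
keeping the byte total fixed is what makes the per-bs runs comparable:

# Pin a pure compute loop to CPU 13, then feed a fixed 16 MiB of
# /dev/zero through a pipe to a gzip pinned to the same CPU, once
# per block size.  A "real" figure from /usr/bin/time far above
# "user"+"sys" shows how badly gzip was starved at that size.
cpuset -l 13 sh -c "while true; do :; done" &
loop=$!
for bs in 128 132 133 192 1024 4096 16384; do
    echo "bs=$bs"
    dd if=/dev/zero bs=$bs count=$((16777216 / bs)) 2>/dev/null |
        cpuset -l 13 /usr/bin/time gzip -9 >/dev/null
done
kill $loop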