Date:      Sat, 25 Mar 2023 15:35:36 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Peter <pmc@citylink.dinoex.sub.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Cc:        Mateusz Guzik <mjguzik@gmail.com>
Subject:   Re: Periodic rant about SCHED_ULE
Message-ID:  <21FEBFC6-7A43-43C6-A888-EC9962D738B8@yahoo.com>
In-Reply-To: <ZB9s9dul8j9YmJyw@disp.intra.daemon.contact>
References:  <5AF26266-5B4C-4A7F-8784-4C6308B6C5CA.ref@yahoo.com> <5AF26266-5B4C-4A7F-8784-4C6308B6C5CA@yahoo.com> <ZB9Ebx2/Eo/GQzDl@disp.intra.daemon.contact> <E67A639B-AE86-4632-A2F6-FF637E020A90@yahoo.com> <ZB9s9dul8j9YmJyw@disp.intra.daemon.contact>

> On Mar 25, 2023, at 14:51, Peter <pmc@citylink.dinoex.sub.org> wrote:
>
> On Sat, Mar 25, 2023 at 01:41:16PM -0700, Mark Millard wrote:
> ! On Mar 25, 2023, at 11:58, Peter <pmc@citylink.dinoex.sub.org> wrote:
>
> ! > !
> ! > ! At which point I get the likes of:
> ! > !
> ! > ! 17129 root          1  68    0  14192Ki    3628Ki RUN     13   0:20   3.95% gzip -9
> ! > ! 17128 root          1  20    0  58300Ki   13880Ki pipdwt  18   0:00   0.27% tar cvf - / (bsdtar)
> ! > ! 17097 root          1 133    0  13364Ki    3060Ki CPU13   13   8:05  95.93% sh -c while true; do :; done
> ! > !
> ! > ! up front.
> ! >
> ! > Ah. So? To me this doesn't look good. If both jobs are runnable, they
> ! > should each get ~50%.
> ! >
> ! > ! For reference, I also see the likes of the following from
> ! > ! "gstat -spod" (it is a root on ZFS context with PCIe Optane media):
> ! >
> ! > So we might assume that indeed both jobs are runnable, and the only
> ! > significant difference is that one does system calls while the other
> ! > doesn't.
> ! >
> ! > The point of this all is: identify the malfunction with the most
> ! > simple use case. (And for me there is a malfunction here.)
> ! > And then, obviously, fix it.
> !
> ! I tried the following that still involves pipe I/O but avoids
> ! file system I/O (so: simplifying even more):
> !
> ! cat /dev/random | cpuset -l 13 gzip -9 >/dev/null 2>&1
> !
> ! mixed with:
> !
> ! cpuset -l 13 sh -c "while true; do :; done" &
> !
> ! So far what I've observed is just the likes of:
> !
> ! 17736 root          1 112    0  13364Ki    3048Ki RUN     13   2:03  53.15% sh -c while true; do :; done
> ! 17735 root          1 111    0  14192Ki    3676Ki CPU13   13   2:20  46.84% gzip -9
> ! 17734 root          1  23    0  12704Ki    2364Ki pipewr  24   0:14   4.81% cat /dev/random
> !
> ! Simplifying this much seems to get a different result.
>
> Okay, then you have simplified too much and the malfunction is not
> visible anymore.
>
> ! Pipe I/O of itself does not appear to lead to the
> ! behavior you are worried about.
>
> How many bytes does /dev/random deliver in a single read() ?
>
> ! Trying cat /dev/zero instead ends up similar:
> !
> ! 17778 root          1 111    0  14192Ki    3672Ki CPU13   13   0:20  51.11% gzip -9
> ! 17777 root          1  24    0  12704Ki    2364Ki pipewr  30   0:02   5.77% cat /dev/zero
> ! 17736 root          1 112    0  13364Ki    3048Ki RUN     13   6:36  48.89% sh -c while true; do :; done
> !
> ! It seems that, compared to using tar and a file system, there
> ! is some significant difference in context that leads to the
> ! behavioral difference. It would probably be of interest to know
> ! what the distinction(s) are in order to have a clue how to
> ! interpret the results.
>
> I can tell you:
> With tar, it likely cannot output data from more than one input
> file in a single output write(). So, when reading big files, we
> probably get 16k or more per system call over the pipe. But if the
> files are significantly smaller than that (e.g. in /usr/include),
> then gzip ends up doing more system calls per unit of time. And that
> makes a difference, because a system call goes into the scheduler
> and reschedules the thread.
>
> This 95% vs. 5% imbalance is the actual problem that has to be
> addressed, because it is not acceptable for me: I cannot have my
> tasks starving along at a tenth of the expected compute just because
> some number crunching also runs on the core.
>
> Now, reading from /dev/random cannot reproduce it. Reading from
> tar can reproduce it under certain conditions - and that is all that
> is needed.
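
As a hedged aside (not something tried here): one way to check the
claim about tar's write() sizes on a small-file tree would be to trace
the writes. The output path and the choice of /usr/include below are
only illustrative:

truss -o /tmp/tar-trace.txt tar cvf - /usr/include >/dev/null 2>/dev/null
grep -c 'write(1,' /tmp/tar-trace.txt        # how many writes fed stdout
grep 'write(1,' /tmp/tar-trace.txt | head    # the write sizes actually used

If the traced write sizes shrink for small files, that supports the
explanation; if they stay large, the syscall-rate point would need
refining.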

The suggestion that the size of the transfers into the
first pipe matters is backed up by experiments with the
likes of:

dd if=/dev/zero bs=128 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=132 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=133 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=192 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=1k | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=4k | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/zero bs=16k | cpuset -l 13 gzip -9 >/dev/null 2>&1 &

(just examples), each paired with:

cpuset -l 13 sh -c "while true; do :; done" &

This avoids the uncontrolled variability of using tar
against a file system.
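
For what it is worth, here is a rough sketch of scripting such a
sweep. It is only an illustration (not how the numbers further down
were produced): it assumes FreeBSD's /bin/sh, that CPU 13 exists, and
that about 30 s per step is enough for the WCPU figures to settle:

for bs in 128 132 133 192 1k 4k 16k; do
    cpuset -l 13 sh -c 'while true; do :; done' &
    spin=$!
    dd if=/dev/zero bs=$bs 2>/dev/null | cpuset -l 13 gzip -9 >/dev/null &
    gz=$!
    sleep 30
    echo "bs=$bs:"
    # the [ ] trick keeps grep from matching its own argv in top -a output
    top -a -b | grep -E 'gzip -[9]|dd i[f]|while tru[e]'
    kill $spin $gz 2>/dev/null
    wait
done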

But an interesting comparison/contrast results from, for
example:

dd if=/dev/zero bs=128 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
vs.
dd if=/dev/random bs=128 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &

each again paired with: cpuset -l 13 sh -c "while true; do :; done" &

At least in my context, the /dev/zero one ends up with:

18251 root          1  68    0  14192Ki    3676Ki RUN     13   0:02   1.07% gzip -9
18250 root          1  20    0  12820Ki    2484Ki pipewr  29   0:02   1.00% dd if=/dev/zero bs=128
18177 root          1 135    0  13364Ki    3048Ki CPU13   13  14:47  98.93% sh -c while true; do :; done

but the /dev/random one ends up with:

18253 root          1 108    0  14192Ki    3676Ki CPU13   13   0:09  50.74% gzip -9
18252 root          1  36    0  12820Ki    2488Ki pipewr  30   0:03  16.96% dd if=/dev/random bs=128
18177 root          1 115    0  13364Ki    3048Ki RUN     13  15:45  49.26% sh -c while true; do :; done

It appears that the CPU time (or more than just that) used by the dd
feeding the first pipe matters for the overall result, not just the
bs= value used.
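
If one wanted to separate out how much of that is just dd's own
per-byte cost for the two sources, a simple comparison might be the
following (an untested illustration; the count is arbitrary):

/usr/bin/time -l dd if=/dev/zero   of=/dev/null bs=128 count=500000
/usr/bin/time -l dd if=/dev/random of=/dev/null bs=128 count=500000

Comparing the reported user and system times would show how much more
CPU the /dev/random case burns per byte, independent of the pipe and
scheduler interaction.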


===
Mark Millard
marklmi at yahoo.com



