Date:      Sat, 25 Mar 2023 11:23:04 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Peter <pmc@citylink.dinoex.sub.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Periodic rant about SCHED_ULE
Message-ID:  <76DAACBB-C865-4779-A340-D66C35D610B4@yahoo.com>
In-Reply-To: <5AF26266-5B4C-4A7F-8784-4C6308B6C5CA@yahoo.com>
References:  <5AF26266-5B4C-4A7F-8784-4C6308B6C5CA@yahoo.com>

On Mar 25, 2023, at 11:14, Mark Millard <marklmi@yahoo.com> wrote:

> Peter <pmc@citylink.dinoex.sub.org> wrote on
> Date: Sat, 25 Mar 2023 15:47:42 UTC :
> 
>> Quoting George Mitchell <george+freebsd@m5p.com>:
>> 
>>>> https://forums.freebsd.org/threads/what-is-sysctl-kern-sched-preempt_thresh.85
>>>> 
>>> Thank you! -- George
>> 
>> You're welcome. Can I get a success/failure report?
>> 
>> 
>> ---------------------------------------------------------------------
>>>> On 3/22/23, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>>>>> 
>>>>> I reported the issue with ULE some 15 to 20 years ago.
>> 
>> Can I get the PR number, please?
>> 
>> 
>> ---------------------------------------------------------------------
>> Test usecase:
>> =============
>> 
>> Create two compute tasks competing for the same -otherwise unused- core,
>> one without, one with syscalls:
>> 
>> # cpuset -l 13 sh -c "while true; do :; done" &
>> # tar cvf - / | cpuset -l 13 gzip -9 > /dev/null
>> 
>> Within a few seconds the two tasks are balanced, running at nearly the
>> same PRI and each using 50% of the core:
>> 
>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
>> 5166 root 1 88 0 13M 3264K RUN 13 9:23 51.65% sh
>> 10675 root 1 87 0 13M 3740K CPU13 13 1:30 48.57% gzip
>> 
>> This changes when the tar reaches /usr/include with its many small
>> files. Now smaller blocks are delivered to gzip, it does more
>> syscalls, and things get ugly:
>> 
>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
>> 5166 root 1 94 0 13M 3264K RUN 13 18:07 95.10% sh
>> 19028 root 1 81 0 13M 3740K CPU13 13 1:23 4.87% gzip
> 
> Why did PID 10675 change to 19028?
> 
>> This does not happen because tar would be slow in moving data to
>> gzip: tar reads from SSD, or more likely from ARC, and this is
>> always faster than gzip -9. The imbalance is caused by the scheduler.
> 
>=20
> When I tried that tar line, I get lots of output to stderr:
> 
> # tar cvf - / | cpuset -l 13 gzip -9 > /dev/null
> tar: Removing leading '/' from member names
> a .
> a root
> a wrkdirs
> a bin
> a usr
> . . .
> 
> Was that an intentional part of the test?
> 
> To avoid this I used:
> 
> # tar cvf - / 2>/dev/null | cpuset -l 13 gzip -9 2>&1 > /dev/null
> 
> At which point I get the likes of:
> 
> 17129 root          1  68    0  14192Ki    3628Ki RUN     13   0:20   3.95% gzip -9
> 17128 root          1  20    0  58300Ki   13880Ki pipdwt  18   0:00   0.27% tar cvf - / (bsdtar)
> 17097 root          1 133    0  13364Ki    3060Ki CPU13   13   8:05  95.93% sh -c while true; do :; done
> 
> up front.
> 
> For reference, I also see the likes of the following from
> "gstat -spod" (it is a root on ZFS context with PCIe Optane media):
> 
> dT: 1.063s  w: 1.000s
> L(q)  ops/s    r/s     kB   kBps   ms/r    w/s     kB   kBps   ms/w    d/s     kB   kBps   ms/d    o/s   ms/o   %busy Name
> . . .
>    0     68     68     14    937    0.0      0      0      0    0.0      0      0      0    0.0      0    0.0    0.1| nvd2
> . . .
> 
> 
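
As an aside, an alternative to redirecting tar's stderr is to just drop
the v flag, since the per-file listing is not needed for the test. A
minimal variant of the same pipeline would be:

# tar cf - / | cpuset -l 13 gzip -9 > /dev/null

tar would still print its "Removing leading '/' from member names" note
to stderr, but the per-file "a ..." lines go away.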

I left it running and I'm now seeing:

17129 root          1 107    0  14192Ki    3628Ki CPU13   13   3:01  48.10% gzip -9
17128 root          1  21    0  58300Ki   15428Ki pipdwt  20   0:04   2.02% tar cvf - / (bsdtar)
17097 root          1 115    0  13364Ki    3060Ki RUN     13  16:30  51.77% sh -c while true; do :; done
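
For what it is worth, the pinning itself could be double-checked with
cpuset's query mode, using the PIDs from the run above:

# cpuset -g -p 17097
# cpuset -g -p 17129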

I am also seeing the likes of:

dT: 1.063s  w: 1.000s
L(q)  ops/s    r/s     kB   kBps   ms/r    w/s     kB   kBps   ms/w    d/s     kB   kBps   ms/d    o/s   ms/o   %busy Name
. . .
    0   1213   1213      5   6456    0.0      0      0      0    0.0      0      0      0    0.0      0    0.0    1.2| nvd2
. . .

FYI: ThreadRipper 1950X context.

Looks like what I see is very dependent on when I
look at what it is doing: the details involved matter.
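
To capture that time dependence, a small sampling loop along these lines
could log how the priorities and CPU shares evolve over a run (a rough
sketch; the PIDs are the ones from my run above, and ps's pri/pcpu columns
need not line up exactly with top's PRI/WCPU):

# while true; do date; ps -o pid,pri,pcpu,state,command -p 17097,17128,17129; sleep 10; done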


===
Mark Millard
marklmi at yahoo.com



