Date: Sat, 25 Mar 2023 11:14:11 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Peter <pmc@citylink.dinoex.sub.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Periodic rant about SCHED_ULE
Message-ID: <5AF26266-5B4C-4A7F-8784-4C6308B6C5CA@yahoo.com>
References: <5AF26266-5B4C-4A7F-8784-4C6308B6C5CA.ref@yahoo.com>

Peter <pmc_at_citylink.dinoex.sub.org> wrote on
Date: Sat, 25 Mar 2023 15:47:42 UTC :

> Quoting George Mitchell <george+freebsd@m5p.com>:
>
> >> https://forums.freebsd.org/threads/what-is-sysctl-kern-sched-preempt_thresh.85
> >>
> >Thank you!       -- George
>
> You're welcome. Can I get a success/failure report?
>
>
> ---------------------------------------------------------------------
> >> On 3/22/23, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> >>>
> >>> I reported the issue with ULE some 15 to 20 years ago.
>
> Can I get the PR number, please?
>
>
> ---------------------------------------------------------------------
> Test usecase:
> =============
>
> Create two compute tasks competing for the same -otherwise unused- core,
> one without, one with syscalls:
>
> # cpuset -l 13 sh -c "while true; do :; done" &
> # tar cvf - / | cpuset -l 13 gzip -9 > /dev/null
>
> Within a few seconds the two tasks are balanced, running at nearly the
> same PRI and using each 50% of the core:
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
>  5166 root          1  88    0    13M  3264K RUN     13   9:23  51.65% sh
> 10675 root          1  87    0    13M  3740K CPU13   13   1:30  48.57% gzip
>
> This changes when the tar reaches /usr/include with its many small
> files. Now smaller blocks are delivered to gzip, it does more
> syscalls, and things get ugly:
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
>  5166 root          1  94    0    13M  3264K RUN     13  18:07  95.10% sh
> 19028 root          1  81    0    13M  3740K CPU13   13   1:23   4.87% gzip

Why did PID 10675 change to 19028?

> This does not happen because tar would be slow in moving data to
> gzip: tar reads from SSD, or more likely from ARC, and this is
> always faster than gzip-9. The imbalance is made by the scheduler.

When I tried that tar line, I get lots of output to stderr:

# tar cvf - / | cpuset -l 13 gzip -9 > /dev/null
tar: Removing leading '/' from member names
a .
a root
a wrkdirs
a bin
a usr
. . .

Was that an intentional part of the test?

To avoid this I used:

# tar cvf - / 2>/dev/null | cpuset -l 13 gzip -9 2>&1 > /dev/null

At which point I get the likes of:

17129 root          1  68    0  14192Ki   3628Ki RUN     13   0:20   3.95% gzip -9
17128 root          1  20    0  58300Ki  13880Ki pipdwt  18   0:00   0.27% tar cvf - / (bsdtar)
17097 root          1 133    0  13364Ki   3060Ki CPU13   13   8:05  95.93% sh -c while true; do :; done

up front.

For reference, I also see the likes of the following from
"gstat -spod" (it is a root on ZFS context with PCIe Optane media):

dT: 1.063s  w: 1.000s
 L(q)  ops/s    r/s     kB   kBps   ms/r    w/s     kB   kBps   ms/w    d/s     kB   kBps   ms/d    o/s   ms/o  %busy Name
. . .
    0     68     68     14    937    0.0      0      0      0    0.0      0      0      0    0.0      0    0.0    0.1| nvd2
. . .

===
Mark Millard
marklmi at yahoo.com
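
Note on the redirections in the modified tar line: in sh, redirections are applied left to right, so "2>&1 > /dev/null" and "> /dev/null 2>&1" are not equivalent. The following is a minimal sketch of the difference, not part of the original test; emit is a hypothetical stand-in for a command that writes to both streams, and no cpuset or tar is involved.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): one line to stdout, one to stderr.
emit() {
    echo "to-stdout"
    echo "to-stderr" >&2
}

# "2>&1 > /dev/null": stderr is first pointed at the current stdout
# (here, the command substitution), then stdout alone goes to /dev/null.
a=$( emit 2>&1 > /dev/null )

# "> /dev/null 2>&1": stdout goes to /dev/null first, then stderr follows it.
b=$( emit > /dev/null 2>&1 )

echo "2>&1 > /dev/null captured: [$a]"
echo "> /dev/null 2>&1 captured: [$b]"
```

With the first ordering, anything gzip wrote to stderr would still reach the terminal; only the second ordering discards both streams.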