Date: Fri, 18 Jun 2021 08:12:53 -0700
From: John Baldwin <jhb@FreeBSD.org>
To: Poul-Henning Kamp <phk@phk.freebsd.dk>, arch@freebsd.org
Subject: Re: It's time to kill statistical profiling
Message-ID: <d63d21fe-c1ef-dbb9-64e2-3e23621820bc@FreeBSD.org>
In-Reply-To: <202106180736.15I7aYmk068064@critter.freebsd.dk>
References: <202106180736.15I7aYmk068064@critter.freebsd.dk>
On 6/18/21 12:36 AM, Poul-Henning Kamp wrote:
> Warner's work to document the kernel timers in D30802 brought stathz up again.
>
> To give a representative result, statistical profiling needs to
> sample no less than approx 0.1% of instructions.
>
> On a VAX that meant running the statistical profiling at O(1kHz).
>
> On my 4 CPU, two thread, 2GHz laptop that means statistical profiling
> needs to run at O(10 MHz), which is barely doable.
>
> But it is worse:
>
> The samples must be unbiased with respect to the system activity,
> which was already a problem on the VAX and which is totally impossible
> on modern hardware, with message based interrupts, deep pipelines
> and telegraphic distance memory[1].
>
> Therefore statistical profiling is worse than useless: it is downright
> misleading, which is why modern CPUs have hardware performance counters.
>
> Instead of documenting stathz, I suggest we retire statistical
> profiling and convert the profiled libraries to code-coverage
> profiling (-fprofile-arcs and -ftest-coverage)
>
> Poul-Henning
>
> [1] One could *possibly* approach unbiased samples, by locking the
> stathz code path in L1 cache and disable L1 updates, but then
> the results would be from an entirely different system.

Note that profhz is the only thing you could kill. stathz is used by
statclock to compute rusage and the %CPU for ps(1), as well as the
cp_time counters behind the system-wide (and per-CPU) time stats.

What I would like to do for rusage is to have an option to split up
rux_runtime into separate "raw" iruntime, sruntime, and uruntime and
switch between them on kernel entry/exit, similar to what we do now in
mi_switch(). This would remove the need for iticks/uticks/sticks and
the need for calcru() to subdivide the total and then play games to
prevent individual times from going backwards. Instead, it would just
do a straightforward conversion of the component <x>runtime to the
value getrusage() wants. I've just never gotten around to doing that.
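As a rough illustration of the split-runtime idea, here is a minimal
userland sketch in C. It is not kernel code: the names split_runtime,
run_mode, split_init(), split_switch() and split_to_usec() are all
hypothetical, and timestamps are passed in explicitly rather than read
from a real counter. It only shows that charging elapsed time to the
current mode at each transition yields per-mode totals that can never
decrease, so no after-the-fact subdivision is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; these are not the FreeBSD kernel's structures. */
enum run_mode { MODE_USER, MODE_SYSTEM, MODE_INTR, MODE_COUNT };

struct split_runtime {
	uint64_t runtime[MODE_COUNT];	/* raw counter units per mode */
	uint64_t last_switch;		/* timestamp of the last transition */
	enum run_mode mode;		/* mode currently being charged */
};

/* Start accounting in the given mode at time 'now'. */
void
split_init(struct split_runtime *sr, enum run_mode mode, uint64_t now)
{
	for (int i = 0; i < MODE_COUNT; i++)
		sr->runtime[i] = 0;
	sr->last_switch = now;
	sr->mode = mode;
}

/*
 * Charge the time since the last transition to the current mode, then
 * switch to 'next'. Each bucket only ever grows.
 */
void
split_switch(struct split_runtime *sr, enum run_mode next, uint64_t now)
{
	assert(now >= sr->last_switch);	/* assumes a monotonic counter */
	sr->runtime[sr->mode] += now - sr->last_switch;
	sr->last_switch = now;
	sr->mode = next;
}

/* Convert raw counter units to microseconds for a counter at freq_hz. */
uint64_t
split_to_usec(uint64_t raw, uint64_t freq_hz)
{
	/* Split into whole seconds and remainder to limit overflow. */
	return ((raw / freq_hz) * 1000000 +
	    ((raw % freq_hz) * 1000000) / freq_hz);
}
```

In this sketch a caller would invoke split_switch() on every
user/kernel/interrupt boundary and convert each bucket with
split_to_usec() when reporting, which is the "straightforward
conversion" described above.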
However, even with that, you are still stuck with providing whatever
events the scheduler wants in order to set %CPU for ps(1). You also
still need something to provide the kern.cp_time arrays used for CPU
usage. statclock might still be the simplest way to provide those.

I agree that hwpmc is what one should use for real profiling, but
there's actually not much that you get to axe in the kernel when
removing the kernel-side support for the old profiling.

As Konstantin has noted, we already no longer build or ship -pg
libraries by default. I'd be fine with removing the build glue for
that outright, or with generalizing it as Konstantin suggests, though
I would probably not even want to keep -pg as one of the variants for
the generalization. To that end, I would be fine with just removing
all the -pg support, and if someone wants to add a new variant they
can deal with making it more general at that time. I'd much rather
someone spend time on adding support for PGO and LTO to our build
infrastructure than trying to keep -pg alive.

-- 
John Baldwin