Date:      Mon, 15 Dec 2025 11:04:34 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        Steve Kargl <kargls@comcast.net>
Cc:        freebsd-hackers <freebsd-hackers@freebsd.org>
Subject:   Re: profiling a user executable?
Message-ID:  <47F509B6-F536-4E70-9226-16630128EC38@yahoo.com>
In-Reply-To: <B8A4B3EA-FB41-474C-B4BD-8722FC7C4AED@yahoo.com>
References:  <12053856-4DE5-4B98-9309-028869BB5395.ref@yahoo.com> <12053856-4DE5-4B98-9309-028869BB5395@yahoo.com> <fe1f5702-fcd0-4fea-bf34-be070713abae@comcast.net> <B8A4B3EA-FB41-474C-B4BD-8722FC7C4AED@yahoo.com>


On Dec 15, 2025, at 09:05, Mark Millard <marklmi@yahoo.com> wrote:
> 
> On Dec 14, 2025, at 23:18, Steve Kargl <kargls@comcast.net> wrote:
> 
>> On 12/14/25 21:55, Mark Millard wrote:
>>> Steve Kargl <kargls_at_comcast.net> wrote on
>>> Date: Sat, 13 Dec 2025 20:12:12 UTC :
>>>> On 12/12/25 14:14, Ahmad Khalifa wrote:
>>>>> On Thu Dec 11, 2025 at 8:13 PM +0200, Steve Kargl wrote:
>>>>>> In the days of yore, one could add the '-pg' option to
>>>>>> the compiler's options to generate profiling information,
>>>>>> which could be consumed by gprof(1).
>>>>>> 
>>>>>> FreeBSD stopped shipping libc_p.a, libm_p.a, etc.
>>>>>> (disabled in fe52b7f60ef4 and deleted in 3750ccefb8).
>>>>>> This breaks all lang/gcc* ports if one uses '-pg'. It is
>>>>>> not too difficult to fix lang/gcc* to avoid the missing
>>>>>> *_p.a files, but this seems to lead to invalid *.gmon files.
>>>>>> At least, for a Fortran application that I would like to
>>>>>> profile (compiled with gfortran), procedures in my libfoo_p.a,
>>>>>> appear in the profile, which I know with 100% certainty are
>>>>>> not referenced.
>>>>>> 
>>>>>> So, how does one in modern FreeBSD, as a mere normal user,
>>>>>> profile an executable? A google search suggests pmcstat(8)
>>>>>> may be of use, but all attempts to use it lead to a usage
>>>>>> message printed to the terminal. I'm simply trying to
>>>>>> determine where my code is spending all of its time.
>>>>> 
>>>>> Just throwing in another option, you can use dtrace's profile-n probes.
>>>>> 
>>>> 
>>>> dtrace appears to be useless for a mere user.
>>>> 
>>>> % dtrace -n 'profile-99 /execname == "../../build/bin/tier -q"/ \
>>> As I remember, execname holds only the base name that had been given
>>> to exec for the current thread/process. Also, it is not a way to run
>>> a program. It is a way to select processes/threads that are running
>>> a known-base-name of interest. It is a DTrace variable available in
>>> specific probes, not all probes.
>>> As I remember, dtrace uses -c COMMAND notation to run the command
>>> and exit once that command completes.
>>> Trying to deal with paths is much more involved and can involve things
>>> like copyinstr(arg0) notation, with arg0 being the probe's first
>>> argument in that example.
>> 
>> Unfortunately, dtrace requires root privilege, and so is
>> a non-starter.
>> 
>> I adapted your suggestions with pmcstat to my problem,
>> and it seems promising.
>> 
>> % pmcstat -O pmc.0 -P ex_ret_instr ../../build/bin/tier -q
> 
> Turns out that it looks like "ex_ret_instr" is about the number of
> retired instructions related to function returns (popping a return
> address off the stack).

I may have gone down the wrong path for ex_ret_instr.

It looks like it counts retired instructions on the AMD 17h
and AMD 19h families, which would match what I was using.
It is not so clear for other contexts.

It looks like some Intel x64 processors might have an
inst_retired.any event instead.
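
A minimal sketch of checking which of those two names a given machine
offers, assuming "pmcstat -L" prints one event name per line (possibly
indented) and that ex_ret_instr and inst_retired.any are the only
candidates of interest. pick_retired_event and list_events are
hypothetical helper names for illustration, not part of any FreeBSD tool.

```python
import subprocess

# Candidate retired-instruction event names mentioned in this thread.
# Which one exists depends on the CPU; treat this list as an assumption.
CANDIDATES = ("ex_ret_instr", "inst_retired.any")


def pick_retired_event(available, candidates=CANDIDATES):
    """Return the first candidate found in the available event names, else None."""
    names = {line.strip().lower() for line in available if line.strip()}
    for candidate in candidates:
        if candidate in names:
            return candidate
    return None


def list_events():
    """Return the lines printed by `pmcstat -L` (needs hwpmc support to work)."""
    result = subprocess.run(
        ["pmcstat", "-L"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

On a machine with hwpmc available, pick_retired_event(list_events())
would give the name to pass to "pmcstat -P", or None if neither name is
listed.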

It looks like AMD supports FreeBSD in AMD uProf for some
Zen processors:

https://www.amd.com/en/developer/uprof.html

"Call Stack Sampling – Native (C, C++, and FORTRAN)" is listed
as YES for FreeBSD. Only the Command Line interface is listed as YES
for FreeBSD, so that apparently means the AMDuProfCLI tool.

But . . .

"Note: Run AMD uProf on FreeBSD with sudo command or root privilege."


> And, ls_not_halted_cyc is tracking the number of cycles where the
> Load/Store unit is not stalled waiting for memory.
> 
> Not as analogous to aarch64 as I thought.
> 
>> % pmcstat -R pmc.0 -g
>> % gprof ../../build/bin/tier ex_ret_instr/tier.gmon | head -10 | tail -8
>> Each sample counts as 0.0078125 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls  ms/call  ms/call  name
>>  36.68     13.15    13.15    13192     1.00     1.75  __spherem_MOD_sphere
>>  23.06     21.41     8.27                             __pnam_MOD_pna_dble
>>  16.85     27.45     6.04     2045     2.95     3.00  __sjnm_MOD_sjn_dble
>>  10.08     31.07     3.61      693     5.21     5.34  __synm_MOD_syn_dble
>>   8.02     33.94     2.88                             __sjnm_MOD_sjn_sngl
> 
> Looks like the empty "calls", self "ms/call", and
> total "ms/call" columns might be indicating the
> lack of calls, despite the left hand side time
> information.
> 
> Maybe __sjnm_MOD_sjn_sngl or the like is the
> closest prior symbol available for a "static"
> (non-public) routine that is not published for
> linking?
> 
>> I know with 100% certainty that __sjnm_MOD_sjn_sngl is not
>> referenced in the code as I wrote it.  I'll note the above
>> is similar to what 'gfortran -pg' produces.
>> 
>> % pmcstat -R pmc.0 -G zxc.graph
>> CONVERSION STATISTICS:
>> #exec/elf                                1
>> #samples/total                           67133
>> #samples/unknown-function                1775
>> #callchain/dubious-frames                17
>> % grep sjn_dble zxc.graph | wc -l
>>    258
>> % grep sjn_sngl zxc.graph | wc -l
>>      0
>> 
>> The callgraph shows that __sjnm_MOD_sjn_sngl is not used.
>> My working conclusion is that gprof is simply broken.  I'm
>> still investigating what pmcstat can give me.  Given the
>> attempt to convert to a gprof file, hopefully I can get
>> something like
>> 
>> % pmcstat -R pmc.0 [some option(s)]
>> cycles  cycles/call  function
>>  10000           90  __spherem_MOD_sphere
>>  12345          191  __pnam_MOD_pna_dble
>>   5433          400  __sjnm_MOD_sjn_dble
>>  15000         1500  __synm_MOD_syn_dble
>> 
>> This would tell me which routine(s) to look into for
>> optimizations.
>> 
> 
> It probably comes back to whether there is an event type
> that is appropriate.
> 
> ls_not_halted_cyc would not treat waiting-for-memory
> time uniformly with load/store unit active time.
> But it would give information about whether waiting
> for memory was an issue or not.
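
As a rough sanity check on the numbers quoted above, here is a small
sketch of the arithmetic, assuming gprof's "seconds" columns are just
sample counts multiplied by the per-sample interval from its header
line. The constants are copied from the quoted gprof and pmcstat
output; the helper names are made up for illustration.

```python
# Figures copied from the quoted gprof and pmcstat output.
SECONDS_PER_SAMPLE = 0.0078125   # "Each sample counts as 0.0078125 seconds."
TOTAL_SAMPLES = 67133            # #samples/total
UNKNOWN_SAMPLES = 1775           # #samples/unknown-function


def samples_from_seconds(seconds, seconds_per_sample=SECONDS_PER_SAMPLE):
    """Approximate sample count behind a gprof 'self seconds' value."""
    return round(seconds / seconds_per_sample)


def fraction(part, total):
    """Share of `part` in `total` as a float between 0 and 1."""
    return part / total


implied_rate_hz = 1.0 / SECONDS_PER_SAMPLE
sphere_samples = samples_from_seconds(13.15)   # self seconds of __spherem_MOD_sphere
unknown_share = fraction(UNKNOWN_SAMPLES, TOTAL_SAMPLES)

print(f"implied sampling rate: {implied_rate_hz:.0f} per second")
print(f"__spherem_MOD_sphere: about {sphere_samples} samples")
print(f"unknown-function share: {unknown_share:.2%}")
```

That works out to 128 samples per "second", about 1683 samples for
__spherem_MOD_sphere, and roughly 2.64% of samples with no known
function. Since the samples come from a PMC event rather than a clock,
I would read the seconds columns as relative weights only; that is my
assumption, not something the quoted output states.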


===
Mark Millard
marklmi at yahoo.com



