Date: Mon, 30 Nov 2020 06:32:34 -0600 From: Scott Bennett <bennett@sdf.org> To: jbeich@FreeBSD.org, ihor@antonovs.family Cc: ml@netfence.it, freebsd-questions@freebsd.org Subject: Re: Profiling C application Message-ID: <202011301232.0AUCWZ58015525@sdf.org> In-Reply-To: <zh2z-4i82-wny@FreeBSD.org> References: <39693639-6531-493f-fb24-53519f66cd98@antonovs.family> <zh2z-4i82-wny@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Jan Beich <jbeich@FreeBSD.org> wrote: > Ihor Antonov <ihor@antonovs.family> writes: > > > Hello folks, > > > > What can I use on FreeBSD to profile a slow application? > > > > I have a source code of the application and I can rebuild it if necessary. > > > > I want to see where (in what functions) application spends most of its > > time - amount of function calls and duration of each call (per call) > > > > What tools can you recommend to do this? > > hwpmc (pmcstat, pmcannotate) and dtrace, see examples at the end. > There're probably other ways, so better ask on hackers@ list. > > # Load kernel module and check available counters. Aliases like > # "instructions" or "unhalted-cycles" may not be available. > $ kldload hwpmc > $ pmccontrol -L | grep \\<inst > inst_retired.any > inst_retired.any_p > inst_retired.prec_dist > inst_retired.total_cycles_ps > > # Interactive use > $ pmcstat -t firefox -P inst_retired.any -T > PMC: [inst_retired.any] Samples: 287408 (100.0%) , 96792 unresolved > Key: => pause => press space again to continue > %SAMP IMAGE FUNCTION CALLERS > 3.0 libxul.so _ZN2sh13ResourcesHLSL14unifor _ZN2sh10OutputHLSL9visitLoopE:1.9 ... > 1.4 libxul.so _ZN7mozilla19TrackBuffersMana _ZN7mozilla10OggDemuxer13Seek > 1.4 libavcodec encode_thread mp_set_zero_yuv > 1.4 libxul.so _ZN2sh10OutputHLSL9visitLoopE _ZN2sh10OutputHLSL18visitCons > 1.0 libxul.so _ZN7mozilla3dom12FetchRequest _ZN7mozilla26RemoteLazyInputS > 1.0 libavcodec epzs_motion_search2 epzs_motion_search2 > 0.9 kernel cpu_search_highest cpu_search_highest > 0.7 libxul.so _ZN2sh10OutputHLSL22visitFunc _ZN2sh13TParseContext16checkC > 0.7 libxul.so _ZN7mozilla3dom16HTMLMediaEle ... > 0.6 libc.so.7 regexec > [...] > > # Capture on disk > $ pmcstat -t firefox -P inst_retired.any -O firefox.pmc > > # Flamegraph > $ git clone https://github.com/brendangregg/FlameGraph > $ pmcstat -z16 -R firefox.pmc -G firefox.graph > $ FlameGraph/stackcollapse-pmc.pl firefox.graph > firefox.stack > $ FlameGraph/flamegraph.pl firefox.stack > firefox.svg > $ firefox firefox.svg > > # gprof-like (no need to build with -pg, regular non-debug symbols will do) > $ pmcstat -R firefox.pmc -g > $ gprof $(which firefox) inst_retired.any/firefox.gmon > Flat profile: > > Each sample counts as 0.00787402 seconds. > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 28.42 14.67 14.67 (anonymous namespace)::LoadELF(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, (anonymous namespace)::MmapWrapper*, void**) > 17.21 23.55 8.88 mozilla::PrintfTarget::vprint(char const*, __va_list_tag*) > 11.03 29.24 5.69 643 8.85 8.85 mozilla::JSONWriter::Scalar(mozilla::Span<char const, 18446744073709551615ul> const&, mozilla::Span<char const, 18446744073709551615ul> const&) > 9.55 34.17 4.93 lul::ReadSymbolDataInternal(unsigned char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, lul::SecMap*, void*, unsigned long, lul::UniqueStringUniverse*, void (*)(char const*)) > [...] > > $ gprof /usr/local/lib/libavcodec.so inst_retired.any/libavcodec.so.*.gmon > Flat profile: > > Each sample counts as 0.00787402 seconds. > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 29.63 2516.15 2516.15 341629 7.37 7.69 encode_thread > 13.33 3647.98 1131.83 498281 2.27 3.57 epzs_motion_search2 > 8.02 4329.25 681.28 decode_chunks > 6.87 4912.72 583.46 mpc8_decode_frame > [...] > > $ gprof /boot/modules/i915kms.ko inst_retired.any/i915kms.ko.gmon > Flat profile: > > Each sample counts as 0.00787402 seconds. > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 13.56 1.73 1.73 eb_lookup_vmas > 11.78 3.24 1.50 57 26.38 26.38 i915_active_ref > 9.68 4.47 1.24 26 47.55 47.85 i915_request_await_object > 6.29 5.28 0.80 i915_gem_do_execbuffer > 4.62 5.87 0.59 __i915_vma_active > [...] Nice examples! Thank you for posting them. I will definitely save these for future reference. Scott Bennett, Comm. ASMELG, CFIAG ********************************************************************** * Internet: bennett at sdf.org *xor* bennett at freeshell.org * *--------------------------------------------------------------------* * "A well regulated and disciplined militia, is at all times a good * * objection to the introduction of that bane of all free governments * * -- a standing army." * * -- Gov. John Hancock, New York Journal, 28 January 1790 * **********************************************************************
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?202011301232.0AUCWZ58015525>