From owner-freebsd-questions@freebsd.org Mon Nov 30 12:32:09 2020 Return-Path: Delivered-To: freebsd-questions@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 19A7247B181 for ; Mon, 30 Nov 2020 12:32:09 +0000 (UTC) (envelope-from bennett@sdf.org) Received: from mx.sdf.org (mx.sdf.org [205.166.94.24]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "mx.sdf.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Cl4P411wNz3Jl2; Mon, 30 Nov 2020 12:32:07 +0000 (UTC) (envelope-from bennett@sdf.org) Received: from sdf.org (IDENT:bennett@faeroes.freeshell.org [205.166.94.9]) by mx.sdf.org (8.15.2/8.14.5) with ESMTPS id 0AUCW2vB028927 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256 bits) verified NO); Mon, 30 Nov 2020 12:32:02 GMT Received: (from bennett@localhost) by sdf.org (8.15.2/8.12.8/Submit) id 0AUCWZ58015525; Mon, 30 Nov 2020 06:32:35 -0600 (CST) From: Scott Bennett Message-Id: <202011301232.0AUCWZ58015525@sdf.org> Date: Mon, 30 Nov 2020 06:32:34 -0600 To: jbeich@FreeBSD.org, ihor@antonovs.family Subject: Re: Profiling C application Cc: ml@netfence.it, freebsd-questions@freebsd.org References: <39693639-6531-493f-fb24-53519f66cd98@antonovs.family> In-Reply-To: User-Agent: Heirloom mailx 12.5 6/20/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 4Cl4P411wNz3Jl2 X-Spamd-Bar: --- X-Spamd-Result: default: False [-3.80 / 15.00]; ARC_NA(0.00)[]; RBL_DBL_DONT_QUERY_IPS(0.00)[205.166.94.24:from]; RCVD_TLS_ALL(0.00)[]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; R_SPF_ALLOW(-0.20)[+ip4:205.166.94.0/24:c]; MID_RHS_MATCH_FROM(0.00)[]; MIME_GOOD(-0.10)[text/plain]; TO_DN_NONE(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; SPAMHAUS_ZRD(0.00)[205.166.94.24:from:127.0.2.255]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_HAM_SHORT(-1.00)[-1.000]; DMARC_POLICY_ALLOW(-0.50)[sdf.org,quarantine]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:14361, ipnet:205.166.94.0/24, country:US]; RCVD_COUNT_TWO(0.00)[2]; MAILMAN_DEST(0.00)[freebsd-questions] X-BeenThere: freebsd-questions@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: User questions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Nov 2020 12:32:09 -0000 Jan Beich wrote: > Ihor Antonov writes: > > > Hello folks, > > > > What can I use on FreeBSD to profile a slow application? > > > > I have a source code of the application and I can rebuild it if necessary. > > > > I want to see where (in what functions) application spends most of its > > time - amount of function calls and duration of each call (per call) > > > > What tools can you recommend to do this? > > hwpmc (pmcstat, pmcannotate) and dtrace, see examples at the end. > There're probably other ways, so better ask on hackers@ list. > > # Load kernel module and check available counters. Aliases like > # "instructions" or "unhalted-cycles" may not be available. > $ kldload hwpmc > $ pmccontrol -L | grep \\ inst_retired.any > inst_retired.any_p > inst_retired.prec_dist > inst_retired.total_cycles_ps > > # Interactive use > $ pmcstat -t firefox -P inst_retired.any -T > PMC: [inst_retired.any] Samples: 287408 (100.0%) , 96792 unresolved > Key: => pause => press space again to continue > %SAMP IMAGE FUNCTION CALLERS > 3.0 libxul.so _ZN2sh13ResourcesHLSL14unifor _ZN2sh10OutputHLSL9visitLoopE:1.9 ... > 1.4 libxul.so _ZN7mozilla19TrackBuffersMana _ZN7mozilla10OggDemuxer13Seek > 1.4 libavcodec encode_thread mp_set_zero_yuv > 1.4 libxul.so _ZN2sh10OutputHLSL9visitLoopE _ZN2sh10OutputHLSL18visitCons > 1.0 libxul.so _ZN7mozilla3dom12FetchRequest _ZN7mozilla26RemoteLazyInputS > 1.0 libavcodec epzs_motion_search2 epzs_motion_search2 > 0.9 kernel cpu_search_highest cpu_search_highest > 0.7 libxul.so _ZN2sh10OutputHLSL22visitFunc _ZN2sh13TParseContext16checkC > 0.7 libxul.so _ZN7mozilla3dom16HTMLMediaEle ... > 0.6 libc.so.7 regexec > [...] > > # Capture on disk > $ pmcstat -t firefox -P inst_retired.any -O firefox.pmc > > # Flamegraph > $ git clone https://github.com/brendangregg/FlameGraph > $ pmcstat -z16 -R firefox.pmc -G firefox.graph > $ FlameGraph/stackcollapse-pmc.pl firefox.graph > firefox.stack > $ FlameGraph/flamegraph.pl firefox.stack > firefox.svg > $ firefox firefox.svg > > # gprof-like (no need to build with -pg, regular non-debug symbols will do) > $ pmcstat -R firefox.pmc -g > $ gprof $(which firefox) inst_retired.any/firefox.gmon > Flat profile: > > Each sample counts as 0.00787402 seconds. > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 28.42 14.67 14.67 (anonymous namespace)::LoadELF(std::__1::basic_string, std::__1::allocator > const&, (anonymous namespace)::MmapWrapper*, void**) > 17.21 23.55 8.88 mozilla::PrintfTarget::vprint(char const*, __va_list_tag*) > 11.03 29.24 5.69 643 8.85 8.85 mozilla::JSONWriter::Scalar(mozilla::Span const&, mozilla::Span const&) > 9.55 34.17 4.93 lul::ReadSymbolDataInternal(unsigned char const*, std::__1::basic_string, std::__1::allocator > const&, std::__1::vector, std::__1::allocator >, std::__1::allocator, std::__1::allocator > > > const&, lul::SecMap*, void*, unsigned long, lul::UniqueStringUniverse*, void (*)(char const*)) > [...] > > $ gprof /usr/local/lib/libavcodec.so inst_retired.any/libavcodec.so.*.gmon > Flat profile: > > Each sample counts as 0.00787402 seconds. > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 29.63 2516.15 2516.15 341629 7.37 7.69 encode_thread > 13.33 3647.98 1131.83 498281 2.27 3.57 epzs_motion_search2 > 8.02 4329.25 681.28 decode_chunks > 6.87 4912.72 583.46 mpc8_decode_frame > [...] > > $ gprof /boot/modules/i915kms.ko inst_retired.any/i915kms.ko.gmon > Flat profile: > > Each sample counts as 0.00787402 seconds. > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 13.56 1.73 1.73 eb_lookup_vmas > 11.78 3.24 1.50 57 26.38 26.38 i915_active_ref > 9.68 4.47 1.24 26 47.55 47.85 i915_request_await_object > 6.29 5.28 0.80 i915_gem_do_execbuffer > 4.62 5.87 0.59 __i915_vma_active > [...] Nice examples! Thank you for posting them. I will definitely save these for future reference. Scott Bennett, Comm. ASMELG, CFIAG ********************************************************************** * Internet: bennett at sdf.org *xor* bennett at freeshell.org * *--------------------------------------------------------------------* * "A well regulated and disciplined militia, is at all times a good * * objection to the introduction of that bane of all free governments * * -- a standing army." * * -- Gov. John Hancock, New York Journal, 28 January 1790 * **********************************************************************