Date: Wed, 19 Jan 2011 11:18:22 +0100
From: Hans Ottevanger <hansot@iae.nl>
To: Roman Divacky <rdivacky@freebsd.org>
Cc: freebsd-toolchain@freebsd.org, Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: How to build an executable with profiling?
Message-ID: <4D36BA6E.5030202@iae.nl>
In-Reply-To: <20110118211200.GA3586@freebsd.org>
References: <20110117184411.GA54556@troutmask.apl.washington.edu> <20110118143205.GA34216@freebsd.org> <20110118144313.GO2518@deviant.kiev.zoral.com.ua> <20110118171657.GA68321@freebsd.org> <20110118173517.GA60201@troutmask.apl.washington.edu> <20110118211200.GA3586@freebsd.org>
On 01/18/11 22:12, Roman Divacky wrote:
> On Tue, Jan 18, 2011 at 09:35:17AM -0800, Steve Kargl wrote:
>> On Tue, Jan 18, 2011 at 06:16:57PM +0100, Roman Divacky wrote:
>>> On Tue, Jan 18, 2011 at 04:43:13PM +0200, Kostik Belousov wrote:
>>>> On Tue, Jan 18, 2011 at 03:32:05PM +0100, Roman Divacky wrote:
>>>>> On Mon, Jan 17, 2011 at 10:44:11AM -0800, Steve Kargl wrote:
>>>>>> How does one build an executable for profiling with clang?
>>>>>
>>>>> LLVM (and thus clang) does not support GPROF profiling.
>>>>>
>>>>>> clang -o testf -O2 -march=native -pipe -static -pg -I/usr/local/include -I../mp testf.c -L/usr/local/lib -L../mp -lsgk -lmpfr -lgmp -L/usr/home/kargl/work/lib -lm_clang_p
>>>>>> clang: warning: the clang compiler does not support '-pg'
>>>>>>

If you are really desperate to find the hotspots in your program when
compiled with clang, you could call clang with -v to find the call to
/bin/ld. Then append _p to the appropriate libs if still needed and
replace crt1.o by gcrt1.o while calling ld directly. E.g.

"/usr/bin/ld" -Bstatic -o testcoll /usr/lib/gcrt1.o /usr/lib/crti.o \
    /usr/lib/crtbegin.o testcoll.o angle.o apsis.o error.o minmax.o \
    qags.o qext.o qk21.o sort.o timint.o zero.o vmol.o -lm_p -lgcc \
    -lgcc_eh -lc_p -lgcc -lgcc_eh -t /usr/lib/crtend.o /usr/lib/crtn.o

You will get a profile without the number of calls for the objects
compiled with clang, but with the time spent. In my case:

granularity: each sample hit covers 4 byte(s) for 0.00% of 6.41 seconds

  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 30.3      1.94     1.94         0  100.00%           f_timint [2]
 20.2      3.24     1.29         0  100.00%           _mcount [3]
 19.4      4.48     1.24  21900000     0.00     0.00  exp [4]
 13.2      5.32     0.85         0   40.51%           vmol [1]
  7.3      5.79     0.47         0  100.00%           f_angle [5]
  2.8      5.98     0.18   1000000     0.00     0.00  pow [7]
  2.7      6.15     0.17         0   48.70%           qk21 [6]
  2.4      6.30     0.15         0  100.00%           .mcount (51)
  0.5      6.33     0.03         0  100.00%           zero [8]
  0.4      6.35     0.02         0  100.00%           qext [9]
  0.4      6.38     0.02         0  100.00%           qags [10]
...
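The time columns above come from statistical sampling of the program counter, which is why they survive even when no call counts are recorded. The following is a minimal sketch of that bookkeeping, not the kernel's profil(2) implementation: TEXT_BASE, NBUCKETS, record_sample() and samples_in_range() are hypothetical names chosen for illustration, and the 4-byte bucket size is taken from the "each sample hit covers 4 byte(s)" line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and constants here are illustrative assumptions. */
#define TEXT_BASE   0x1000u  /* assumed start of the profiled text range */
#define BUCKET_SIZE 4u       /* bytes of text covered by one bucket */
#define NBUCKETS    64u      /* assumed size of the sample histogram */

static unsigned short histogram[NBUCKETS];

/* Record one sampled program counter.
 * Returns 1 if the sample was counted, 0 if it fell outside the range. */
int record_sample(uintptr_t pc)
{
    if (pc < TEXT_BASE)
        return 0;
    size_t idx = (size_t)((pc - TEXT_BASE) / BUCKET_SIZE);
    if (idx >= NBUCKETS)
        return 0;
    histogram[idx]++;
    return 1;
}

/* Sum the samples whose buckets start in [start, end), e.g. the address
 * range of one function. `start` is assumed to be bucket-aligned. */
unsigned samples_in_range(uintptr_t start, uintptr_t end)
{
    assert(start <= end);
    unsigned total = 0;
    for (uintptr_t pc = start; pc < end; pc += BUCKET_SIZE) {
        if (pc < TEXT_BASE)
            continue;
        size_t idx = (size_t)((pc - TEXT_BASE) / BUCKET_SIZE);
        if (idx < NBUCKETS)
            total += histogram[idx];
    }
    return total;
}
```

A report tool can turn the per-function sample totals into seconds by multiplying with the sampling interval; none of this depends on the compiler having inserted anything into the profiled code.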
>>>>>> I suppose it will be pointless to ask, but shouldn't clang
>>>>>> support one of the most basic gcc compiler options if clang
>>>>>> is to replace gcc as the base system compiler?
>>>>>
>>>>> is GPROF really needed at this point? we have HWPMC, isnt
>>>>> it sufficient?
>>>> Hwpmc requires additional work for each new CPU model. Also,
>>>> hwpmc is not supported even on all Intel or AMD CPUs, esp. older
>>>> models, and e.g. VIA cores.
>>>>
>>>> Not to mention !x86 architectures.
>>>
>>> yes. I agree. HWPMC is not 100% solution.
>>>
>>> for those interested in profiling in LLVM in detail:
>>>
>>> http://llvm.org/pubs/2010-04-NeustifterProfiling.html
>>>
>>> summary: LLVM supports inserting profiling probes (but the selection
>>> of places where to put them is very naive) but there's no
>>> "GPROF writer".
>>>
>>> I mailed the author of the thesis yesterday and it looks like his work may
>>> get committed to upstream LLVM.
>>>
>>
>> Thanks for the url and checking on the status of profiling with llvm.
>
> I checked the LLVM code instead and here's what I found:
>
> LLVM actually supports profiling, in its own format (llvmprof.out). This can
> only be used for its PGO optimization (BasicBlockPlacement) and is very naive.
>
> Theoretically it should be possible to write "llvmprof.out -> a.out.gmon"
> converter - no idea how feasible it is. I guess it would not be very easy.
>
> I believe it can be sufficiently easy to write a "gprof-like dumper" for
> the llvmprof.out files (if there's not one already) that would print
> stuff like "foo called X times, bar called Y times". I dont know about
> the actual measuring of time. I think it's not in the llvmprof.out.
>

I have not yet completely read the reference provided, but my impression
is that it describes considerably more sophistication than needed to get
gprof running with clang (though the thesis looks very interesting!).

All gprof needs is statistical profiling as provided by the kernel
through profil(2) and addition by the compiler of a call to .mcount (and
possibly allocation of a small amount of storage) on entry of each
function. gcc (and pcc before it) has done this for more than 20 years,
although I must admit that the code generated for the amd64 using -pg is
a bit opaque to me (i386 is straightforward, though).

The rest of the machinery needed is already there (in lib/libc/gmon and
e.g. lib/csu/amd64/crt1.c).

Kind regards,

Hans Ottevanger
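
The compiler's side of this, a call on entry of each function, can be illustrated with a small hand-instrumented sketch. It is only an illustration under stated assumptions: count_entry() and calls_to() are hypothetical helpers standing in for the inserted call to .mcount, the call is written by hand rather than emitted by a compiler, and the table is keyed by function name, whereas the real routine records caller/callee address pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 16  /* assumed table size for this sketch */

struct call_record {
    const char   *name;   /* function name (real gprof data uses addresses) */
    unsigned long calls;  /* number of times the function was entered */
};

static struct call_record records[MAX_FUNCS];
static size_t nrecords;

/* Hypothetical stand-in for the compiler-inserted call to .mcount:
 * bump the entry counter for `name`, creating the record on first use. */
void count_entry(const char *name)
{
    for (size_t i = 0; i < nrecords; i++) {
        if (strcmp(records[i].name, name) == 0) {
            records[i].calls++;
            return;
        }
    }
    assert(nrecords < MAX_FUNCS);
    records[nrecords].name = name;
    records[nrecords].calls = 1;
    nrecords++;
}

/* Look up how often `name` was entered; 0 if it was never called. */
unsigned long calls_to(const char *name)
{
    for (size_t i = 0; i < nrecords; i++) {
        if (strcmp(records[i].name, name) == 0)
            return records[i].calls;
    }
    return 0;
}

/* Two "profiled" functions; each begins with the call that a
 * profiling-aware compiler would emit automatically. */
double square(double x)
{
    count_entry("square");
    return x * x;
}

double sum_of_squares(const double *v, size_t n)
{
    count_entry("sum_of_squares");
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += square(v[i]);
    return total;
}
```

Objects built without such an entry call still show up in the sampled time figures, but their call counts stay at zero, which matches the profile shown earlier for the clang-compiled objects.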
