Date: Sun, 23 Jan 2011 16:15:40 +0100
From: Hans Ottevanger <hansot@iae.nl>
To: Roman Divacky <rdivacky@freebsd.org>
Cc: freebsd-toolchain@freebsd.org
Subject: Re: How to build an executable with profiling?
Message-ID: <4D3C461C.6000701@iae.nl>
In-Reply-To: <20110121192751.GA94113@freebsd.org>
References: <20110117184411.GA54556@troutmask.apl.washington.edu>
 <20110118143205.GA34216@freebsd.org>
 <20110118160252.GA6506@troutmask.apl.washington.edu>
 <20110120185449.GA92860@freebsd.org>
 <4D39B75D.6010407@iae.nl>
 <20110121192751.GA94113@freebsd.org>

On 01/21/11 20:27, Roman Divacky wrote:
>>> This patch does three things:
>>>
>>> 1) emits "call .mcount" at the beginning of every function body
>>>
>>
>> The differences on i386 between profiled and non-profiled code are not
>> as obvious as with gcc (using diff on assembly output), but on first
>> inspection it looks correct.
>
> cool :)
>
>>> 2) changes the driver to link in gcrt1.o instead of crt1.o
>>>
>>> 3) changes all -lfoo to -lfoo_p except when the foo ends with _s in
>>> the linker invocation
>>>
>>
>> Maybe it is wise to follow the gcc implementation here.
>
> ok, makes sense
>
>>> I am not sure that I did the right thing, especially in (3). Anyway,
>>> the patch works for me (ie. produces a.out.gmon that seems to contain
>>> meaningful data).
>>>
>>> I would appreciate if you guys could test and review this. Letting me
>>> know if this is correct.
>>>
>>
>> On both my systems (i386 and amd64) something goes severely wrong when
>> linking several objects (all compiled with -pg, this is amd64):
>>
>> Perhaps the invocation of the linker still needs some work (or I must
>> redo my installation) but anyhow it looks like a good job. Thanks!
>
> I rewrote the libraries rewriting part to match gcc as close as possible.
> I also think that I solved your ld problem..
>
>
> please revert the old patch and test the new one:
>
> http://lev.vlakno.cz/~rdivacky/clang-gprof.patch
>
> I believe this one is ok (works for me just fine), please test and report
> back so I can start integrating this upstream.
>

I performed a few quick tests on both i386 and amd64. The problems I had
with the invocation of ld appear to be solved. The behavior with respect
to libraries is now identical to gcc as far as I can see. The results from
gprof also look very promising.
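For readers who want to see the link-line changes from items (2) and (3) of the quoted first patch spelled out, here is a minimal sketch in Python. It only models the rule as worded in the quote (append _p to every -lfoo unless foo ends in _s, and pick gcrt1.o instead of crt1.o). It is not the code from the patch, and it does not claim to reproduce what gcc or the revised patch actually does. The function names are made up for illustration.

```python
def profiled_lib_flags(args):
    """Rewrite -lfoo to -lfoo_p, leaving libraries whose name ends in _s alone.

    Hypothetical helper: models only the rule quoted in item (3).
    """
    out = []
    for arg in args:
        is_lib = arg.startswith("-l") and len(arg) > 2
        if is_lib and not arg.endswith("_s"):
            out.append(arg + "_p")
        else:
            # Non-library arguments and *_s libraries pass through unchanged.
            out.append(arg)
    return out


def startup_object(profiling):
    """Pick the startup object as described in item (2)."""
    return "gcrt1.o" if profiling else "crt1.o"


if __name__ == "__main__":
    print(startup_object(True))
    print(profiled_lib_flags(["-lc", "-lm", "-lgcc_s", "-o", "prog"]))
```

With the sample arguments above, -lc and -lm gain the _p suffix while -lgcc_s and the non-library arguments are left untouched.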

For my test program on amd64 the gprof output when using clang is

  %   cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
 42.5       4.22     4.22        0  100.00%           _mcount [5]
 22.0       6.41     2.18 14700000     0.00     0.00  f_timint [6]
 12.4       7.64     1.23 21900000     0.00     0.00  exp [10]
  8.4       8.48     0.84 22000000     0.00     0.00  vmol [9]
  5.4       9.02     0.54  6300000     0.00     0.00  f_angle [11]
  3.8       9.40     0.38        0  100.00%           .mcount (52)
  1.9       9.59     0.19  1000000     0.00     0.01  qk21 [4]
  1.9       9.78     0.19  1000000     0.00     0.00  pow [12]
  0.4       9.82     0.04   200000     0.00     0.03  qags [3]
  0.4       9.86     0.04   100000     0.00     0.00  zero [14]
  0.3       9.89     0.03   100000     0.00     0.00  qext [16]
  0.2       9.91     0.02   800000     0.00     0.00  f_apsis [15]
  0.1       9.91     0.01  2500000     0.00     0.00  fmax [17]
  0.1       9.92     0.01   100000     0.00     0.00  apsis [13]
  0.0       9.92     0.00  1000000     0.00     0.00  fmin [18]
  0.0       9.93     0.00   100000     0.00     0.03  timint [7]
  0.0       9.93     0.00   700000     0.00     0.00  tol_apsis [19]
  0.0       9.94     0.00   200000     0.00     0.00  sort [20]
  0.0       9.94     0.00        1     1.85  5334.52  main [1]
  0.0       9.94     0.00   100000     0.00     0.03  angle [8]
...

while using gcc yields

  %   cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
 44.3       4.23     4.23        0  100.00%           _mcount [5]
 18.5       6.00     1.76 14700000     0.00     0.00  f_timint [6]
 13.5       7.28     1.28 21900000     0.00     0.00  exp [10]
  9.0       8.14     0.86 22000000     0.00     0.00  vmol [9]
  5.5       8.66     0.52  6300000     0.00     0.00  f_angle [11]
  4.0       9.04     0.38        0  100.00%           .mcount (52)
  2.0       9.24     0.19  1000000     0.00     0.00  pow [12]
  2.0       9.43     0.19  1000000     0.00     0.00  qk21 [4]
  0.3       9.45     0.03   100000     0.00     0.00  zero [14]
  0.3       9.48     0.03   200000     0.00     0.02  qags [3]
  0.2       9.50     0.02   100000     0.00     0.00  qext [16]
  0.2       9.52     0.02   800000     0.00     0.00  f_apsis [15]
  0.1       9.53     0.00  2500000     0.00     0.00  fmax [17]
  0.0       9.53     0.00   700000     0.00     0.00  tol_apsis [18]
  0.0       9.53     0.00   200000     0.00     0.00  sort [19]
  0.0       9.54     0.00   100000     0.00     0.00  apsis [13]
  0.0       9.54     0.00        1     2.21  4927.66  main [1]
  0.0       9.54     0.00  1000000     0.00     0.00  fmin [20]
  0.0       9.54     0.00   100000     0.00     0.02  timint [7]
  0.0       9.54     0.00   100000     0.00     0.02  angle [8]
...

To me this looks quite similar 8-)

I also tested the interaction of -pg with other options and found an
issue with -fomit-frame-pointer. Here gcc bails out, as it probably should:

    gcc -pg -O2 -Wall -fomit-frame-pointer -c test.c
    gcc: -pg and -fomit-frame-pointer are incompatible

while clang continues and silently generates an executable that
immediately terminates with a segmentation violation when started.

Another minor, unrelated issue I found is that this version of clang on
i386 generates SSE2 instructions by default, while gcc and clang in
-CURRENT generate the "classical" i387 instructions.

Kind regards,

Hans Ottevanger
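
P.S. To make the -pg / -fomit-frame-pointer point concrete, here is a rough sketch in Python of the kind of check a compiler driver could do before invoking the compiler proper. It is only an illustration of the behavior gcc shows above, not code taken from gcc or clang. The function name is hypothetical, and the sketch deliberately ignores details such as a later -fno-omit-frame-pointer overriding an earlier option.

```python
def check_profiling_flags(args):
    """Return an error message if -pg is combined with -fomit-frame-pointer.

    Hypothetical helper: returns None when the combination is acceptable.
    Assumption: options are matched literally, with no handling of a later
    -fno-omit-frame-pointer cancelling an earlier -fomit-frame-pointer.
    """
    if "-pg" in args and "-fomit-frame-pointer" in args:
        return "-pg and -fomit-frame-pointer are incompatible"
    return None


if __name__ == "__main__":
    cmdline = ["-pg", "-O2", "-Wall", "-fomit-frame-pointer", "-c", "test.c"]
    error = check_profiling_flags(cmdline)
    if error is not None:
        print("error:", error)
```

Rejecting the combination up front, as gcc does, seems preferable to producing an executable that crashes at startup.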