From owner-freebsd-toolchain@FreeBSD.ORG Mon Jan 24 16:46:55 2011
Date: Mon, 24 Jan 2011 17:46:51 +0100
From: Roman Divacky
To: Hans Ottevanger
Message-ID: <20110124164651.GA8672@freebsd.org>
References: <20110117184411.GA54556@troutmask.apl.washington.edu>
 <20110118143205.GA34216@freebsd.org>
 <20110118160252.GA6506@troutmask.apl.washington.edu>
 <20110120185449.GA92860@freebsd.org>
 <4D39B75D.6010407@iae.nl>
 <20110121192751.GA94113@freebsd.org>
 <4D3C461C.6000701@iae.nl>
In-Reply-To: <4D3C461C.6000701@iae.nl>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-toolchain@freebsd.org
Subject: Re: How to build an executable with profiling?
List-Id: Maintenance of FreeBSD's integrated toolchain

On Sun, Jan 23, 2011 at 04:15:40PM +0100, Hans Ottevanger wrote:
> On 01/21/11 20:27, Roman Divacky wrote:
> >>>This patch does three things:
> >>>
> >>>1) emits "call .mcount" at the beginning of every function body
> >>>
> >>
> >>The differences on i386 between profiled and non-profiled code are not
> >>as obvious as with gcc (using diff on assembly output), but on first
> >>inspection it looks correct.
> >
> >cool :)
> >
> >>>2) changes the driver to link in gcrt1.o instead of crt1.o
> >>>
> >>>3) changes all -lfoo to -lfoo_p except when the foo ends with _s in
> >>>   the linker invocation
> >>>
> >>
> >>Maybe it is wise to follow the gcc implementation here.
> >
> >ok, makes sense
> >
> >>>I am not sure that I did the right thing, especially in (3). Anyway,
> >>>the patch works for me (i.e. produces a.out.gmon that seems to contain
> >>>meaningful data).
> >>>
> >>>I would appreciate it if you guys could test and review this, letting
> >>>me know if this is correct.
> >>>
> >>
> >>On both my systems (i386 and amd64) something goes severely wrong when
> >>linking several objects (all compiled with -pg, this is amd64):
> >>
> >>Perhaps the invocation of the linker still needs some work (or I must
> >>redo my installation) but anyhow it looks like a good job. Thanks!
> >
> >I rewrote the libraries rewriting part to match gcc as closely as possible.
> >I also think that I solved your ld problem..
> >
> >
> >please revert the old patch and test the new one:
> >
> > http://lev.vlakno.cz/~rdivacky/clang-gprof.patch
> >
> >I believe this one is ok (works for me just fine), please test and report
> >back so I can start integrating this upstream.
> >
>
> I performed a few quick tests on both i386 and amd64.
>
> The problems I had with the invocation of ld appear to be solved. The
> behavior with respect to libraries is now identical to gcc as far as I
> can see.
>
> The results from gprof also look very promising. For my test program on
> amd64 the gprof output when using clang is
>
>   %   cumulative     self              self    total
>  time    seconds  seconds    calls  ms/call  ms/call  name
>  42.5       4.22     4.22        0  100.00%           _mcount [5]
>  22.0       6.41     2.18 14700000     0.00     0.00  f_timint [6]
>  12.4       7.64     1.23 21900000     0.00     0.00  exp [10]
>   8.4       8.48     0.84 22000000     0.00     0.00  vmol [9]
>   5.4       9.02     0.54  6300000     0.00     0.00  f_angle [11]
>   3.8       9.40     0.38        0  100.00%           .mcount (52)
>   1.9       9.59     0.19  1000000     0.00     0.01  qk21 [4]
>   1.9       9.78     0.19  1000000     0.00     0.00  pow [12]
>   0.4       9.82     0.04   200000     0.00     0.03  qags [3]
>   0.4       9.86     0.04   100000     0.00     0.00  zero [14]
>   0.3       9.89     0.03   100000     0.00     0.00  qext [16]
>   0.2       9.91     0.02   800000     0.00     0.00  f_apsis [15]
>   0.1       9.91     0.01  2500000     0.00     0.00  fmax [17]
>   0.1       9.92     0.01   100000     0.00     0.00  apsis [13]
>   0.0       9.92     0.00  1000000     0.00     0.00  fmin [18]
>   0.0       9.93     0.00   100000     0.00     0.03  timint [7]
>   0.0       9.93     0.00   700000     0.00     0.00  tol_apsis [19]
>   0.0       9.94     0.00   200000     0.00     0.00  sort [20]
>   0.0       9.94     0.00        1     1.85  5334.52  main [1]
>   0.0       9.94     0.00   100000     0.00     0.03  angle [8]
> ...
>
> while using gcc yields
>
>   %   cumulative     self              self    total
>  time    seconds  seconds    calls  ms/call  ms/call  name
>  44.3       4.23     4.23        0  100.00%           _mcount [5]
>  18.5       6.00     1.76 14700000     0.00     0.00  f_timint [6]
>  13.5       7.28     1.28 21900000     0.00     0.00  exp [10]
>   9.0       8.14     0.86 22000000     0.00     0.00  vmol [9]
>   5.5       8.66     0.52  6300000     0.00     0.00  f_angle [11]
>   4.0       9.04     0.38        0  100.00%           .mcount (52)
>   2.0       9.24     0.19  1000000     0.00     0.00  pow [12]
>   2.0       9.43     0.19  1000000     0.00     0.00  qk21 [4]
>   0.3       9.45     0.03   100000     0.00     0.00  zero [14]
>   0.3       9.48     0.03   200000     0.00     0.02  qags [3]
>   0.2       9.50     0.02   100000     0.00     0.00  qext [16]
>   0.2       9.52     0.02   800000     0.00     0.00  f_apsis [15]
>   0.1       9.53     0.00  2500000     0.00     0.00  fmax [17]
>   0.0       9.53     0.00   700000     0.00     0.00  tol_apsis [18]
>   0.0       9.53     0.00   200000     0.00     0.00  sort [19]
>   0.0       9.54     0.00   100000     0.00     0.00  apsis [13]
>   0.0       9.54     0.00        1     2.21  4927.66  main [1]
>   0.0       9.54     0.00  1000000     0.00     0.00  fmin [20]
>   0.0       9.54     0.00   100000     0.00     0.02  timint [7]
>   0.0       9.54     0.00   100000     0.00     0.02  angle [8]
> ...
>
> To me this looks quite similar 8-)

awesome! :)

> I also tested the interaction of -pg with other options and there I
> found an issue with -fomit-frame-pointer. Here gcc bails out, as it
> probably should:
>
> gcc -pg -O2 -Wall -fomit-frame-pointer -c test.c
> gcc: -pg and -fomit-frame-pointer are incompatible
>
> while clang continues and silently generates an executable that
> immediately terminates with a segmentation violation when started.

will fix today

> Another minor, unrelated issue I found is that this version of clang on
> i386 generates SSE2 instructions by default, while gcc and clang in
> -CURRENT generate the "classical" i387 instructions.

we default to i486 in -CURRENT while upstream defaults to pentium4, so this
is expected.

thank you for your great testing and help! I am gonna push it upstream now
so we'll get it with the next clang/llvm update in -current

roman
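
For illustration only: the renaming rule in item (3) of the first patch
quoted above (every -lfoo becomes -lfoo_p unless foo ends in _s) could be
sketched in C roughly as below. The names profiled_lib_arg and ends_with
are hypothetical and are not taken from the clang patch, which was later
rewritten to follow gcc; this covers the stated rule and nothing more.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return non-zero if string s ends with the given suffix. */
static int ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/*
 * Hypothetical helper: copy one linker argument into out, appending "_p"
 * when it has the form -lfoo and foo does not end in "_s". Any other
 * argument is copied unchanged.
 * Returns 0 on success, -1 if out is too small to hold the result.
 */
int profiled_lib_arg(const char *arg, char *out, size_t outlen)
{
    assert(arg != NULL && out != NULL);

    /* Only "-l" followed by a non-empty library name is rewritten. */
    int is_lib = strncmp(arg, "-l", 2) == 0 && arg[2] != '\0';
    const char *suffix = (is_lib && !ends_with(arg + 2, "_s")) ? "_p" : "";

    int n = snprintf(out, outlen, "%s%s", arg, suffix);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Under this sketch -lm would become -lm_p, while -lgcc_s and a plain
object file such as test.o would pass through untouched.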