From owner-freebsd-amd64@FreeBSD.ORG Sun Feb 22 22:18:26 2004
Message-ID: <40399B30.3080804@citlink.net>
Date: Sun, 22 Feb 2004 23:18:24 -0700
From: Joseph Fenton
To: Peter Wemm
cc: freebsd-amd64@freebsd.org
Subject: Re: CFLAGS+= -fPIC per default?
References: <20040222185212.EB6BE16A4D1@hub.freebsd.org> <40391EC6.7010808@citlink.net> <200402222132.08092.peter@wemm.org>
In-Reply-To: <200402222132.08092.peter@wemm.org>
List-Id: Porting FreeBSD to the AMD64 platform

Peter Wemm wrote:

>On Sunday 22 February 2004 01:27 pm, Joseph Fenton wrote:
>
>>>>Adding CFLAGS= -fPIC to /etc/make.conf may be a local solution but
>>>>are there any drawbacks by adding something like
>>>>.if ${ARCH} == "amd64"
>>>>CFLAGS+= -fPIC
>>>>.endif
>>>>
>>>>to ports/Mk/bsd.port.mk?
>>>>
>>>No.. please don't.
>>>Although the AMD64 platform supports PIC
>>>addressing modes directly, it is still a penalty.  (Although
>>>thankfully, it's nowhere near as expensive as it is on i386!)
>>>
>>>For example, in libc when built in PIC mode:
>>>#ifdef PIC
>>>        movq    PIC_GOT(HIDENAME(curbrk)),%rdx
>>>        movq    (%rdx),%rax
>>>#else
>>>        movq    HIDENAME(curbrk)(%rip),%rax
>>>#endif
>>>
>>>The problem is that we can't be sure that everything will be in +/-
>>>31 bit offsets of each other.  This means that PIC objects have to
>>>do indirect memory references that aren't required in no-pic mode.
>>>
>>>I386 also loses a general purpose register (%ebx) which is why -fpic
>>>is more expensive there.  But even though we don't lose a register,
>>>it's still a cost because of the extra global-offset-table memory
>>>references.
>>>
>>>Footnote: you just made me wonder about some of these ifdefs..  We
>>>shouldn't need them for intra-object references like this.  I'll
>>>have to go and look again.
>>>
>>Sorry to be anal, but PC-relative addressing is by definition
>>position-independent code. Who was the bright individual
>>who decided that when compiling PIC code to NOT use
>>PC-relative and to NOT use PC-relative for non-PIC code?
>>
>
>Recall the last paragraph you just quoted.  I already said I thought the
>code wasn't quite right.  However, I just remembered why it's done that
>way.
>
>Remember.. unix link semantics have interesting symbol override effects.
>Although you might normally be jumping within the same library and can
>trivially use %rip-relative addressing, if the main program overrides
>libc symbols, we must use those instead.  Thus, we can't use
>%rip-relative ways to access them because we can't be sure it's going to
>be within +/- 2GB.  In fact, it's guaranteed to not be the case for
>dynamic linking on FreeBSD/amd64 because the default load address for
>shared libs is around the 8GB mark.  For static linking though, we
>don't usually have this same 7.9GB hole in our symbol space.
>
>Also..
>when compiling with -fpic, you don't know whether you're linking
>pc-relative code into an application or into a shared library that
>could be loaded just about anywhere.
>
>>This is counter-intuitive. For PIC code, you use PC-relative
>>addressing in two cases: 1 - the code is guaranteed to be
>>a constant distance apart, like code in the same section; 2 -
>>when the loader guarantees the relative position of different
>>sections, like code and data contained in a ROM.
>>
>>Case 1 could be violated by the code being too far apart
>>for PC-relative addressing. This is virtually impossible for
>>the AMD64 as I doubt we'll see code exceeding 2G in
>>size in the next several decades. Code is only now exceeding
>>a few megabytes. Case 2 is usually your problem, which leads
>>to tables used to hold addresses or offsets.
>>
>
>Case 1 is violated by symbol overrides by the main program.
>
>>Both sides of the #ifdef PIC are doing valid PIC code.
>>PC-relative addressing should be used wherever possible
>>unless it incurs a speed penalty.
>>
>
>gcc generally generates %rip-relative offsets where possible even
>without -fpic.
>
>>Non-PIC code generally does PC-relative code if it
>>is faster and is legal, for example, when referring to
>>code within the same section. When the address must
>>be set by the loader for non-PIC code, it seems to me
>>that the fastest code would be like this:
>>
>>    mov ,%rdx
>>    movq (%rdx),%rax
>>
>
>Guess what.. look at the original code:
>        movq    PIC_GOT(HIDENAME(curbrk)),%rdx
>        movq    (%rdx),%rax
>The first instruction just happens to be of the form 'mov ,%rdx.
>
>>or if the address is > 4G
>>
>>    movq ,%rdx
>>    movq (%rdx),%rax
>>
>
>Except that there is only one movq instruction, and it only
>works with %rax as a target, and it's not particularly fast.  Since
>you're guaranteed to have an offset table within +/- 2GB, you may as
>well use it.
>
>>The loader would then set the immediate vector upon
>>loading the sections. This avoids a memory hit for accessing
>>a table of addresses while only adding at most 5 bytes to the
>>size of the code. I would probably use this unless the user
>>is compiling with flags set to compile with minimized code
>>size.
>>
>
>Also remember that this is in libc, where it's not a user code size
>compile option.  We have to cope with whatever environment we find
>ourselves loaded into.  We have to assume the worst case scenario.
>
>Incidentally, for an example of what GCC does... given this program:
>extern int j;
>extern int foo(int i);
>int
>bar(int i)
>{
>        return foo(i) + 10 + j;
>}
>cc -S -O produces:
>bar:
>        subq    $8, %rsp
>        call    foo
>        addl    j(%rip), %eax
>        addl    $10, %eax
>        addq    $8, %rsp
>        ret
>
>cc -S -O -fPIC produces:
>bar:
>        subq    $8, %rsp
>        call    foo@PLT
>        movq    j@GOTPCREL(%rip), %rdx
>        addl    (%rdx), %eax
>        addl    $10, %eax
>        addq    $8, %rsp
>        ret
>
>Note how the -fpic case is less efficient.  Specifically, function calls
>are trampolined via the local object's procedure linkage table rather
>than just calling them directly.. because we don't know if they're
>within +/- 2GB or not.  Or if they're even in the same object.
>Secondly, it uses the global offset table to find the address of 'j' and
>then indirectly references it as a two-step sequence.  The non-pic case
>just makes a pc-relative reference in a single instruction.
>
>>Sorry to nit-pick like this, but having worked on both Mac
>>and Amiga ROMs, PIC mode under BSD really seems
>>backwards to me.
>>
>
>Unix library semantics are very very different to ROM semantics.  I've
>been there too.
>
>Also, this isn't BSD-specific.  It's ELF specific and that's what the
>toolchain produces and expects.  We use the same toolchain that linux
>does.
>

Okay, that made a lot more sense than the original post. Sorry
about the whole thing. It is rather different, but the above clears
a lot of the confusion. Thanks a bunch!
Your example function in the two cases put it in terms I have dealt with on PowerMacs, so that was a good demonstration.
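
The two-step GOT access and the symbol-override point from the quoted
reply can be modeled in plain C. This is only a rough sketch, not what
the compiler or the runtime linker actually emits: the names lib_j,
app_j, got_j, bind_j, bar_direct and bar_via_got are all hypothetical,
and a single pointer variable stands in for a one-entry global offset
table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the library's own definition of j. */
int lib_j = 32;

/* Hypothetical: a definition in the main program that overrides it. */
int app_j = 100;

/* One-entry model of a global offset table: a slot holding the
 * address of whichever definition of j is in effect. */
int *got_j = &lib_j;

/* Model of the loader filling in the table slot at load time. */
void bind_j(int *definition)
{
    assert(definition != NULL);
    got_j = definition;
}

/* Non-PIC style: the reference is bound to lib_j once and for all,
 * so the value is fetched with a single memory reference. */
int bar_direct(int i)
{
    return i + 10 + lib_j;
}

/* PIC style: first load the address from the table, then load the
 * value through it (two memory references, but rebindable). */
int bar_via_got(int i)
{
    int *p = got_j;
    return i + 10 + *p;
}
```

With the values above, bar_direct(0) and bar_via_got(0) both return 42.
After bind_j(&app_j), bar_via_got(0) returns 110 while bar_direct(0)
still returns 42: the extra indirection is the price of honoring an
override that may live anywhere in the address space.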