From: Peter Wemm
To: freebsd-amd64@freebsd.org
Cc: Joseph Fenton
Date: Sun, 22 Feb 2004 21:32:08 -0800
Subject: Re: CFLAGS+= -fPIC per default?
Message-Id: <200402222132.08092.peter@wemm.org>
In-Reply-To: <40391EC6.7010808@citlink.net>

On Sunday 22 February 2004 01:27 pm, Joseph Fenton wrote:
> >>Adding CFLAGS= -fPIC to /etc/make.conf may be a local solution but
> >>are there any drawbacks by adding something like
> >>.if ${ARCH} == "amd64"
> >>CFLAGS+= -fPIC
> >>.endif
> >>
> >>to ports/Mk/bsd.port.mk?
> >
> >No.. please don't. Although the AMD64 platform supports PIC
> > addressing modes directly, it is still a penalty. (Although
> > thankfully, its nowhere near as expensive as it is on i386!)
> >
> >For example, in libc when built in PIC mode:
> >#ifdef PIC
> >        movq    PIC_GOT(HIDENAME(curbrk)),%rdx
> >        movq    (%rdx),%rax
> >#else
> >        movq    HIDENAME(curbrk)(%rip),%rax
> >#endif
> >
> >The problem is that we can't be sure that everything will be in +/-
> > 31 bit offsets of each other. This means that PIC objects have to
> > do indirect memory references that aren't required in no-pic mode.
> >
> >I386 also loses a general purpose register (%ebx) which is why -fpic
> > is more expensive there. But even though we don't lose a register,
> > its still a cost because of the extra global-offset-table memory
> > references.
> >
> >Footnote: you just made me wonder about some of these ifdefs.. We
> >shouldn't need them for intra-object references like this. I'll
> > have to go and look again.
>
> Sorry to be anal, but PC-relative addressing is by definition
> position-independent code. Who was the bright individual
> who decided that when compiling PIC code to NOT use
> PC-relative and to NOT use PC-relative for non-PIC code?

Recall the last paragraph you just quoted. I already said I thought the
code wasn't quite right. However, I just remembered why it's done that
way. Remember.. unix link semantics have interesting symbol override
effects.
Although you might normally be jumping within the same library and can
trivially use %rip-relative addressing, if the main program overrides
libc symbols, we must use those instead. Thus, we can't use
%rip-relative ways to access them because we can't be sure it's going to
be within +/- 2GB. In fact, it's guaranteed to not be the case for
dynamic linking on FreeBSD/amd64 because the default load address for
shared libs is around the 8GB mark. For static linking though, we don't
usually have this same 7.9GB hole in our symbol space.

Also.. when compiling with -fpic, you don't know whether you're linking
pc-relative code into an application or into a shared library that could
be loaded just about anywhere.

> This is counter-intuitive. For PIC code, you use PC-relative
> addressing in two cases: 1 - the code is guaranteed to be
> a constant distance apart, like code in the same section; 2 -
> when the loader guarantees the relative position of different
> sections, like code and data contained in a ROM.
>
> Case 1 could be violated by the code being too far apart
> for PC-relative addressing. This is virtually impossible for
> the AMD64 as I doubt we'll see code exceeding 2G in
> size in the next several decades. Code is only now exceeding
> a few megabytes. Case 2 is usually your problem, which leads
> to tables used to hold addresses or offsets.

Case 1 is violated by symbol overrides by the main program.

> Both sides of the #ifdef PIC are doing valid PIC code.
> PC-relative addressing should be used wherever possible
> unless it incurs a speed penalty.

gcc generally generates %rip-relative offsets where possible even
without -fpic.

> Non-PIC code generally does PC-relative code if it
> is faster and is legal, for example, when referring to
> code within the same section. When the address must
> be set by the loader for non-PIC code, it seems to me
> that the fastest code would be like this:
>
>     mov ,%rdx
>     movq (%rdx),%rax

Guess what..
look at the original code:
        movq    PIC_GOT(HIDENAME(curbrk)),%rdx
        movq    (%rdx),%rax
The first instruction just happens to be of the form 'mov ,%rdx'.

> or if the address is > 4G
>
>     movq ,%rdx
>     movq (%rdx),%rax

Except that there is only one movq instruction, and it only works with
%rax as a target, and it's not particularly fast. Since you're
guaranteed to have an offset table within +/- 2GB, you may as well use
it.

> The loader would then set the immediate vector upon
> loading the sections. This avoids a memory hit for accessing
> a table of addresses while only adding at most 5 bytes to the
> size of the code. I would probably use this unless the user
> is compiling with flags set to compile with minimized code
> size.

Also remember that this is in libc, where it's not a user code size
compile option. We have to cope with whatever environment we find
ourselves loaded into. We have to assume the worst case scenario.

Incidentally, for an example of what GCC does... given this program:

extern int j;
extern int foo(int i);
int bar(int i)
{
        return foo(i) + 10 + j;
}

cc -S -O produces:
bar:
        subq    $8, %rsp
        call    foo
        addl    j(%rip), %eax
        addl    $10, %eax
        addq    $8, %rsp
        ret

cc -S -O -fPIC produces:
bar:
        subq    $8, %rsp
        call    foo@PLT
        movq    j@GOTPCREL(%rip), %rdx
        addl    (%rdx), %eax
        addl    $10, %eax
        addq    $8, %rsp
        ret

Note how the -fpic case is less efficient. Specifically, function calls
are trampolined via the local object's procedure linkage table rather
than just calling them directly.. because we don't know if they're
within +/- 2GB or not. Or if they're even in the same object.
Secondly, it uses the global offset table to find the address of 'j' and
then indirectly references it as a two-step sequence. The non-pic case
just makes a pc-relative reference in a single instruction.

> Sorry to nit-pick like this, but having worked on both Mac
> and Amiga ROMs, PIC mode under BSD really seems
> backwards to me.

Unix library semantics are very very different to ROM semantics. I've
been there too.
Also, this isn't BSD-specific. It's ELF specific and that's what the
toolchain produces and expects. We use the same toolchain that linux
does.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5