From owner-freebsd-amd64@FreeBSD.ORG  Sun Feb 22 22:18:26 2004
Message-ID: <40399B30.3080804@citlink.net>
Date: Sun, 22 Feb 2004 23:18:24 -0700
From: Joseph Fenton <jlfenton@citlink.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US;
	rv:1.6) Gecko/20040113
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Peter Wemm <peter@wemm.org>
References: <20040222185212.EB6BE16A4D1@hub.freebsd.org>
	<40391EC6.7010808@citlink.net> <200402222132.08092.peter@wemm.org>
In-Reply-To: <200402222132.08092.peter@wemm.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-amd64@freebsd.org
Subject: Re: CFLAGS+= -fPIC per default?
List-Id: Porting FreeBSD to the AMD64 platform <freebsd-amd64.freebsd.org>

Peter Wemm wrote:

>On Sunday 22 February 2004 01:27 pm, Joseph Fenton wrote:
>  
>
>>>>Adding CFLAGS= -fPIC to /etc/make.conf may be a local solution but
>>>>are there any drawbacks by adding something like
>>>>.if ${ARCH} == "amd64"
>>>>CFLAGS+= -fPIC
>>>>.endif
>>>>
>>>>to ports/Mk/bsd.port.mk?
>>>>        
>>>>
>>>No.. please don't.  Although the AMD64 platform supports PIC
>>>addressing modes directly, it is still a penalty.  (Although
>>>thankfully, it's nowhere near as expensive as it is on i386!)
>>>
>>>For example, in libc when built in PIC mode:
>>>#ifdef PIC
>>>       movq    PIC_GOT(HIDENAME(curbrk)),%rdx
>>>       movq    (%rdx),%rax
>>>#else
>>>       movq    HIDENAME(curbrk)(%rip),%rax
>>>#endif
>>>
>>>The problem is that we can't be sure that everything will be in +/-
>>>31 bit offsets of each other.  This means that PIC objects have to
>>>do indirect memory references that aren't required in no-pic mode.
>>>
>>>i386 also loses a general purpose register (%ebx) which is why -fpic
>>>is more expensive there.  But even though we don't lose a register,
>>>it's still a cost because of the extra global-offset-table memory
>>>references.
>>>
>>>Footnote: you just made me wonder about some of these ifdefs..  We
>>>shouldn't need them for intra-object references like this.  I'll
>>>have to go and look again.
>>>      
>>>
>>Sorry to be anal, but PC-relative addressing is by definition
>>position-independent code. Who was the bright individual
>>who decided that when compiling PIC code to NOT use
>>PC-relative and to NOT use PC-relative for non-PIC code?
>>    
>>
>
>Recall the last paragraph you just quoted.  I already said I thought the 
>code wasn't quite right.  However, I just remembered why it's done that 
>way.
>
>Remember.. Unix link semantics have interesting symbol override effects.  
>Although you might normally be jumping within the same library and can 
>trivially use %rip-relative addressing, if the main program overrides 
>libc symbols, we must use those instead.  Thus, we can't use 
>%rip-relative ways to access them because we can't be sure it's going to 
>be within +/- 2GB.  In fact, it's guaranteed to not be the case for 
>dynamic linking on FreeBSD/amd64 because the default load address for 
>shared libs is around the 8GB mark.  For static linking though, we 
>don't usually have this same 7.9GB hole in our symbol space.
>
>Also.. when compiling with -fpic, you don't know whether you're linking 
>pc-relative code into an application or into a shared library that 
>could be loaded just about anywhere.
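
A rough sketch of the reach argument above, in C. The helper name
fits_rel32 and the sample addresses are assumptions made for
illustration (only the "around the 8GB mark" figure comes from the
text). It simply checks whether a signed 32-bit %rip-relative
displacement can span two addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: can a %rip-relative operand (a signed 32-bit
 * displacement) reach `target` when the next instruction starts at
 * `next_ip`?
 */
bool fits_rel32(uint64_t next_ip, uint64_t target)
{
    /* Distance as a signed 64-bit value (two's complement wraparound). */
    int64_t disp = (int64_t)(target - next_ip);

    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

For instance, fits_rel32(0x400000, 0x200000000) is false: code near
the bottom of the address space cannot reach something mapped near 8GB
with a single %rip-relative reference, so an indirection through a
nearby table is needed.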
>
>  
>
>>This is counter-intuitive. For PIC code, you use PC-relative
>>addressing in two cases: 1 - the code is guaranteed to be
>>a constant distance apart, like code in the same section; 2 -
>>when the loader guarantees the relative position of different
>>sections, like code and data contained in a ROM.
>>
>>Case 1 could be violated by the code being too far apart
>>for PC-relative addressing. This is virtually impossible for
>>the AMD64 as I doubt we'll see code exceeding 2G in
>>size in the next several decades. Code is only now exceeding
>>a few megabytes. Case 2 is usually your problem, which leads
>>to tables used to hold addresses or offsets.
>>    
>>
>
>Case 1 is violated by symbol overrides by the main program.
>
>  
>
>>Both sides of the #ifdef PIC are doing valid PIC code.
>>PC-relative addressing should be used wherever possible
>>unless it incurs a speed penalty.
>>    
>>
>
>gcc generally generates %rip-relative offsets where possible even 
>without -fpic.
>
>  
>
>>Non-PIC code generally does PC-relative code if it
>>is faster and is legal, for example, when referring to
>>code within the same section. When the address must
>>be set by the loader for non-PIC code, it seems to me
>>that the fastest code would be like this:
>>
>>  mov     <imm32>,%rdx
>>  movq    (%rdx),%rax
>>    
>>
>
>Guess what.. look at the original code:
>   movq    PIC_GOT(HIDENAME(curbrk)),%rdx
>   movq    (%rdx),%rax
>The first instruction just happens to be of the form 'mov <imm32>,%rdx'.
>
>  
>
>>or if the address is > 4G
>>
>>  movq    <imm64>,%rdx
>>  movq    (%rdx),%rax
>>    
>>
>
>Except that there is only one movq <imm64> instruction, and it only 
>works with %rax as a target, and it's not particularly fast.  Since 
>you're guaranteed to have an offset table within +/- 2GB, you may as 
>well use it.
>
>  
>
>>The loader would then set the immediate vector upon
>>loading the sections. This avoids a memory hit for accessing
>>a table of addresses while only adding at most 5 bytes to the
>>size of the code. I would probably use this unless the user
>>is compiling with flags set to compile with minimized code
>>size.
>>    
>>
>
>Also remember that this is in libc, where it's not a user code size 
>compile option.  We have to cope with whatever environment we find 
>ourselves loaded into.  We have to assume the worst case scenario.
>
>Incidentally, for an example of what GCC does...  given this program:
>extern int j;
>extern int foo(int i);
>int
>bar(int i)
>{
>        return foo(i) + 10 + j;
>}
>cc -S -O   produces:
>bar:
>        subq    $8, %rsp
>        call    foo
>        addl    j(%rip), %eax
>        addl    $10, %eax
>        addq    $8, %rsp
>        ret
>
>cc -S -O -fPIC produces:
>bar:
>        subq    $8, %rsp
>        call    foo@PLT
>        movq    j@GOTPCREL(%rip), %rdx
>        addl    (%rdx), %eax
>        addl    $10, %eax
>        addq    $8, %rsp
>        ret
>
>Note how the -fpic case is less efficient.  Specifically, function calls 
>are trampolined via the local object's procedure linkage table rather 
>than just calling them directly.. because we don't know if they're 
>within +/- 2GB or not.  Or if they're even in the same object.
>Secondly, it uses the global offset table to find the address of 'j' and 
>then indirectly references it as a two-step sequence.  The non-pic case 
>just makes a pc-relative reference in a single instruction. 
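
The two-step GOT access described above can be mimicked in plain C.
This is only a sketch under stated assumptions: fake_got, lib_j,
program_j, and rebind_j are hypothetical names, and a real global
offset table is filled in by the dynamic linker, not by user code.

```c
#include <assert.h>
#include <stddef.h>

int lib_j = 32;      /* the library's own definition of the variable */
int program_j = 7;   /* an overriding definition in the main program */

/* Hypothetical one-slot stand-in for the global offset table. */
int *fake_got[1] = { &lib_j };

/* Non-PIC flavor: one direct reference, like "addl j(%rip), %eax". */
int load_direct(void)
{
    return lib_j;
}

/*
 * PIC flavor: fetch the address from the table, then dereference it,
 * like "movq j@GOTPCREL(%rip), %rdx" followed by "addl (%rdx), %eax".
 */
int load_via_got(void)
{
    int *p = fake_got[0];
    return *p;
}

/* Stand-in for the dynamic linker binding the slot to a definition. */
void rebind_j(int *definition)
{
    assert(definition != NULL);
    fake_got[0] = definition;
}
```

After rebind_j(&program_j), load_via_got() follows the override while
load_direct() still reads the library's copy. That is the symbol
override behavior the extra memory reference pays for.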
>
>  
>
>>Sorry to nit-pick like this, but having worked on both Mac
>>and Amiga ROMs, PIC mode under BSD really seems
>>backwards to me.
>>    
>>
>
>Unix library semantics are very very different to ROM semantics.  I've 
>been there too.
>
>Also, this isn't BSD-specific.  It's ELF-specific and that's what the 
>toolchain produces and expects.  We use the same toolchain that Linux 
>does.
>  
>
Okay, that made a lot more sense than the original post. Sorry about
the whole thing. It is rather different, but the above clears up a lot
of the confusion. Thanks a bunch!

Your example function in the two cases put it in terms I have dealt
with on PowerMacs, so that was a good demonstration.