Date:      Wed, 13 Dec 2000 15:48:13 -0800
From:      Bakul Shah <bakul@bitblocks.com>
To:        Iain Templeton <iain@research.canon.com.au>
Cc:        freebsd-hackers@FreeBSD.ORG, Alfred Perlstein <bright@wintelcom.net>, Drew Eckhardt <drew@PoohSticks.ORG>, Marc Tardif <intmktg@CAM.ORG>
Subject:   Re: syscall assembly 
Message-ID:  <200012132348.SAA09244@sheffield.cnchost.com>
In-Reply-To: Your message of "Thu, 14 Dec 2000 09:51:42 +1100." <Pine.LNX.4.10.10012140949390.22824-100000@blow.research.canon.com.au> 

> > #include <fcntl.h>
> > 
> > int foo() {
> >   open("file", O_RDONLY);
> >   return 0;
> > }
> > int main() {
> >   int x;
> >   x = foo();
> >   return 0;
> > }
> > 
> > results in:
> > 
> > foo:
> >         pushl %ebp
> >         movl %esp,%ebp
> >         subl $8,%esp
> >         addl $-8,%esp
> >         pushl $0
> >         pushl $.LC0
> >         call open
> >         xorl %eax,%eax
> >         leave
> >         ret
> > 
> > why the subl then addl?
> > 
> Well, as a thoroughly rough guess, the subl is probably to create space
> on the stack for the args, and the addl is to align the stack to a 16
> byte address?
> 
> I know that the PowerPC ABI wants that, but no idea about x86.

Your guess that the addl maintains 16-byte alignment is right.
The first subl is needed to restore 16-byte alignment after
the call to foo pushes the return address and the prologue
saves %ebp on the stack (4 bytes each, 8 bytes in total).
Try compiling the following:

extern int g();
void f1() { g(1,2); }
void f2() { g(1,2,3,4); }
void f3() { g(1,2,3,4,5); }
void f4() { g(1,2); g(1,2); }

f1() has an addl $-8, since 2 args take only 8 bytes.
f2() does not have the addl, since its 4 args take exactly 16 bytes.
f3() has an addl $-12 to maintain 16-byte alignment (5 args take 20 bytes).
f4() shows why the first subl is needed (assembly shown below).
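
The padding arithmetic above can be sketched in a few lines of C.
This is only an illustration, not gcc's actual logic: the name
call_padding and the constants are made up here, and it assumes
4-byte argument slots and a 16-byte alignment target.

```c
#include <assert.h>

/* Assumptions (hypothetical names, not from gcc): every argument
 * occupies one 4-byte slot and the target alignment is 16 bytes. */
enum { SLOT_BYTES = 4, STACK_ALIGN = 16 };

/* Extra bytes to subtract from %esp before pushing nargs arguments,
 * so that padding plus arguments is a multiple of STACK_ALIGN. */
int call_padding(int nargs)
{
    assert(nargs >= 0);
    int arg_bytes = nargs * SLOT_BYTES;
    return (STACK_ALIGN - arg_bytes % STACK_ALIGN) % STACK_ALIGN;
}

/* Bytes to pop after the call returns: the padding plus the
 * arguments themselves. */
int call_cleanup(int nargs)
{
    return call_padding(nargs) + nargs * SLOT_BYTES;
}
```

Under these assumptions call_padding(2) is 8 (the addl $-8),
call_padding(4) is 0, call_padding(5) is 12, and call_cleanup(2)
is 16 (the addl $16 after each call to g(1,2)).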


	.p2align 2,0x90
.globl f4
	.type	 f4,@function
f4:
	pushl %ebp
	movl %esp,%ebp
	subl $8,%esp

	addl $-8,%esp	# g(1,2);
	pushl $2
	pushl $1
	call g
	addl $16,%esp

	addl $-8,%esp	# g(1,2);
	pushl $2
	pushl $1
	call g
	addl $16,%esp
.L5:
	leave
	ret
.Lfe4:
	.size	 f4,.Lfe4-f4

With the -O flag, the intermediate addl $16 followed by
addl $-8 is combined, but the initial subl followed by addl
is not.  This is probably because gcc treats procedure
prologue/epilogue code specially (or because it lacks a
proper peephole optimizer).

Note that the alignment boundary is 16 bytes, not 32 bytes as
someone else claimed (see the code for f3()).

But I don't see the point of this optimization.  It seems to
want to put the return address on a 16-byte boundary, but
modern caches should be able to fetch the requested word of a
cache line first, before filling in the rest of the line
[but the PIII is not exactly modern :-)].  In fact, by
allocating 16 bytes per frame you are using up more cache
lines (and more space).  The impact is worse when you compile
with -fomit-frame-pointer to avoid saving/restoring the frame
pointer: now there is an unnecessary subl $12 and addl $12
on procedure entry and exit.  [You don't need a frame pointer
*unless* you are debugging your code or using alloca(), so in
well-behaved code the -fomit-frame-pointer savings can add up
quite a bit.]
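
The subl $8 and subl $12 figures follow from the same rounding.
Here is a rough sketch, assuming a function with no locals, 4-byte
words and a 16-byte target; prologue_padding is a made-up name and
this is not how gcc computes it internally.

```c
#include <assert.h>

/* Assumptions (hypothetical names): 4-byte words, 16-byte alignment,
 * and a frame that holds no local variables. */
enum { WORD_BYTES = 4, FRAME_ALIGN = 16 };

/* Bytes the prologue must subtract from %esp to get back to a
 * 16-byte boundary after function entry. */
int prologue_padding(int has_frame_pointer)
{
    assert(has_frame_pointer == 0 || has_frame_pointer == 1);
    int used = WORD_BYTES;          /* return address pushed by call */
    if (has_frame_pointer)
        used += WORD_BYTES;         /* pushl %ebp in the prologue */
    return (FRAME_ALIGN - used % FRAME_ALIGN) % FRAME_ALIGN;
}
```

With a frame pointer this gives 8 (the subl $8 in f4); with
-fomit-frame-pointer it gives 12, which is the subl $12 mentioned
above.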

I guess one reason may be to make sure doubles and larger
structs are aligned on a 16-byte boundary, but it seems the
cost of doing this likely outweighs the benefit.
