Date: Wed, 13 Dec 2000 15:48:13 -0800
From: Bakul Shah <bakul@bitblocks.com>
To: Iain Templeton <iain@research.canon.com.au>
Cc: freebsd-hackers@FreeBSD.ORG, Alfred Perlstein <bright@wintelcom.net>, Drew Eckhardt <drew@PoohSticks.ORG>, Marc Tardif <intmktg@CAM.ORG>
Subject: Re: syscall assembly
Message-ID: <200012132348.SAA09244@sheffield.cnchost.com>
In-Reply-To: Your message of "Thu, 14 Dec 2000 09:51:42 +1100." <Pine.LNX.4.10.10012140949390.22824-100000@blow.research.canon.com.au>
> > #include <fcntl.h>
> >
> > int foo() {
> >         open("file", O_RDONLY);
> >         return 0;
> > }
> > int main() {
> >         int x;
> >         x = foo();
> >         return 0;
> > }
> >
> > results in:
> >
> > foo:
> >         pushl %ebp
> >         movl %esp,%ebp
> >         subl $8,%esp
> >         addl $-8,%esp
> >         pushl $0
> >         pushl $.LC0
> >         call open
> >         xorl %eax,%eax
> >         leave
> >         ret
> >
> > why the subl then addl?
> >
>
> Well, as a thoroughly rough guess, the subl is probably to create space
> on the stack for the args, and the addl is to align the stack to a 16
> byte address?
>
> I know that the PowerPC ABI wants that, but no idea about x86.

Your guess about the addl maintaining 16 byte alignment is right. The
first subl is required to restore the 16 byte alignment, which the call
to foo (pushing the return address) and the saving of %ebp on the stack
have thrown off by 8 bytes.

Try compiling the following:

        extern g();
        f1() { g(1,2); }
        f2() { g(1,2,3,4); }
        f3() { g(1,2,3,4,5); }
        f4() { g(1,2); g(1,2); }

f2() does not have the addl since it has exactly 4 args (16 bytes).
f3() has an addl $-12 to maintain 16 byte alignment (5 args take 20
bytes). f4() shows why the first subl is needed (assembly shown below).

                .p2align 2,0x90
        .globl f4
                .type    f4,@function
        f4:
                pushl %ebp
                movl %esp,%ebp
                subl $8,%esp
                addl $-8,%esp           ; g(1,2);
                pushl $2
                pushl $1
                call g
                addl $16,%esp
                addl $-8,%esp           ; g(1,2);
                pushl $2
                pushl $1
                call g
                addl $16,%esp
        .L5:
                leave
                ret
        .Lfe4:
                .size    f4,.Lfe4-f4

The intermediate addl $16 followed by addl $-8 gets optimized when the
-O flag is used, but the initial subl followed by addl does not.
Probably because gcc treats procedure prologue/epilogue code specially
(or it lacks a proper peephole optimizer).

Note that the alignment boundary is 16 bytes, not 32 bytes as someone
else claimed (see the code for f3()).
But I don't see the point of this optimization -- it seems to want to
put the return address on a 16 byte boundary, but modern caches should
be able to fetch the requested word of a cache line first, before
filling in the rest of the line... [but PIII is not exactly modern :-)]
In fact, by padding every frame to a multiple of 16 bytes you are using
up more cache lines (and more space).

Its impact is worse when you compile with -fomit-frame-pointer to avoid
saving/restoring the frame pointer: now there is an unnecessary subl $12
and addl $12 on procedure entry and exit. [You don't need a frame
pointer *unless* you are debugging your code or using alloca(), so in
well-behaved code the savings from -fomit-frame-pointer can add up quite
a bit.]

I guess one reason may be to make sure doubles and larger structs are
aligned on a 16 byte boundary, but the cost of doing this seems likely
to outweigh the benefit.