Date: Sat, 16 Dec 2000 12:13:32 -0800
From: Bakul Shah <bakul@bitblocks.com>
To: Marc Tardif <intmktg@CAM.ORG>
Cc: freebsd-hackers@freebsd.org
Subject: Re: syscall assembly
Message-ID: <200012162013.PAA14008@marlborough.cnchost.com>
In-Reply-To: Your message of "Fri, 15 Dec 2000 14:34:10 EST." <Pine.LNX.4.10.10012151418570.20060-100000@Gloria.CAM.ORG>
Marc sent me this:

> > > > pushl %ebp
> > > > movl %esp,%ebp
> > > > subl $8,%esp
> > > >
> > >
> > > This might not be of interest to the rest of the mailing list
> > > but what is the purpose of the subl instruction used before
> > > calling functions? Is that where the return value is retrieved
> > > from, instead of using the %eax register as would Linux?
> >
> > This is to keep the stack alignment to 16 bytes. Recall that
> > a call will push the return address on the stack and the
> > frame pointer (%ebp) pushed so now we have 8 bytes on the
> > stack. If the stack was aligned before the call, we need to
> > further adjust it by 8 more bytes so that after the procedure
> > prolog it is once again aligned on a 16 byte boundary.
> >
> [ snip ]
>
> Consider the following code debugged with gdb:
> int func() {
>     return 1;
> }
> int main() {
>     return func();
> }
>
> # gcc -g func.c
> # gdb a.out
> (gdb) display/x $sp
> (gdb) display/i $pc
> (gdb) break *&main + 3
> (gdb) run
> Breakpoint 1, 0x804848c in main () at func.c:3
> 3 }
> 2: x/i $eip 0x804848f <main+3>: sub $0x8,%esp
> 1: /x $esp = 0xbfbff820
> (gdb) si
> 5 return func();
> 2: x/i $eip 0x8048492 <main+6>: call 0x804847c <func>
> 1: /x $esp = 0xbfbff818
>
> Oddly, it seems to me the stack top (pointed to by %esp)
> is aligned _before_ the sub instruction. And then, this
> instruction unaligns the stack by $0x8. How does this
> make sense?

Maybe people who know more about gcc can explain this better, but I will speculate anyway!

Assuming that 16 byte alignment actually helps, it would make sense to have either

a) the local frame start at a 16 byte boundary, or
b) the args start at a 16 byte boundary.

The goal is to minimize the number of cache lines that need to be fetched. You want the *first free location* to be on a 16 byte boundary (where either the args start or the local frame starts).
What Marc observed seems to point to a): the first free location is on a 16 byte boundary _after_ the procedure prolog (push %ebp). This is where you start allocating locals. gcc seems to impose an additional restriction: even the args start at a 16 byte boundary. This seems unnecessary; it should do either a) or b), but not both.

If you think of args to a called function as belonging to the caller's frame, then a) is what makes sense. But if you want tail call optimization (like I do), you'd want args to be part of the callee's frame. With a tail call the caller's frame is *replaced* by the callee's: since you never return to the caller, you can throw away its frame prior to the call, but the args to the callee must remain. In this case the frame pointer %ebp points into the middle of a frame, and the frame starts with the args.

But I still question this optimization. Are there any stats on whether this 16 byte alignment improves performance? It certainly increases space use!

-- bakul
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200012162013.PAA14008>